MersV1, PubMed, Corpus, bibRecord, 001F12

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Internal identifier: 001F12 (PubMed/Corpus); previous: 001F11; next: 001F13

Authors: Guillaume Marçais; Carl Kingsford

Source:

Bioinformatics (Oxford, England) [1367-4811]; 2011.

RBID : pubmed:21217122

English descriptors

KwdEn :
- Algorithms, Animals, Base Sequence, Computational Biology (methods), Genome, Humans, Sequence Alignment, Sequence Analysis, DNA (methods), Software.
MESH :
- methods : Computational Biology, Sequence Analysis, DNA.
- Algorithms, Animals, Base Sequence, Genome, Humans, Sequence Alignment, Software.

Abstract

Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
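To make the counting task concrete, here is a minimal single-threaded sketch in Python. It only illustrates what "counting the occurrences of every k-mer" means; it is not the lock-free, multithreaded hash table described in the RESULTS section of this record, and the function name `count_kmers` is a hypothetical choice made for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every substring of length k in `sequence`.

    Illustrative only: a plain dictionary-based count, assuming the whole
    sequence fits in memory and that case should be ignored.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence; a sequence shorter
    # than k yields an empty range and therefore an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

A sequence of length n contains n - k + 1 overlapping k-mers, so the counts always sum to that value. The memory cost of storing every distinct k-mer as a string is the kind of overhead the abstract says becomes prohibitive at sequencing scale.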

DOI: 10.1093/bioinformatics/btr011
PubMed: 21217122

Links to Exploration step

pubmed:21217122

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.</title>
<author><name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation><nlm:affiliation>Department of Computer Science, University of Maryland, College Park, MD 20742, USA. gmarcais@umd.edu</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2011">2011</date>
<idno type="RBID">pubmed:21217122</idno>
<idno type="pmid">21217122</idno>
<idno type="doi">10.1093/bioinformatics/btr011</idno>
<idno type="wicri:Area/PubMed/Corpus">001F12</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F12</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.</title>
<author><name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation><nlm:affiliation>Department of Computer Science, University of Maryland, College Park, MD 20742, USA. gmarcais@umd.edu</nlm:affiliation>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2011" type="published">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Base Sequence</term>
<term>Computational Biology (methods)</term>
<term>Genome</term>
<term>Humans</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Base Sequence</term>
<term>Genome</term>
<term>Humans</term>
<term>Sequence Alignment</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM"><PMID Version="1">21217122</PMID>
<DateCompleted><Year>2011</Year>
<Month>05</Month>
<Day>31</Day>
</DateCompleted>
<DateRevised><Year>2018</Year>
<Month>12</Month>
<Day>01</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>27</Volume>
<Issue>6</Issue>
<PubDate><Year>2011</Year>
<Month>Mar</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.</ArticleTitle>
<Pagination><MedlinePgn>764-70</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btr011</ELocationID>
<Abstract><AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.</AbstractText>
<AbstractText Label="AVAILABILITY" NlmCategory="BACKGROUND">The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Marçais</LastName>
<ForeName>Guillaume</ForeName>
<Initials>G</Initials>
<AffiliationInfo><Affiliation>Department of Computer Science, University of Maryland, College Park, MD 20742, USA. gmarcais@umd.edu</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Kingsford</LastName>
<ForeName>Carl</ForeName>
<Initials>C</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y"><Grant><GrantID>R21 AI085376</GrantID>
<Acronym>AI</Acronym>
<Agency>NIAID NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant><GrantID>1R01HG0294501</GrantID>
<Acronym>HG</Acronym>
<Agency>NHGRI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant><GrantID>1R21AI085376</GrantID>
<Acronym>AI</Acronym>
<Agency>NIAID NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2011</Year>
<Month>01</Month>
<Day>07</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D001483" MajorTopicYN="N">Base Sequence</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016678" MajorTopicYN="N">Genome</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2011</Year>
<Month>1</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2011</Year>
<Month>1</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2011</Year>
<Month>6</Month>
<Day>1</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">21217122</ArticleId>
<ArticleId IdType="pii">btr011</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btr011</ArticleId>
<ArticleId IdType="pmc">PMC3051319</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Nucleic Acids Res. 2004;32(5):1792-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15034147</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2005 Mar 1;21(5):582-8</Citation>
<ArticleIdList><ArticleId IdType="pubmed">15374857</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2008 Dec 15;24(24):2818-24</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18952627</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>BMC Genomics. 2008;9:517</Citation>
<ArticleIdList><ArticleId IdType="pubmed">18976482</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Dec;78(6 Pt 1):061912</Citation>
<ArticleIdList><ArticleId IdType="pubmed">19256873</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Nature. 2010 Jan 21;463(7279):311-7</Citation>
<ArticleIdList><ArticleId IdType="pubmed">20010809</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2003 Oct;13(10):2306-15</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12975312</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>PLoS Biol. 2010;8(9). pii: e1000475. doi: 10.1371/journal.pbio.1000475</Citation>
<ArticleIdList><ArticleId IdType="pubmed">20838655</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Biol. 2010;11(11):R116</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21114842</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Science. 2000 Mar 24;287(5461):2196-204</Citation>
<ArticleIdList><ArticleId IdType="pubmed">10731133</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2003 Jan;13(1):91-6</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12529310</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Bioinformatics. 2003 Feb 12;19(3):319-26</Citation>
<ArticleIdList><ArticleId IdType="pubmed">12584116</ArticleId>
</ArticleIdList>
</Reference>
<Reference><Citation>Genome Res. 2010 Sep;20(9):1165-73</Citation>
<ArticleIdList><ArticleId IdType="pubmed">20508146</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001F12 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001F12 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:21217122
   |texte=   A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:21217122" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
