MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Asymptotically optimal minimizers schemes.

Internal identifier: 000A07 (PubMed/Checkpoint); previous: 000A06; next: 000A08

Authors: Guillaume Marçais [United States]; Dan DeBlasio [United States]; Carl Kingsford [United States]

Source: Bioinformatics (Oxford, England), 2018; 34(13): i13-i22

RBID: pubmed:29949995

Descriptors: Algorithms; Computational Biology (methods)

Abstract

The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.
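To make the notions of minimizer, order and density concrete, here is a minimal sketch in Python. It assumes the usual windowed formulation (each window of w consecutive k-mers selects its smallest k-mer under a chosen order, leftmost on ties), which the abstract itself does not spell out. The function names and the `order` parameter are hypothetical, and the density computed here is the fraction of selected k-mers on one given sequence, not the expected value over random sequences.

```python
def minimizer_positions(seq, k, w, order=None):
    """Return the sorted start positions of the k-mers selected as minimizers.

    Each window of w consecutive k-mers contributes its smallest k-mer
    under `order` (a key function on k-mers); ties go to the leftmost.
    """
    key = order or (lambda kmer: kmer)  # assumption: lexicographic by default
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # min() returns the first minimal element, hence the leftmost on ties
        selected.add(min(window, key=lambda i: key(seq[i:i + k])))
    return sorted(selected)


def density(seq, k, w, order=None):
    """Fraction of the k-mers of seq that are selected as minimizers."""
    return len(minimizer_positions(seq, k, w, order)) / (len(seq) - k + 1)


if __name__ == "__main__":
    seq = "ACGTACGT"
    # Lexicographic order versus a hypothetical alternative order
    # (comparing k-mers by their reversed string).
    print(minimizer_positions(seq, k=3, w=2))
    print(minimizer_positions(seq, k=3, w=2, order=lambda kmer: kmer[::-1]))
    print(density(seq, k=3, w=2))
```

Swapping the `order` function changes which positions are selected, which is the dependence on the order that the abstract refers to.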

DOI: 10.1093/bioinformatics/bty258
PubMed: 29949995


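Affiliations: Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA (all three authors)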


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Asymptotically optimal minimizers schemes.</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author>
<name sortKey="Deblasio, Dan" sort="Deblasio, Dan" uniqKey="Deblasio D" first="Dan" last="Deblasio">Dan Deblasio</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29949995</idno>
<idno type="pmid">29949995</idno>
<idno type="doi">10.1093/bioinformatics/bty258</idno>
<idno type="wicri:Area/PubMed/Corpus">000854</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000854</idno>
<idno type="wicri:Area/PubMed/Curation">000854</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000854</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000A07</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000A07</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Asymptotically optimal minimizers schemes.</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author>
<name sortKey="Deblasio, Dan" sort="Deblasio, Dan" uniqKey="Deblasio D" first="Dan" last="Deblasio">Dan Deblasio</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Biologie informatique ()</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Biologie informatique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">29949995</PMID>
<DateCompleted>
<Year>2019</Year>
<Month>08</Month>
<Day>27</Day>
</DateCompleted>
<DateRevised>
<Year>2019</Year>
<Month>08</Month>
<Day>27</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>34</Volume>
<Issue>13</Issue>
<PubDate>
<Year>2018</Year>
<Month>07</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Asymptotically optimal minimizers schemes.</ArticleTitle>
<Pagination>
<MedlinePgn>i13-i22</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/bty258</ELocationID>
<Abstract>
<AbstractText Label="Motivation">The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.</AbstractText>
<AbstractText Label="Results">From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Marçais</LastName>
<ForeName>Guillaume</ForeName>
<Initials>G</Initials>
<AffiliationInfo>
<Affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>DeBlasio</LastName>
<ForeName>Dan</ForeName>
<Initials>D</Initials>
<AffiliationInfo>
<Affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Kingsford</LastName>
<ForeName>Carl</ForeName>
<Initials>C</Initials>
<AffiliationInfo>
<Affiliation>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>R01 GM122935</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>R01 HG007104</GrantID>
<Acronym>HG</Acronym>
<Agency>NHGRI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2018</Year>
<Month>6</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2018</Year>
<Month>6</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>8</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">29949995</ArticleId>
<ArticleId IdType="pii">5045769</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/bty258</ArticleId>
<ArticleId IdType="pmc">PMC6037127</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>J Comput Biol. 2004;11(4):734-52</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15579242</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Jul 15;32(14):2103-10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27153593</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2004 Dec 12;20(18):3363-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15256412</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2017 Oct 2;13(10):e1005777</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28968408</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2016 Jun 20;17(1):132</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27323842</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2015 Apr 17;10(4):e0121453</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25884504</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2015 May 15;31(10):1569-76</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25609798</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 Jul 15;33(14):i110-i117</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28881970</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2012 Apr 19;13 Suppl 6:S1</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22537038</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2014 Mar 03;15(3):R46</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24580807</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
</region>
<settlement>
<li>Pittsburgh</li>
</settlement>
<orgName>
<li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
</region>
<name sortKey="Deblasio, Dan" sort="Deblasio, Dan" uniqKey="Deblasio D" first="Dan" last="Deblasio">Dan Deblasio</name>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
</country>
</tree>
</affiliations>
</record>
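As an illustration of how the `pubmed` part of such a record might be read programmatically, the sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed excerpt. The excerpt and the `summarize` helper are hypothetical and only reuse element names visible in the record above. Note that the TEI part uses the `wicri:` and `nlm:` prefixes without any namespace declaration in the record as shown, so a strict XML parser would likely reject the full record unless those prefixes are declared first.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of the <pubmed> element shown above.
EXCERPT = """\
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">29949995</PMID>
<Article PubModel="Print">
<ArticleTitle>Asymptotically optimal minimizers schemes.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/bty258</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Marçais</LastName><ForeName>Guillaume</ForeName></Author>
<Author ValidYN="Y"><LastName>DeBlasio</LastName><ForeName>Dan</ForeName></Author>
<Author ValidYN="Y"><LastName>Kingsford</LastName><ForeName>Carl</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
"""


def summarize(xml_text):
    """Extract PMID, title, DOI and author names from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    article = root.find("./MedlineCitation/Article")
    return {
        "pmid": root.findtext("./MedlineCitation/PMID"),
        "title": article.findtext("ArticleTitle"),
        # Select the ELocationID whose EIdType attribute is "doi"
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "authors": [
            f"{a.findtext('ForeName')} {a.findtext('LastName')}"
            for a in article.iter("Author")
        ],
    }


if __name__ == "__main__":
    for field, value in summarize(EXCERPT).items():
        print(f"{field}: {value}")
```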

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A07 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000A07 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:29949995
   |texte=   Asymptotically optimal minimizers schemes.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:29949995" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021