Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Asymptotically optimal minimizers schemes

Identifieur interne : 000B22 ( Pmc/Curation ); précédent : 000B21; suivant : 000B23

Asymptotically optimal minimizers schemes

Auteurs : Guillaume Marçais [États-Unis] ; Dan Deblasio [États-Unis] ; Carl Kingsford [États-Unis]

Source :

RBID : PMC:6037127

Abstract

AbstractMotivation

The minimizers technique is a method to sample k-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density is highly dependent on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.

Results

From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes.


Url:
DOI: 10.1093/bioinformatics/bty258
PubMed: 29949995
PubMed Central: 6037127

Links toward previous steps (curation, corpus...)


Links to Exploration step

PMC:6037127

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Asymptotically optimal minimizers schemes</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation wicri:level="1">
<nlm:aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Deblasio, Dan" sort="Deblasio, Dan" uniqKey="Deblasio D" first="Dan" last="Deblasio">Dan Deblasio</name>
<affiliation wicri:level="1">
<nlm:aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="1">
<nlm:aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">29949995</idno>
<idno type="pmc">6037127</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037127</idno>
<idno type="RBID">PMC:6037127</idno>
<idno type="doi">10.1093/bioinformatics/bty258</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000B22</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B22</idno>
<idno type="wicri:Area/Pmc/Curation">000B22</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B22</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Asymptotically optimal minimizers schemes</title>
<author>
<name sortKey="Marcais, Guillaume" sort="Marcais, Guillaume" uniqKey="Marcais G" first="Guillaume" last="Marçais">Guillaume Marçais</name>
<affiliation wicri:level="1">
<nlm:aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Deblasio, Dan" sort="Deblasio, Dan" uniqKey="Deblasio D" first="Dan" last="Deblasio">Dan Deblasio</name>
<affiliation wicri:level="1">
<nlm:aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="1">
<nlm:aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<sec id="s1">
<title>Motivation</title>
<p>The minimizers technique is a method to sample
<italic>k</italic>
-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few
<italic>k</italic>
-mers as possible (i.e. having a low
<italic>density</italic>
) is beneficial. The density is highly dependent on the choice of the order on the
<italic>k</italic>
-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.</p>
</sec>
<sec id="s2">
<title>Results</title>
<p>From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare
<italic>k</italic>
-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="De Bruijn, N G" uniqKey="De Bruijn N">N.G. de Bruijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S. Deorowicz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grabowski, S" uniqKey="Grabowski S">S. Grabowski</name>
</author>
<author>
<name sortKey="Raniszewski, M" uniqKey="Raniszewski M">M. Raniszewski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kawulok, J" uniqKey="Kawulok J">J. Kawulok</name>
</author>
<author>
<name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S. Deorowicz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lempel, A" uniqKey="Lempel A">A. Lempel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y. Li</name>
</author>
<author>
<name sortKey="Xifengyan, X" uniqKey="Xifengyan X">X. XifengYan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lichiardopol, N" uniqKey="Lichiardopol N">N. Lichiardopol</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G. Marçais</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maurer, U M" uniqKey="Maurer U">U.M. Maurer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mykkeltveit, J" uniqKey="Mykkeltveit J">J. Mykkeltveit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ondov, B D" uniqKey="Ondov B">B.D. Ondov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paindavoine, M" uniqKey="Paindavoine M">M. Paindavoine</name>
</author>
<author>
<name sortKey="Vialla, B" uniqKey="Vialla B">B. Vialla</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roberts, M" uniqKey="Roberts M">M. Roberts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roberts, M" uniqKey="Roberts M">M. Roberts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schleimer, S" uniqKey="Schleimer S">S. Schleimer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wood, D E" uniqKey="Wood D">D.E. Wood</name>
</author>
<author>
<name sortKey="Salzberg, S L" uniqKey="Salzberg S">S.L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ye, C" uniqKey="Ye C">C. Ye</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-title-group>
<journal-title>Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1367-4811</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">29949995</article-id>
<article-id pub-id-type="pmc">6037127</article-id>
<article-id pub-id-type="doi">10.1093/bioinformatics/bty258</article-id>
<article-id pub-id-type="publisher-id">bty258</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Ismb 2018–Intelligent Systems for Molecular Biology Proceedings</subject>
<subj-group subj-group-type="category-toc-heading">
<subject>Bioinformatics of Microbes and Microbiomes</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Asymptotically optimal minimizers schemes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Marçais</surname>
<given-names>Guillaume</given-names>
</name>
<xref ref-type="aff" rid="bty258-aff1"></xref>
<xref ref-type="corresp" rid="bty258-cor1"></xref>
<pmc-comment>gmarcais@cs.cmu.edu</pmc-comment>
</contrib>
<contrib contrib-type="author">
<name>
<surname>DeBlasio</surname>
<given-names>Dan</given-names>
</name>
<xref ref-type="aff" rid="bty258-aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kingsford</surname>
<given-names>Carl</given-names>
</name>
<xref ref-type="aff" rid="bty258-aff1"></xref>
<xref ref-type="corresp" rid="bty258-cor1"></xref>
<pmc-comment>ckingsf@cs.cmu.edu</pmc-comment>
</contrib>
</contrib-group>
<aff id="bty258-aff1">Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA</aff>
<author-notes>
<corresp id="bty258-cor1">To whom correspondence should be addressed.
<email>gmarcais@cs.cmu.edu</email>
or
<email>ckingsf@cs.cmu.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>01</day>
<month>7</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="epub" iso-8601-date="2018-06-27">
<day>27</day>
<month>6</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>27</day>
<month>6</month>
<year>2018</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>34</volume>
<issue>13</issue>
<issue-title>ISMB 2018 Proceedings July 6 to July 10, 2018, Chicago, IL, United States</issue-title>
<fpage>i13</fpage>
<lpage>i22</lpage>
<permissions>
<copyright-statement>© The Author(s) 2018. Published by Oxford University Press.</copyright-statement>
<copyright-year>2018</copyright-year>
<license license-type="cc-by-nc" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>
), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com</license-p>
</license>
</permissions>
<self-uri xlink:href="bty258.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<sec id="s1">
<title>Motivation</title>
<p>The minimizers technique is a method to sample
<italic>k</italic>
-mers that is used in many bioinformatics software to reduce computation, memory usage and run time. The number of applications using minimizers keeps on growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few
<italic>k</italic>
-mers as possible (i.e. having a low
<italic>density</italic>
) is beneficial. The density is highly dependent on the choice of the order on the
<italic>k</italic>
-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow designing schemes with lower density and thereby making existing and future bioinformatics tools even more efficient.</p>
</sec>
<sec id="s2">
<title>Results</title>
<p>From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare
<italic>k</italic>
-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three type of schemes.</p>
</sec>
</abstract>
<funding-group>
<award-group award-type="grant">
<funding-source>
<named-content content-type="funder-name">Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative</named-content>
</funding-source>
<award-id>GBMF4554</award-id>
</award-group>
<award-group award-type="grant">
<funding-source>
<named-content content-type="funder-name">US National Science Foundation</named-content>
</funding-source>
<award-id>CCF-1256087</award-id>
<award-id>CCF-1319998</award-id>
</award-group>
<award-group award-type="grant">
<funding-source>
<named-content content-type="funder-name">US National Institutes of Health</named-content>
</funding-source>
<award-id>R01HG007104</award-id>
<award-id>R01GM122935</award-id>
</award-group>
</funding-group>
<counts>
<page-count count="10"></page-count>
</counts>
</article-meta>
</front>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B22 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000B22 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:6037127
   |texte=   Asymptotically optimal minimizers schemes
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:29949995" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021