Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Joker de Bruijn: Covering k-Mers Using Joker Characters

Identifieur interne : 000D97 ( Pmc/Curation ); précédent : 000D96; suivant : 000D98

Joker de Bruijn: Covering k-Mers Using Joker Characters

Auteurs : Yaron Orenstein ; Yun William Yu ; Bonnie Berger

Source :

RBID : PMC:6247992

Abstract

Abstract

Sequence libraries that cover all k-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all k-mers is a de Bruijn sequence of length \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\vert \Sigma { \vert ^k} + k - 1$$ \end{document}. Researchers would like to increase k to measure interactions at greater detail, but face a challenging problem: the number of k-mers grows exponentially in k, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of k-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all k-mers for different values of k and k-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.


Url:
DOI: 10.1089/cmb.2018.0032
PubMed: 30117747
PubMed Central: 6247992

Links toward previous steps (curation, corpus...)


Links to Exploration step

PMC:6247992

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Joker de Bruijn: Covering
<italic>k</italic>
-Mers Using Joker Characters</title>
<author>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yu, Yun William" sort="Yu, Yun William" uniqKey="Yu Y" first="Yun William" last="Yu">Yun William Yu</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30117747</idno>
<idno type="pmc">6247992</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247992</idno>
<idno type="RBID">PMC:6247992</idno>
<idno type="doi">10.1089/cmb.2018.0032</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000D97</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D97</idno>
<idno type="wicri:Area/Pmc/Curation">000D97</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D97</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Joker de Bruijn: Covering
<italic>k</italic>
-Mers Using Joker Characters</title>
<author>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yu, Yun William" sort="Yu, Yun William" uniqKey="Yu Y" first="Yun William" last="Yu">Yun William Yu</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>
<bold>Sequence libraries that cover all
<italic>k</italic>
-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all
<italic>k</italic>
-mers is a de Bruijn sequence of length
<inline-formula>
<tex-math id="eq1" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\vert \Sigma { \vert ^k} + k - 1$$ \end{document}</tex-math>
</inline-formula>
. Researchers would like to increase
<italic>k</italic>
to measure interactions at greater detail, but face a challenging problem: the number of
<italic>k</italic>
-mers grows exponentially in
<italic>k</italic>
, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink
<italic>k</italic>
-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of
<italic>k</italic>
-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all
<italic>k</italic>
-mers for different values of
<italic>k</italic>
and
<italic>k</italic>
-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.</bold>
</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, M F" uniqKey="Berger M">M.F. Berger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blanchet Sadri, F" uniqKey="Blanchet Sadri F">F. Blanchet-Sadri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, H Z" uniqKey="Chen H">H.Z. Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="D Addario, M" uniqKey="D Addario M">M. D'Addario</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fordyce, P M" uniqKey="Fordyce P">P.M. Fordyce</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gurard Levin, Z A" uniqKey="Gurard Levin Z">Z.A. Gurard-Levin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Philippakis, A A" uniqKey="Philippakis A">A.A. Philippakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="R Ih, K J" uniqKey="R Ih K">K.-J. Räihä</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ray, D" uniqKey="Ray D">D. Ray</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, R P" uniqKey="Smith R">R.P. Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vassilevska, V" uniqKey="Vassilevska V">V. Vassilevska</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wyatt, B J" uniqKey="Wyatt B">B.J. Wyatt</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">J Comput Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">J. Comput. Biol</journal-id>
<journal-id journal-id-type="publisher-id">cmb</journal-id>
<journal-title-group>
<journal-title>Journal of Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1066-5277</issn>
<issn pub-type="epub">1557-8666</issn>
<publisher>
<publisher-name>Mary Ann Liebert, Inc., publishers</publisher-name>
<publisher-loc>140 Huguenot Street, 3rd FloorNew Rochelle, NY 10801USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30117747</article-id>
<article-id pub-id-type="pmc">6247992</article-id>
<article-id pub-id-type="publisher-id">10.1089/cmb.2018.0032</article-id>
<article-id pub-id-type="doi">10.1089/cmb.2018.0032</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Joker de Bruijn: Covering
<italic>k</italic>
-Mers Using Joker Characters</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Orenstein</surname>
<given-names>Yaron</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1,</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yu</surname>
<given-names>Yun William</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Berger</surname>
<given-names>Bonnie</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2,</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="corr1"></xref>
</contrib>
<aff id="aff1">
<label>
<sup>1</sup>
</label>
Department of Electrical and Computer Engineering,
<institution>Ben-Gurion University of the Negev</institution>
, Beer-Sheva,
<country>Israel</country>
.</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>
Computer Science and Artificial Intelligence Laboratory,
<institution>Massachusetts Institute of Technology</institution>
, Cambridge, Massachusetts.</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>
Department of Mathematics,
<institution>Massachusetts Institute of Technology</institution>
, Cambridge, Massachusetts.</aff>
</contrib-group>
<author-notes>
<corresp id="corr1">Address correspondence to: Prof. Bonnie Berger, Department of Mathematics, Massachusetts Institute of Technology, 77 Mass Avenue, 2-373, Cambridge, MA 02139
<email>bab@mit.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>01</day>
<month>11</month>
<year>2018</year>
<pmc-comment>string-date: November 2018</pmc-comment>
</pub-date>
<pub-date pub-type="epub">
<day>09</day>
<month>11</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>09</day>
<month>11</month>
<year>2018</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>25</volume>
<issue>11</issue>
<fpage>1171</fpage>
<lpage>1178</lpage>
<permissions>
<copyright-statement>© Yaron Orenstein, et al., 2018. Published by Mary Ann Liebert, Inc.</copyright-statement>
<copyright-year>2018</copyright-year>
<license license-type="open-access">
<license-p>This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>
) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="cmb.2018.0032.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<p>
<bold>Sequence libraries that cover all
<italic>k</italic>
-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all
<italic>k</italic>
-mers is a de Bruijn sequence of length
<inline-formula>
<tex-math id="eq1" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\vert \Sigma { \vert ^k} + k - 1$$ \end{document}</tex-math>
</inline-formula>
. Researchers would like to increase
<italic>k</italic>
to measure interactions at greater detail, but face a challenging problem: the number of
<italic>k</italic>
-mers grows exponentially in
<italic>k</italic>
, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink
<italic>k</italic>
-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of
<italic>k</italic>
-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all
<italic>k</italic>
-mers for different values of
<italic>k</italic>
and
<italic>k</italic>
-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.</bold>
</p>
</abstract>
<kwd-group kwd-group-type="author">
<title>
<bold>Keywords:</bold>
</title>
<kwd>de Bruijn sequence</kwd>
<kwd>microarray library design</kwd>
<kwd>peptide arrays</kwd>
<kwd>protein binding</kwd>
<kwd>protein binding microarrays</kwd>
</kwd-group>
<counts>
<table-count count="4"></table-count>
<equation-count count="5"></equation-count>
<ref-count count="15"></ref-count>
<page-count count="8"></page-count>
</counts>
</article-meta>
</front>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D97 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000D97 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:6247992
   |texte=   Joker de Bruijn: Covering k-Mers Using Joker Characters
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:30117747" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021