MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A study on the application of topic models to motif finding algorithms

Internal identifier: 000263 (Pmc/Curation); previous: 000262; next: 000264

Authors: Josep Basha Gutierrez [Japan]; Kenta Nakai [Japan]

RBID : PMC:5259985

Abstract

Background

Topic models are statistical algorithms that try to discover the structure of a set of documents according to the abstract topics they contain. Here we apply this approach to discovering the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, a fundamental problem in molecular biology research for understanding transcriptional regulation. We present two methods that use topic models for motif finding. First, we developed an algorithm in which a set of biological sequences is treated as text documents, and the k-mers they contain as words; a correlated topic model (CTM) is then built and its perplexity iteratively reduced. Second, we used the perplexity measurement of CTMs to improve our previous algorithm, which is based on a genetic algorithm and several statistical coefficients.
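
The "sequences as documents, k-mers as words" idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, the choice of k = 4, and the DNA alphabet size are all assumptions made for illustration, and the perplexity is computed for a simple add-one-smoothed unigram model standing in for a CTM.

```python
from collections import Counter
import math


def kmers(sequence: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a sequence, in order."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def to_documents(sequences: list[str], k: int) -> list[Counter]:
    """Treat each sequence as a document whose words are its k-mers."""
    return [Counter(kmers(s, k)) for s in sequences]


def unigram_perplexity(train: list[Counter], held_out: Counter, vocab_size: int) -> float:
    """Perplexity of held-out k-mer counts under an add-one-smoothed unigram model.

    Stand-in for the CTM perplexity mentioned in the abstract; this is NOT a CTM.
    """
    totals = Counter()
    for doc in train:
        totals.update(doc)
    n_train = sum(totals.values())
    n_test = sum(held_out.values())
    log_lik = sum(
        count * math.log((totals[word] + 1) / (n_train + vocab_size))
        for word, count in held_out.items()
    )
    # Perplexity = exp(-average log-likelihood per word).
    return math.exp(-log_lik / n_test)


sequences = ["ACGTACGT", "TTACGTAA", "GGGACGTC"]  # toy data, not from the paper
docs = to_documents(sequences, k=4)
vocab_size = 4 ** 4  # assumes the DNA alphabet {A, C, G, T}
print(docs[0])
print(unigram_perplexity(docs[:2], docs[2], vocab_size))
```

Lower perplexity means the model assigns higher probability to the held-out k-mers; an iterative method of the kind the abstract describes would compare such values across candidate models.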

Results

The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level.

Conclusions

The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1364-3) contains supplementary material, which is available to authorized users.


DOI: 10.1186/s12859-016-1364-3
PubMed: 28155646
PubMed Central: 5259985

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A study on the application of topic models to motif finding algorithms</title>
<author>
<name sortKey="Basha Gutierrez, Josep" sort="Basha Gutierrez, Josep" uniqKey="Basha Gutierrez J" first="Josep" last="Basha Gutierrez">Josep Basha Gutierrez</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
277-8561 Chiba, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>277-8561 Chiba</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Human Genome Center, The Institute of Medical Science,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Nakai, Kenta" sort="Nakai, Kenta" uniqKey="Nakai K" first="Kenta" last="Nakai">Kenta Nakai</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
277-8561 Chiba, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>277-8561 Chiba</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Human Genome Center, The Institute of Medical Science,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28155646</idno>
<idno type="pmc">5259985</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5259985</idno>
<idno type="RBID">PMC:5259985</idno>
<idno type="doi">10.1186/s12859-016-1364-3</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000263</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000263</idno>
<idno type="wicri:Area/Pmc/Curation">000263</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000263</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A study on the application of topic models to motif finding algorithms</title>
<author>
<name sortKey="Basha Gutierrez, Josep" sort="Basha Gutierrez, Josep" uniqKey="Basha Gutierrez J" first="Josep" last="Basha Gutierrez">Josep Basha Gutierrez</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
277-8561 Chiba, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>277-8561 Chiba</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Human Genome Center, The Institute of Medical Science,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Nakai, Kenta" sort="Nakai, Kenta" uniqKey="Nakai K" first="Kenta" last="Nakai">Kenta Nakai</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
277-8561 Chiba, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>277-8561 Chiba</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Human Genome Center, The Institute of Medical Science,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients.</p>
</sec>
<sec>
<title>Results</title>
<p>The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-016-1364-3) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
<author>
<name sortKey="Li, N" uniqKey="Li N">N Li</name>
</author>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
<author>
<name sortKey="De Moor, B" uniqKey="De Moor B">B De Moor</name>
</author>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Favorov, Av" uniqKey="Favorov A">AV Favorov</name>
</author>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Das, Mk" uniqKey="Das M">MK Das</name>
</author>
<author>
<name sortKey="Dai, Hk" uniqKey="Dai H">HK Dai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blei, Dm" uniqKey="Blei D">DM Blei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blei, Dm" uniqKey="Blei D">DM Blei</name>
</author>
<author>
<name sortKey="Ng, Ay" uniqKey="Ng A">AY Ng</name>
</author>
<author>
<name sortKey="Jordan, Mi" uniqKey="Jordan M">MI Jordan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitchell, M" uniqKey="Mitchell M">M Mitchell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blei, D" uniqKey="Blei D">D Blei</name>
</author>
<author>
<name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aitchison, J" uniqKey="Aitchison J">J Aitchison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hornik, K" uniqKey="Hornik K">K Hornik</name>
</author>
<author>
<name sortKey="Grun, B" uniqKey="Grun B">B Grün</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Abnizova, I" uniqKey="Abnizova I">I Abnizova</name>
</author>
<author>
<name sortKey="Te Boekhorst, R" uniqKey="Te Boekhorst R">R te Boekhorst</name>
</author>
<author>
<name sortKey="Walter, K" uniqKey="Walter K">K Walter</name>
</author>
<author>
<name sortKey="Gilks, Wr" uniqKey="Gilks W">WR Gilks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shu, Jj" uniqKey="Shu J">JJ Shu</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Favorov, Av" uniqKey="Favorov A">AV Favorov</name>
</author>
<author>
<name sortKey="Gelfand, Ms" uniqKey="Gelfand M">MS Gelfand</name>
</author>
<author>
<name sortKey="Gerasimova, Av" uniqKey="Gerasimova A">AV Gerasimova</name>
</author>
<author>
<name sortKey="Mironov, Aa" uniqKey="Mironov A">AA Mironov</name>
</author>
<author>
<name sortKey="Makeev, Vj" uniqKey="Makeev V">VJ Makeev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pavesi, G" uniqKey="Pavesi G">G Pavesi</name>
</author>
<author>
<name sortKey="Mereghetti, P" uniqKey="Mereghetti P">P Mereghetti</name>
</author>
<author>
<name sortKey="Mauri, G" uniqKey="Mauri G">G Mauri</name>
</author>
<author>
<name sortKey="Pesole, G" uniqKey="Pesole G">G Pesole</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sinha, S" uniqKey="Sinha S">S Sinha</name>
</author>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Sze, Sh" uniqKey="Sze S">SH Sze</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burset, M" uniqKey="Burset M">M Burset</name>
</author>
<author>
<name sortKey="Guigo, R" uniqKey="Guigo R">R Guigo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wingender, E" uniqKey="Wingender E">E Wingender</name>
</author>
<author>
<name sortKey="Dietze, P" uniqKey="Dietze P">P Dietze</name>
</author>
<author>
<name sortKey="Karas, H" uniqKey="Karas H">H Karas</name>
</author>
<author>
<name sortKey="Knuppel, R" uniqKey="Knuppel R">R Knüppel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hughes, Jd" uniqKey="Hughes J">JD Hughes</name>
</author>
<author>
<name sortKey="Estep, Pw" uniqKey="Estep P">PW Estep</name>
</author>
<author>
<name sortKey="Tavazoie, S" uniqKey="Tavazoie S">S Tavazoie</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Workman, Ct" uniqKey="Workman C">CT Workman</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hertz, Gz" uniqKey="Hertz G">GZ Hertz</name>
</author>
<author>
<name sortKey="Stormo, Gd" uniqKey="Stormo G">GD Stormo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
<author>
<name sortKey="Hansen, U" uniqKey="Hansen U">U Hansen</name>
</author>
<author>
<name sortKey="Spouge, Jl" uniqKey="Spouge J">JL Spouge</name>
</author>
<author>
<name sortKey="Weng, Z" uniqKey="Weng Z">Z Weng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ao, W" uniqKey="Ao W">W Ao</name>
</author>
<author>
<name sortKey="Gaudet, J" uniqKey="Gaudet J">J Gaudet</name>
</author>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
<author>
<name sortKey="Muttumu, S" uniqKey="Muttumu S">S Muttumu</name>
</author>
<author>
<name sortKey="Mango, Se" uniqKey="Mango S">SE Mango</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Elkan, C" uniqKey="Elkan C">C Elkan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Pevzner, P" uniqKey="Pevzner P">P Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thijs, G" uniqKey="Thijs G">G Thijs</name>
</author>
<author>
<name sortKey="Lescot, M" uniqKey="Lescot M">M Lescot</name>
</author>
<author>
<name sortKey="Marchal, K" uniqKey="Marchal K">K Marchal</name>
</author>
<author>
<name sortKey="Rombauts, S" uniqKey="Rombauts S">S Rombauts</name>
</author>
<author>
<name sortKey="De Moor, B" uniqKey="De Moor B">B De Moor</name>
</author>
<author>
<name sortKey="Rouze, P" uniqKey="Rouze P">P Rouze</name>
</author>
<author>
<name sortKey="Moreau, Y" uniqKey="Moreau Y">Y Moreau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J van Helden</name>
</author>
<author>
<name sortKey="Andre, B" uniqKey="Andre B">B Andre</name>
</author>
<author>
<name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Helden, J" uniqKey="Van Helden J">J van Helden</name>
</author>
<author>
<name sortKey="Rios, Af" uniqKey="Rios A">AF Rios</name>
</author>
<author>
<name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Regnier, M" uniqKey="Regnier M">M Régnier</name>
</author>
<author>
<name sortKey="Denise, A" uniqKey="Denise A">A Denise</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28155646</article-id>
<article-id pub-id-type="pmc">5259985</article-id>
<article-id pub-id-type="publisher-id">1364</article-id>
<article-id pub-id-type="doi">10.1186/s12859-016-1364-3</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A study on the application of topic models to motif finding algorithms</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Basha Gutierrez</surname>
<given-names>Josep</given-names>
</name>
<address>
<email>yusef@hgc.jp</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Nakai</surname>
<given-names>Kenta</given-names>
</name>
<address>
<email>knakai@ims.u-tokyo.ac.jp</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
277-8561 Chiba, Japan</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2151 536X</institution-id>
<institution-id institution-id-type="GRID">grid.26999.3d</institution-id>
<institution>Human Genome Center, The Institute of Medical Science,</institution>
<institution>The University of Tokyo,</institution>
</institution-wrap>
4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>22</day>
<month>12</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>22</day>
<month>12</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>17</volume>
<issue>Suppl 19</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests, except CS who was co-author of one article in this supplement, which was managed by the other Supplement Editors.</issue-sponsor>
<elocation-id>502</elocation-id>
<permissions>
<copyright-statement>© The Author(s). 2016</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients.</p>
</sec>
<sec>
<title>Results</title>
<p>The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-016-1364-3) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<conference>
<conf-name>15th International Conference On Bioinformatics (INCOB 2016)</conf-name>
<conf-acronym>InCOB 2016</conf-acronym>
<conf-loc>Queenstown, Singapore</conf-loc>
<conf-date>21-23 September 2016</conf-date>
<string-conf>
<uri>http://incob16.apbionet.org/</uri>
</string-conf>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2016</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
</record>
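
A record like the one above can be read programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The record uses the `wicri:`, `nlm:` and `xlink:` prefixes without declaring them, so a strict parser rejects it as-is; the sketch binds them on the root tag first. The `wicri` and `nlm` namespace URIs, the function names, and the shortened sample record are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder bindings for the undeclared prefixes. The wicri and nlm URIs are
# made up for this sketch; only the xlink URI is the standard one.
NAMESPACES = {
    "wicri": "urn:example:wicri",
    "nlm": "urn:example:nlm",
    "xlink": "http://www.w3.org/1999/xlink",
}


def parse_record(xml_text: str) -> ET.Element:
    """Parse a <record> after declaring the missing prefixes on its root tag."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in NAMESPACES.items())
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(record: ET.Element) -> dict:
    """Collect title, author names and identifiers from the TEI header."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = record.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [n.text for n in title_stmt.findall("author/name")],
        "doi": pub_stmt.findtext("idno[@type='doi']"),
        "pmid": pub_stmt.findtext("idno[@type='pmid']"),
    }


# Shortened excerpt of the record, kept small for the example.
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A study on the application of topic models to motif finding algorithms</title>
<author>
<name sortKey="Nakai, Kenta">Kenta Nakai</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">277-8561 Chiba, Japan</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="pmid">28155646</idno>
<idno type="doi">10.1186/s12859-016-1364-3</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

info = summarize(parse_record(SAMPLE))
print(info)
```

Patching the root tag keeps the rest of the record untouched; a lenient parser would be an alternative, at the cost of an extra dependency.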

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000263 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000263 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:5259985
   |texte=   A study on the application of topic models to motif finding algorithms
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:28155646" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021