Exploration server on open access in Belgium

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations

Internal identifier: 000308 (Pmc/Corpus)

Auteurs : Sofie Van Landeghem ; Kai Hakala ; Samuel Rönnqvist ; Tapio Salakoski ; Yves Van De Peer ; Filip Ginter

Source:

RBID : PMC:3375141

Abstract

Technological advancements in the field of genetics have led not only to an abundance of experimental data but also to an exponential increase in the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences cope with the volume of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.


Url:
DOI: 10.1155/2012/582765
PubMed: 22719757
PubMed Central: 3375141

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations</title>
<author>
<name sortKey="Van Landeghem, Sofie" sort="Van Landeghem, Sofie" uniqKey="Van Landeghem S" first="Sofie" last="Van Landeghem">Sofie Van Landeghem</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hakala, Kai" sort="Hakala, Kai" uniqKey="Hakala K" first="Kai" last="Hakala">Kai Hakala</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ronnqvist, Samuel" sort="Ronnqvist, Samuel" uniqKey="Ronnqvist S" first="Samuel" last="Rönnqvist">Samuel Rönnqvist</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Salakoski, Tapio" sort="Salakoski, Tapio" uniqKey="Salakoski T" first="Tapio" last="Salakoski">Tapio Salakoski</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I4">Turku BioNLP Group, Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van De Peer, Yves" sort="Van De Peer, Yves" uniqKey="Van De Peer Y" first="Yves" last="Van De Peer">Yves Van De Peer</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ginter, Filip" sort="Ginter, Filip" uniqKey="Ginter F" first="Filip" last="Ginter">Filip Ginter</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22719757</idno>
<idno type="pmc">3375141</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3375141</idno>
<idno type="RBID">PMC:3375141</idno>
<idno type="doi">10.1155/2012/582765</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000308</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations</title>
<author>
<name sortKey="Van Landeghem, Sofie" sort="Van Landeghem, Sofie" uniqKey="Van Landeghem S" first="Sofie" last="Van Landeghem">Sofie Van Landeghem</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hakala, Kai" sort="Hakala, Kai" uniqKey="Hakala K" first="Kai" last="Hakala">Kai Hakala</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ronnqvist, Samuel" sort="Ronnqvist, Samuel" uniqKey="Ronnqvist S" first="Samuel" last="Rönnqvist">Samuel Rönnqvist</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Salakoski, Tapio" sort="Salakoski, Tapio" uniqKey="Salakoski T" first="Tapio" last="Salakoski">Tapio Salakoski</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I4">Turku BioNLP Group, Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van De Peer, Yves" sort="Van De Peer, Yves" uniqKey="Van De Peer Y" first="Yves" last="Van De Peer">Yves Van De Peer</name>
<affiliation>
<nlm:aff id="I1">Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ginter, Filip" sort="Ginter, Filip" uniqKey="Ginter F" first="Filip" last="Ginter">Filip Ginter</name>
<affiliation>
<nlm:aff id="I3">Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Advances in Bioinformatics</title>
<idno type="ISSN">1687-8027</idno>
<idno type="eISSN">1687-8035</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Technological advancements in the field of genetics have led not only to an abundance of experimental data but also to an exponential increase in the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences cope with the volume of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, J D" uniqKey="Kim J">J-D Kim</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Kano, Y" uniqKey="Kano Y">Y Kano</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, J D" uniqKey="Kim J">J-D Kim</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Bossy, R" uniqKey="Bossy R">R Bossy</name>
</author>
<author>
<name sortKey="Nguyen, N" uniqKey="Nguyen N">N Nguyen</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmann, R" uniqKey="Hoffmann R">R Hoffmann</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Miyao, Y" uniqKey="Miyao Y">Y Miyao</name>
</author>
<author>
<name sortKey="Ninomiya, T" uniqKey="Ninomiya T">T Ninomiya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rebholz Schuhmann, D" uniqKey="Rebholz Schuhmann D">D Rebholz-Schuhmann</name>
</author>
<author>
<name sortKey="Kirsch, H" uniqKey="Kirsch H">H Kirsch</name>
</author>
<author>
<name sortKey="Arregui, M" uniqKey="Arregui M">M Arregui</name>
</author>
<author>
<name sortKey="Gaudan, S" uniqKey="Gaudan S">S Gaudan</name>
</author>
<author>
<name sortKey="Riethoven, M" uniqKey="Riethoven M">M Riethoven</name>
</author>
<author>
<name sortKey="Stoehr, P" uniqKey="Stoehr P">P Stoehr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hearst, Ma" uniqKey="Hearst M">MA Hearst</name>
</author>
<author>
<name sortKey="Divoli, A" uniqKey="Divoli A">A Divoli</name>
</author>
<author>
<name sortKey="Guturu, Hh" uniqKey="Guturu H">HH Guturu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, S" uniqKey="Xu S">S Xu</name>
</author>
<author>
<name sortKey="Mccusker, J" uniqKey="Mccusker J">J McCusker</name>
</author>
<author>
<name sortKey="Krauthammer, M" uniqKey="Krauthammer M">M Krauthammer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Agarwal, S" uniqKey="Agarwal S">S Agarwal</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Kohane, I" uniqKey="Kohane I">I Kohane</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Landeghem, S" uniqKey="Van Landeghem S">S Van Landeghem</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leaman, R" uniqKey="Leaman R">R Leaman</name>
</author>
<author>
<name sortKey="Gonzalez, G" uniqKey="Gonzalez G">G Gonzalez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sayers, Ew" uniqKey="Sayers E">EW Sayers</name>
</author>
<author>
<name sortKey="Barrett, T" uniqKey="Barrett T">T Barrett</name>
</author>
<author>
<name sortKey="Benson, Da" uniqKey="Benson D">DA Benson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author>
<name sortKey="Amode, Mr" uniqKey="Amode M">MR Amode</name>
</author>
<author>
<name sortKey="Barrell, D" uniqKey="Barrell D">D Barrell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kersey, Pj" uniqKey="Kersey P">PJ Kersey</name>
</author>
<author>
<name sortKey="Lawson, D" uniqKey="Lawson D">D Lawson</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Crammer, K" uniqKey="Crammer K">K Crammer</name>
</author>
<author>
<name sortKey="Singer, Y" uniqKey="Singer Y">Y Singer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
<author>
<name sortKey="Shapira, M" uniqKey="Shapira M">M Shapira</name>
</author>
<author>
<name sortKey="Regev, A" uniqKey="Regev A">A Regev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaewphan, S" uniqKey="Kaewphan S">S Kaewphan</name>
</author>
<author>
<name sortKey="Kreula, S" uniqKey="Kreula S">S Kreula</name>
</author>
<author>
<name sortKey="Van Landeghem, S" uniqKey="Van Landeghem S">S Van Landeghem</name>
</author>
<author>
<name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
<author>
<name sortKey="Jones, P" uniqKey="Jones P">P Jones</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carballo, Ja" uniqKey="Carballo J">JA Carballo</name>
</author>
<author>
<name sortKey="Cha, Rs" uniqKey="Cha R">RS Cha</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loewenstein, Y" uniqKey="Loewenstein Y">Y Loewenstein</name>
</author>
<author>
<name sortKey="Raimondo, D" uniqKey="Raimondo D">D Raimondo</name>
</author>
<author>
<name sortKey="Redfern, Oc" uniqKey="Redfern O">OC Redfern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Proost, S" uniqKey="Proost S">S Proost</name>
</author>
<author>
<name sortKey="Van Bel, M" uniqKey="Van Bel M">M Van Bel</name>
</author>
<author>
<name sortKey="Sterck, L" uniqKey="Sterck L">L Sterck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kato, R" uniqKey="Kato R">R Kato</name>
</author>
<author>
<name sortKey="Ogawa, H" uniqKey="Ogawa H">H Ogawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stenetorp, P" uniqKey="Stenetorp P">P Stenetorp</name>
</author>
<author>
<name sortKey="Topic, G" uniqKey="Topic G">G Topić</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Kim, J D" uniqKey="Kim J">J-D Kim</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Landeghem, S" uniqKey="Van Landeghem S">S Van Landeghem</name>
</author>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Abeel, T" uniqKey="Abeel T">T Abeel</name>
</author>
<author>
<name sortKey="De Baets, B" uniqKey="De Baets B">B De Baets</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
<author>
<name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
<author>
<name sortKey="Kao, Hy" uniqKey="Kao H">HY Kao</name>
</author>
<author>
<name sortKey="Wei, Ch" uniqKey="Wei C">CH Wei</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Adv Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Adv Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">ABI</journal-id>
<journal-title-group>
<journal-title>Advances in Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1687-8027</issn>
<issn pub-type="epub">1687-8035</issn>
<publisher>
<publisher-name>Hindawi Publishing Corporation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22719757</article-id>
<article-id pub-id-type="pmc">3375141</article-id>
<article-id pub-id-type="doi">10.1155/2012/582765</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Van Landeghem</surname>
<given-names>Sofie</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hakala</surname>
<given-names>Kai</given-names>
</name>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rönnqvist</surname>
<given-names>Samuel</given-names>
</name>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Salakoski</surname>
<given-names>Tapio</given-names>
</name>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="I4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Van de Peer</surname>
<given-names>Yves</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ginter</surname>
<given-names>Filip</given-names>
</name>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
</contrib-group>
<aff id="I1">
<sup>1</sup>
Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium</aff>
<aff id="I2">
<sup>2</sup>
Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium</aff>
<aff id="I3">
<sup>3</sup>
Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland</aff>
<aff id="I4">
<sup>4</sup>
Turku BioNLP Group, Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland</aff>
<author-notes>
<corresp id="cor1">*Filip Ginter:
<email>ginter@cs.utu.fi</email>
</corresp>
<fn fn-type="other">
<p>Academic Editor: Jin-Dong Kim</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>6</day>
<month>6</month>
<year>2012</year>
</pub-date>
<volume>2012</volume>
<elocation-id>582765</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>11</month>
<year>2011</year>
</date>
<date date-type="rev-recd">
<day>16</day>
<month>3</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>3</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2012 Sofie Van Landeghem et al.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Technological advancements in the field of genetics have led not only to an abundance of experimental data but also to an exponential increase in the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences cope with the volume of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>1. Introduction</title>
<p>The field of natural language processing for biomolecular texts (BioNLP) aims at large-scale text mining in support of life science research. Its primary motivation is the enormous amount of available scientific literature, which makes it essentially impossible to rapidly gain an overview of prior research results other than in a very narrow domain of interest. Typical use cases for BioNLP applications include support for database curation, linking experimental data with relevant literature, content visualization, and hypothesis generation; all of these tasks require processing and summarizing large numbers of individual research articles. Among the most heavily studied tasks in BioNLP is the extraction of information about known associations between biomolecular entities, primarily genes and gene products, and this task has recently seen much progress in two general directions.</p>
<p>First, relationships between biomolecular entities are now being extracted in much greater detail. Until recently, the focus was on extracting untyped and undirected binary relations which, while stating that there is
<italic>some</italic>
relationship between two objects, gave little additional information about the nature of the relationship. Because such relations may not provide sufficient detail for wider adoption of text mining in the biomedical community, the focus is now shifting towards a more detailed analysis of the text that provides additional vital information about the detected relationships. Such information includes the type of the relationship, the specific roles of the arguments (e.g., affector or affectee), the polarity of the relationship (positive versus negative statement), and whether it was stated in a speculative or affirmative context. This more detailed text mining target was formalized as an
<italic>event extraction</italic>
task and greatly popularized in the series of BioNLP Shared Tasks on Event Extraction [
<xref ref-type="bibr" rid="B14">1</xref>
,
<xref ref-type="bibr" rid="B15">2</xref>
]. These shared tasks mark a truly community-wide effort to develop efficient systems to extract sufficiently detailed information for real-world, practical applications, with the highest possible accuracy.</p>
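<p>To make this contrast concrete, the Python sketch below models both extraction targets side by side. It is a minimal illustration only: the class and field names (BinaryRelation, Event, themes, causes, negated, speculative) are hypothetical and do not reproduce the EVEX database schema or the Shared Task file format; only the Theme/Cause roles and the event type names follow the BioNLP'09 Shared Task conventions. Arguments may themselves be events, since Shared Task events can take other events as arguments.</p>

```python
from dataclasses import dataclass

# Hypothetical data model for illustration only; not the EVEX schema.


@dataclass(frozen=True)
class BinaryRelation:
    """Older extraction target: an untyped, undirected entity pair."""
    entity_a: str
    entity_b: str


@dataclass(frozen=True)
class Event:
    """Event-style target: typed, with role-labeled arguments."""
    event_type: str            # e.g. "Phosphorylation", "Positive_regulation"
    trigger: str               # trigger word in the sentence
    themes: tuple = ()         # Theme arguments (affectees): names or Events
    causes: tuple = ()         # Cause arguments (affectors): names or Events
    negated: bool = False      # polarity: True for a negative statement
    speculative: bool = False  # True if stated speculatively

    def genes(self) -> list:
        """Return gene/protein names, causes first, recursing into nested events."""
        names = []
        for arg in self.causes + self.themes:
            if isinstance(arg, Event):
                names.extend(arg.genes())
            else:
                names.append(arg)
        return names


# Invented sentence with placeholder names:
# "ProtA may not increase the phosphorylation of ProtB."
phos = Event("Phosphorylation", "phosphorylation", themes=("ProtB",))
reg = Event("Positive_regulation", "increase", themes=(phos,),
            causes=("ProtA",), negated=True, speculative=True)

print(reg.genes())  # ['ProtA', 'ProtB']
# Collapsing to a binary relation keeps only the pair and drops the type,
# the roles, the nesting, the negation and the speculation.
print(BinaryRelation(*reg.genes()))
```

<p>The collapsed BinaryRelation still names both proteins but no longer records that ProtA is the affector, that the event is a regulation of a phosphorylation, or that the statement is negated and speculative; this is precisely the additional information that event extraction aims to capture.</p>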
<p>Second, text mining systems are now being applied at a large scale, reflecting the recognition that, for a text mining service to be adopted by its target audience (researchers in the life sciences), it must cover as much of the available literature as possible. While small-scale studies on well-defined and carefully constructed corpora comprising several hundred abstracts are of great utility to BioNLP research, actual applications of the resulting methods require processing considerably larger volumes of text, ideally all available literature. Numerous published studies demonstrate that even complex and computationally intensive methods can be successfully applied at this scale, typically processing all available abstracts in PubMed and/or all full-text articles in the open-access section of PubMed Central. For instance, the
<italic>iHOP</italic>
[
<xref ref-type="bibr" rid="B10">3</xref>
] and
<italic>Medie</italic>
[
<xref ref-type="bibr" rid="B19">4</xref>
] systems let users directly mine the literature relevant to genes or proteins of interest, supporting structured queries that go far beyond the usual keyword search.
<italic>EBIMed</italic>
[
<xref ref-type="bibr" rid="B23">5</xref>
] offers a broad scope by also covering Gene Ontology terms, such as biological processes, as well as drug and species names. Other systems, such as the
<italic>BioText search engine</italic>
[
<xref ref-type="bibr" rid="B9">6</xref>
] and
<italic>Yale Image Finder</italic>
[
<xref ref-type="bibr" rid="B30">7</xref>
] support comprehensive search of full-text articles, including their figures and tables. Finally, the
<italic>BioNOT</italic>
system [
<xref ref-type="bibr" rid="B1">8</xref>
] focuses specifically on extracting negative evidence from scientific articles.</p>
<p>The first large-scale application that specifically targets the extraction of detailed events according to their definition in the BioNLP Shared Tasks is the dataset of Björne et al. [
<xref ref-type="bibr" rid="B3">9</xref>
], comprising 19 million events among 36 million gene and protein mentions. These data were obtained by processing all 18 million titles and abstracts in the 2009 PubMed distribution using the winning system of the BioNLP'09 Shared Task. In a subsequent study by Van Landeghem et al. [
<xref ref-type="bibr" rid="B28">10</xref>
], the dataset was refined, generalized, and released as a relational (SQL) database referred to as
<italic>EVEX</italic>
. Among the main contributions of this subsequent study was the generalization of the events using publicly available gene family definitions. Although the EVEX database was a major step forward from the original text-bound events produced by the event extraction system, its main audience was still the BioNLP community. Consequently, the dataset is not easily accessible to researchers in the life sciences who are not familiar with the intricacies of the event representation. Further, as the massive relational database contains millions of events, manual querying is not a practical way to access the data in daily life science research.</p>
<p>In this study, we introduce a publicly available web application based on the EVEX dataset. It is the first application to bring large-scale event-based text mining results to a broad audience of end users, including biologists, geneticists, and other researchers in the life sciences. The web application is available at
<ext-link ext-link-type="uri" xlink:href="http://www.evexdb.org/">http://www.evexdb.org/</ext-link>
. The primary purpose of the application is to provide the EVEX dataset with an intuitive interface that does not presuppose familiarity with the underlying event representation. The application presents a comprehensive and thoroughly interlinked overview of all events for a given gene or protein, or a gene/protein pair. The main novel feature of this application, as compared to other available large-scale text mining applications, is that it covers highly detailed event structures that are enriched with homology-based information and additionally extracts indirect associations by applying cross-document aggregation and combination of events.</p>
<p>In the following section, we provide more details on the EVEX text mining dataset, its text-bound extraction results, and the gene family-based generalizations. Further, we present several novel algorithms for event ranking, event refinement, and retrieval of indirect associations.
<xref ref-type="sec" rid="sec3"> Section 3</xref>
presents an evaluation of the EVEX dataset and the described algorithms. The features of the web application are illustrated in
<xref ref-type="sec" rid="sec4">Section 4</xref>
, presenting a real-world use case on the budding yeast gene
<italic>Mec1</italic>
, which has known mammalian and plant homologs. We conclude by summarizing the main contributions of this work and highlighting several interesting opportunities for future work.</p>
</sec>
<sec id="sec2">
<title>2. Data and Methods</title>
<p>This section describes the original event data, as well as a ranking procedure that sorts events according to their reliability. Further, two abstract layers are defined on top of the complex event structures, enabling coarse grouping of similar events, and providing an intuitive pairwise point of view that allows fast retrieval of interesting gene/protein pairs. Finally, we describe a hypothesis generation module that finds missing links between two entities, allowing the user to retrieve proteins with common binding partners or genes that act as coregulators of a group of common target genes.</p>
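<p>The notion of a missing link between two entities can be illustrated with a simple common-neighbor count. The Python sketch below is not the EVEX hypothesis generation module itself; it assumes a plain list of (regulator, target) pairs, with placeholder gene names and a hypothetical min_shared threshold, and reports pairs of regulators that share enough targets as candidate coregulators. Finding proteins with common binding partners would work the same way, provided each symmetric binding pair is listed in both directions.</p>

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (regulator, target) pairs, e.g. taken from regulation events.
REGULATIONS = [
    ("GeneA", "Target1"),
    ("GeneA", "Target2"),
    ("GeneA", "Target3"),
    ("GeneB", "Target1"),
    ("GeneB", "Target2"),
    ("GeneC", "Target3"),
]


def candidate_coregulators(pairs, min_shared=2):
    """Map each pair of regulators sharing at least min_shared targets
    to the sorted list of those shared targets."""
    targets = defaultdict(set)
    for regulator, target in pairs:
        targets[regulator].add(target)

    candidates = {}
    # Compare every unordered pair of regulators exactly once.
    for r1, r2 in combinations(sorted(targets), 2):
        shared = targets[r1].intersection(targets[r2])
        if len(shared) >= min_shared:
            candidates[(r1, r2)] = sorted(shared)
    return candidates


print(candidate_coregulators(REGULATIONS))
# {('GeneA', 'GeneB'): ['Target1', 'Target2']}
```

<p>The threshold trades recall for precision: GeneA and GeneC share a single target and are reported only when min_shared is lowered to 1.</p>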
<sec sec-type="subsection" id="sec2.1">
<title>2.1. EVEX Dataset</title>
<sec sec-type="subsubsection" id="sec2.1.1">
<title>2.1.1. Core Events</title>
<p>The core set of text mining results accessible through the EVEX resource has been generated by the Turku Event Extraction System, the winning system of the BioNLP'09 Shared Task (ST) on Event Extraction [
<xref ref-type="bibr" rid="B14">1</xref>
]. This extraction system was combined with the BANNER named entity recognizer [
<xref ref-type="bibr" rid="B16">11</xref>
], forming a complete event extraction pipeline that had the highest reported accuracy on the task in 2009, and still remains state-of-the-art, as shown in the recent ST'11 [
<xref ref-type="bibr" rid="B5">12</xref>
]. This event extraction pipeline was applied to all citations in the 2009 distribution of PubMed [
<xref ref-type="bibr" rid="B3">9</xref>
]. As part of the current study, citations from the period 2009–2011 have additionally been processed, using essentially the same pipeline with several minor improvements, bringing the dataset to a total of 40.3 million tagged gene symbols and 21.3 million extracted events. The underlying event dataset has thus been brought up to date and will be regularly updated in the future.</p>
<p>The dataset contains events as defined in the context of the ST'09, that is, predicates with a variable number of arguments which can be gene/protein symbols or, recursively, other events. Each argument is defined as having the role of
<italic>Cause</italic>
or
<italic>Theme</italic>
in the event. There are nine distinct event types: binding, phosphorylation, regulation (positive, negative, and unspecified), protein catabolism, transcription, localization, and gene expression. Further, each event refers to a specific
<italic>trigger word</italic>
in text. For example, the word
<italic>increases</italic>
typically triggers a positive regulation event and
<italic>degradation</italic>
typically refers to protein catabolism. An example event structure is illustrated in
<xref ref-type="fig" rid="fig1">Figure 1</xref>
.</p>
<p>Event definitions impose several restrictions on event arguments: (1) events of the type phosphorylation, protein catabolism, transcription, localization, and gene expression must have exactly one argument, a Theme, which must be a gene or a protein; (2) events of the binding type may have any number of gene/protein Theme arguments and cannot have a Cause argument; and finally (3) regulation events must have exactly one Theme argument and may have one Cause argument, with no restrictions as to whether these arguments are genes/proteins or, recursively, other events. In the following text, we state events using a simple bracketed notation, where the event type is stated first, followed by a comma-separated list of arguments enclosed in parentheses. For instance, the event in
<xref ref-type="fig" rid="fig1">Figure 1</xref>
would be stated as
<italic>Positive-Regulation(C:IL-2, T:Binding(T:NF-κB, T:p55))</italic>
, where
<italic>C:</italic>
and
<italic>T:</italic>
denote the role of the argument as (C)ause or (T)heme. For brevity, we will further refer to all biochemical entities, even proteins and mRNA, as
<italic>genes</italic>
.</p>
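<p>The argument restrictions and the bracketed notation can be illustrated with a minimal Python sketch. It is not the EVEX data model: the class Event, its validate method, and the hyphenated type names (e.g., Protein-Catabolism) are assumptions made for this example, and arguments are stored as (role, argument) pairs with the roles C and T.</p>

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Event types that take exactly one gene Theme (type names are illustrative).
SINGLE_THEME = {"Phosphorylation", "Protein-Catabolism", "Transcription",
                "Localization", "Gene-Expression"}
REGULATION = {"Regulation", "Positive-Regulation", "Negative-Regulation"}

Arg = Union[str, "Event"]  # a gene symbol or, recursively, another event


@dataclass
class Event:
    type: str
    args: List[Tuple[str, Arg]] = field(default_factory=list)  # (role, argument)

    def validate(self) -> None:
        """Raise ValueError if the ST'09 argument restrictions are violated."""
        themes = [a for role, a in self.args if role == "T"]
        causes = [a for role, a in self.args if role == "C"]
        genes_only = all(isinstance(t, str) for t in themes)
        if self.type in SINGLE_THEME:
            ok = len(themes) == 1 and not causes and genes_only
        elif self.type == "Binding":
            ok = len(themes) >= 1 and not causes and genes_only
        elif self.type in REGULATION:
            ok = len(themes) == 1 and len(causes) in (0, 1)
        else:
            ok = False
        if not ok:
            raise ValueError(f"invalid arguments for {self.type}: {self}")
        for _, a in self.args:
            if isinstance(a, Event):
                a.validate()

    def __str__(self) -> str:
        """Render the event in the bracketed notation, e.g. Binding(T:A, T:B)."""
        inner = ", ".join(f"{role}:{a}" for role, a in self.args)
        return f"{self.type}({inner})"


if __name__ == "__main__":
    event = Event("Positive-Regulation", [
        ("C", "IL-2"),
        ("T", Event("Binding", [("T", "NF-κB"), ("T", "p55")])),
    ])
    event.validate()
    print(event)  # Positive-Regulation(C:IL-2, T:Binding(T:NF-κB, T:p55))
```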
</sec>
<sec sec-type="subsubsection" id="sec2.1.2">
<title>2.1.2. Event Generalizations</title>
<p> One of the major limitations of the original core set of events is that they are strictly text-bound and provide no facility for a more general treatment, such as being able to abstract from different name spelling variants and symbol synonymy. Further, biochemical entities were originally treated as merely text strings with no database identity referring to external resources such as UniProt [
<xref ref-type="bibr" rid="B27">13</xref>
] or Entrez Gene [
<xref ref-type="bibr" rid="B24">14</xref>
]. The EVEX dataset addresses these issues by providing event generalizations [
<xref ref-type="bibr" rid="B28">10</xref>
].</p>
<p>First, the identified gene symbols in the EVEX dataset are canonicalized by removing superfluous affixes (prefixes and suffixes) to obtain the core gene symbol, followed by discarding nonalphanumeric characters and lowercasing. For instance, the full string
<italic>human Esr-1 subunit</italic>
is canonicalized into
<italic>esr1</italic>
. The purpose of this canonicalization is to abstract away from minor spelling variants and to deal with the fact that the BANNER named entity recognizer often includes a wider context around the core gene symbol. The canonicalization algorithm itself cannot, however, deal with the ambiguity prevalent among the symbols. EVEX thus further resolves these canonical gene symbols, whenever possible, into their most likely families, using two distinct resources for defining homologous genes and gene families:
<italic>HomoloGene</italic>
(eukaryotes, [
<xref ref-type="bibr" rid="B24">14</xref>
]) and
<italic>Ensembl</italic>
(vertebrates, [
<xref ref-type="bibr" rid="B8">15</xref>
]). As part of this study, we extended EVEX to also include families from
<italic>Ensembl Genomes</italic>
, which provides coverage for metazoa, plants, protists, fungi, and bacteria [
<xref ref-type="bibr" rid="B13">16</xref>
]. Building on top of these definitions, the EVEX dataset now defines four
<italic>event generalizations</italic>
, whereby all events whose arguments have the same canonical form, or resolve to the same gene family, are aggregated. As a result, it becomes straightforward to retrieve all information on a specific gene symbol, abstracting away from lexical variants through the canonicalization algorithm, or to additionally apply the synonym-expansion module through the family-based generalizations. All of these generalizations are available in the web application (
<xref ref-type="sec" rid="sec4.2">Section 4.2</xref>
).</p>
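<p>A minimal sketch of canonicalization followed by aggregation is given below. It assumes small, hypothetical affix lists (the actual EVEX affix lists are not given here) and a simplified occurrence format of (event type, first gene, second gene, sentence); it reproduces the <italic>human Esr-1 subunit</italic> to <italic>esr1</italic> example from the text.</p>

```python
from collections import defaultdict

# Hypothetical affix lists, for illustration only.
PREFIXES = {"human", "mouse", "rat", "yeast"}
SUFFIXES = {"subunit", "protein", "gene", "mrna"}


def canonicalize(symbol: str) -> str:
    """Strip affix words, then lowercase and drop non-alphanumeric characters."""
    tokens = symbol.split()
    while len(tokens) > 1 and tokens[0].lower() in PREFIXES:
        tokens.pop(0)
    while len(tokens) > 1 and tokens[-1].lower() in SUFFIXES:
        tokens.pop()
    return "".join(ch for ch in "".join(tokens).lower() if ch.isalnum())


def generalize(occurrences, key=canonicalize):
    """Aggregate (type, gene_a, gene_b, sentence) occurrences whose genes share a key."""
    groups = defaultdict(list)
    for etype, gene_a, gene_b, sentence in occurrences:
        groups[(etype, key(gene_a), key(gene_b))].append(sentence)
    return dict(groups)


if __name__ == "__main__":
    print(canonicalize("human Esr-1 subunit"))  # esr1
    print(generalize([
        ("Binding", "human Esr-1 subunit", "p55", "sentence 1"),
        ("Binding", "ESR1", "P55", "sentence 2"),
    ]))
```

<p>A family-based generalization could reuse generalize with a different key, for instance one that looks up canonicalize(s) in a hypothetical dictionary from canonical symbols to HomoloGene, Ensembl, or Ensembl Genomes family identifiers; symbols without a family would need separate handling.</p>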
</sec>
</sec>
<sec sec-type="subsection" id="sec2.2">
<title>2.2. Event Ranking</title>
<p> To rank the extracted events according to their reliability, we have implemented an event scoring algorithm based on the output of the Turku Event Extraction System. This machine learning system uses linear Support Vector Machines (SVMs) as the underlying classifier [
<xref ref-type="bibr" rid="B7">17</xref>
]. Every classification is given a confidence score, the distance to the decision hyperplane of the linear classifier, where higher scores are associated with more confident decisions. There is not a single master classifier to predict the events in their entirety. Rather, individual classifications are made to predict the event trigger and each of its arguments. In order to assign a single confidence score to a specific event occurrence, the predictions from these two separate classifiers must be aggregated.</p>
<p>The confidence scores of the two different classifiers are not directly mutually comparable, and we therefore first normalize all scores in the dataset to zero mean and unit standard deviation, separately for triggers and arguments. Subsequently, the score of a specific event occurrence is assigned to be the
<italic>minimum</italic>
of the normalized scores of its event trigger and its arguments, that is, the lowest normalized confidence among all classification decisions involved in extracting that specific event. Using minimum as the aggregation function roughly corresponds to the
<italic>fuzzy and</italic>
operator in that it requires all components of an event to be confident for it to be ranked high. Finally, the score of a generalized event is the average of the scores of all its occurrences.</p>
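<p>The scoring scheme can be sketched in a few lines of Python. The function names are hypothetical, the raw scores are toy values, and the use of the population standard deviation for normalization is an assumption, since the estimator is not specified.</p>

```python
from statistics import mean, pstdev


def normalize(scores):
    """Rescale raw SVM scores to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


def occurrence_score(trigger_z, argument_zs):
    """Minimum over all decisions ("fuzzy and"): the weakest component bounds the event."""
    return min([trigger_z, *argument_zs])


def generalized_score(occurrence_scores):
    """A generalized event is scored by the average over its occurrences."""
    return mean(occurrence_scores)


if __name__ == "__main__":
    # Toy raw scores; triggers and arguments are normalized separately.
    triggers = normalize([0.2, 1.4, 0.8])
    arguments = normalize([2.0, -0.5, 1.1, 0.4])
    occ1 = occurrence_score(triggers[0], [arguments[0], arguments[1]])
    occ2 = occurrence_score(triggers[1], [arguments[2]])
    print(generalized_score([occ1, occ2]))
```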
<p>To give the normalized and aggregated confidence values a meaningful interpretation, the events are divided into five equally sized groups by confidence: the top 20% of all events are labeled “very high confidence,” and each subsequent 20% is labeled “high confidence,” “average confidence,” “low confidence,” and “very low confidence,” respectively. When presenting multiple possible hits for a certain query, the web application ranks the events from high to low reliability using the underlying numerical scores rather than these categories.</p>
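<p>A sketch of the five-band labeling follows, assuming each band holds 20% of all events ranked by aggregated score, with ties broken arbitrarily; the labels follow the text, while the function name is hypothetical.</p>

```python
LABELS = ["very high", "high", "average", "low", "very low"]


def confidence_labels(scores):
    """Label each score by the quintile of all events it falls into, best first."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [""] * n
    for rank, idx in enumerate(order):
        # rank * 5 // n maps ranks 0..n-1 onto the band indices 0..4
        labels[idx] = LABELS[min(rank * 5 // n, 4)] + " confidence"
    return labels


if __name__ == "__main__":
    print(confidence_labels([0.9, -1.2, 0.3, 1.5, -0.1]))
```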
</sec>
<sec sec-type="subsection" id="sec2.3">
<title>2.3. Event Refinement</title>
<p> The extraction of event structures is highly dependent on the lexical and syntactic constructs used in the sentence and may therefore contain unnecessary complexity. This is because the event extraction system is trained to closely follow the actual statements in the sentence and thus, for instance, will mark both of the words
<italic>increase</italic>
and
<italic>induces</italic>
as triggers for positive regulation events in the sentence
<italic>Ang II induces a rapid increase in MAPK activity</italic>
. Consequently, the final event structure is extracted as
<italic>Positive-Regulation(C: Ang II, T: Positive-Regulation(T: MAPK))</italic>
, that is,
<italic>Ang II</italic>
is a Cause argument of a positive regulation event, which has another positive regulation event as its Theme.</p>
<p>While correctly extracted, such nested single-argument regulatory events (i.e., regulations with a Theme but no Cause argument), often forming chains that are several events long, are unnecessarily complex. Clearly, the event above can be restated as
<italic>Positive-Regulation(C: Ang II, T: MAPK)</italic>
, removing the nested single-argument positive regulation event. This refinement establishes the event as equivalent to all other events that can be refined to the same elementary structure, enhancing the event aggregation possibilities in EVEX. However, when presenting the details of the extracted event to the user, the original structure of the event is preserved.</p>
<p>
<xref ref-type="table" rid="tab1">Table 1</xref>
lists the set of refinement rules. In this context, positive and negative regulation refer to having a general positive or negative effect, while an unspecified regulation could not be resolved to either category due to missing information in the sentence.</p>
<p>To simplify the single-argument regulatory events, we proceed iteratively, removing intermediary single-argument regulatory events as long as any rule matches. Particular consideration is given to the polarity of the regulations. While a nested chain of single-argument positive regulations can be safely reduced to a single positive regulation, the outcome of reducing chains of single-argument regulations of mixed polarity is less obvious. As illustrated in
<xref ref-type="table" rid="tab1">Table 1</xref>
, application of the rules may result in a change of polarity of the outer event. For instance, a regulation of a negative regulation is interpreted as a negative regulation, changing the polarity of the outer event from unspecified to negative. To avoid excessive inferences not licensed by the text, the algorithm only allows one such change of polarity. Any subsequent removal of a nested single-argument regulatory event that results in a type change forces the new type of the outer event to be of the unspecified regulation type.</p>
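<p>The iterative refinement can be sketched as follows. Only two polarity rules are stated in the text (chains of positive regulations stay positive, and a regulation of a negative regulation becomes negative); the remaining entries of the COMBINE table, including the fallback to unspecified Regulation, are assumptions standing in for Table 1, which is not reproduced here. As in EVEX, a refined copy is built and the original event is left untouched.</p>

```python
from dataclasses import dataclass
from typing import Optional, Union

REGULATIONS = {"Regulation", "Positive-Regulation", "Negative-Regulation"}

# (outer type, removed inner type) -> new outer type.
# ASSUMPTION: a stand-in for Table 1; unlisted combinations become "Regulation".
COMBINE = {
    ("Positive-Regulation", "Positive-Regulation"): "Positive-Regulation",
    ("Regulation", "Negative-Regulation"): "Negative-Regulation",
    ("Regulation", "Positive-Regulation"): "Positive-Regulation",
    ("Regulation", "Regulation"): "Regulation",
}


@dataclass
class Event:
    type: str
    theme: Union["Event", str]
    cause: Optional[Union["Event", str]] = None


def is_single_arg_regulation(arg) -> bool:
    return isinstance(arg, Event) and arg.type in REGULATIONS and arg.cause is None


def refine(arg):
    """Return a refined copy of an event (or a gene symbol unchanged)."""
    if not isinstance(arg, Event):
        return arg
    etype, theme = arg.type, arg.theme
    polarity_changed = False
    while etype in REGULATIONS and is_single_arg_regulation(theme):
        new_type = COMBINE.get((etype, theme.type), "Regulation")
        if new_type != etype:
            # Only one change of type is licensed; later changes force "Regulation".
            new_type = "Regulation" if polarity_changed else new_type
            polarity_changed = True
        etype, theme = new_type, theme.theme
    return Event(etype, refine(theme), refine(arg.cause))


if __name__ == "__main__":
    # "Ang II induces a rapid increase in MAPK activity"
    nested = Event("Positive-Regulation",
                   Event("Positive-Regulation", "MAPK"), cause="Ang II")
    print(refine(nested))
    # Event(type='Positive-Regulation', theme='MAPK', cause='Ang II')
```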
</sec>
<sec sec-type="subsection" id="sec2.4">
<title>2.4. Pairwise Abstraction</title>
<p>The most basic query issued on the EVEX web application involves a single gene, which triggers the generation of a structured overview page, listing associated genes grouped by their type of connection with the query gene (
<xref ref-type="sec" rid="sec4.1">Section 4.1</xref>
). The most important underlying functionality implemented by the web application is thus the ability to identify and categorize pairs of related genes. This pairwise point of view comes naturally in the life sciences and can easily be implemented on top of the events by analyzing common event structures and defining argument pairs within them. The refinements discussed in
<xref ref-type="sec" rid="sec2.3">Section 2.3</xref>
substantially decrease the number of unique event structures present in the data, restricting the required analysis to a comparatively small number of event structures. Furthermore, we only need to consider those events that involve more than one gene or that are a recursive argument in such an event, reducing the number of event occurrences to be analyzed from 21 M to 12 M.</p>
<p>As an example, let us consider the event
<italic>Positive-Regulation(C:Thrombin, T:Positive-Regulation(C:EGF, T:Phosphorylation(T:Akt)))</italic>
, extracted from the sentence
<italic>Thrombin augmented EGF-stimulated Akt phosphorylation</italic>
. The pairs of interest here are
<italic>Thrombin—Akt</italic>
and
<italic>EGF—Akt</italic>
, both associations coarsely categorized as
<italic>regulation</italic>
. Therefore, whenever a user queries for
<italic>Thrombin</italic>
, the
<italic>Akt</italic>
gene will be listed among the regulation targets, and, whenever a user queries for
<italic>Akt</italic>
, both
<italic>Thrombin</italic>
and
<italic>EGF</italic>
will be listed as regulators. Note, however, that the categorization of the association as
<italic>regulation</italic>
is only for the purpose of coarse grouping of the results on the overview page. The user will additionally be presented with the details of the original event, which is translated from the bracketed notation into the English statement
<italic>Upregulation of AKT phosphorylation by EGF is upregulated by Thrombin</italic>
.</p>
<p>There is a limited number of prevalent event structures which account for the vast majority of event occurrences.
<xref ref-type="table" rid="tab2">Table 2</xref>
lists the most common structures, together with the gene pairs extracted from them. The algorithm to extract the gene pairs from the event structures proceeds as follows.</p>
<list list-type="order">
<list-item>
<p>Every pair of gene arguments is considered a candidate pair and classified as
<italic>binding</italic>
if both participants are Themes of the same binding event, and
<italic>regulation</italic>
otherwise. (Note that due to the restrictions of event arguments as described in
<xref ref-type="sec" rid="sec2.1">Section 2.1</xref>
, only binding and regulation events can have more than one argument.)</p>
</list-item>
<list-item>
<p>If one of the genes is a Theme argument of an event which itself is a Cause argument, for example,
<italic>G2</italic>
in
<italic>Regulation(C:Regulation(C:G1, T:G2), T:G3)</italic>
, the association type of the candidate pair
<italic>G2-G3</italic>
is reclassified as
<italic>indirect regulation</italic>
, since the direct regulator of
<italic>G3</italic>
is the Cause argument of the nested regulation (
<italic>G1</italic>
).</p>
</list-item>
<list-item>
<p>If one of the genes is a Cause argument of an event which itself is a Theme argument, for example,
<italic>G2</italic>
in
<italic>Regulation(C:G1, T:Regulation(C:G2, T:G3))</italic>
, the candidate pair (
<italic>G1-G2</italic>
) is discarded.</p>
</list-item>
</list>
<p>While the association between
<italic>G1</italic>
and
<italic>G2</italic>
is discarded in step (3) because in many cases it cannot convincingly be classified as a regulation, it is represented as a
<italic>coregulation</italic>
when indirect associations, described in the following section, are sought.</p>
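<p>The three steps can be sketched in Python as below. The text does not spell out how the steps apply to deeper nestings; this sketch assumes they are evaluated relative to the innermost event containing both genes, an interpretation chosen because it yields exactly the <italic>Thrombin—Akt</italic> and <italic>EGF—Akt</italic> pairs named above. The Event class and function names are hypothetical.</p>

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(eq=False)  # compare events by identity when locating shared ancestors
class Event:
    type: str
    args: list  # (role, argument) pairs; role is "C" or "T"


def gene_paths(event, prefix=()):
    """Yield (gene, path), where path lists the (event, role) steps from the root."""
    for role, arg in event.args:
        path = prefix + ((event, role),)
        if isinstance(arg, Event):
            yield from gene_paths(arg, path)
        else:
            yield arg, path


def classify_pair(path_a, path_b):
    """Apply steps (1)-(3) relative to the innermost event containing both genes."""
    i = 0
    while (min(len(path_a), len(path_b)) > i + 1
           and path_a[i + 1][0] is path_b[i + 1][0]):
        i += 1
    shared = path_a[i][0]
    roles_a = [role for _, role in path_a[i:]]
    roles_b = [role for _, role in path_b[i:]]
    # Step 1: binding if both are direct Themes of the same binding event.
    label = "binding" if shared.type == "Binding" and roles_a == roles_b == ["T"] else "regulation"
    for roles in (roles_a, roles_b):
        # Step 2: Theme of an event that is itself a Cause.
        if len(roles) > 1 and roles[0] == "C" and roles[-1] == "T":
            label = "indirect regulation"
    for roles in (roles_a, roles_b):
        # Step 3: Cause of an event that is itself a Theme, so discard the pair.
        if len(roles) > 1 and roles[0] == "T" and roles[-1] == "C":
            return None
    return label


def extract_pairs(event):
    pairs = []
    for (gene_a, path_a), (gene_b, path_b) in combinations(list(gene_paths(event)), 2):
        label = classify_pair(path_a, path_b)
        if label is not None:
            pairs.append((gene_a, gene_b, label))
    return pairs


if __name__ == "__main__":
    # "Thrombin augmented EGF-stimulated Akt phosphorylation"
    akt = Event("Phosphorylation", [("T", "Akt")])
    egf = Event("Positive-Regulation", [("C", "EGF"), ("T", akt)])
    event = Event("Positive-Regulation", [("C", "Thrombin"), ("T", egf)])
    print(extract_pairs(event))
    # [('Thrombin', 'Akt', 'regulation'), ('EGF', 'Akt', 'regulation')]
```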
</sec>
<sec sec-type="subsection" id="sec2.5">
<title>2.5. Indirect Associations</title>
<p> A cell's activity is often organized into regulatory modules, that is, sets of coregulated genes that share a common function. Such modules can be found by automated analysis and clustering of genome-wide expression profiles [
<xref ref-type="bibr" rid="B25">18</xref>
]. Individual events, as defined by the BioNLP Shared Tasks, do not explicitly express such associations. However, indirect regulatory associations can be identified by combining the information expressed in various events retrieved across different articles. For instance, the events
<italic>Regulation(C:geneA, T:geneZ)</italic>
and
<italic>Regulation(C:geneB, T:geneZ)</italic>
can be aggregated to present the hypothesis that
<italic>geneA</italic>
and
<italic>geneB</italic>
coregulate
<italic>geneZ</italic>
. Such hypothesis generation is greatly simplified by the fact that the events have been refined using the procedure described in
<xref ref-type="sec" rid="sec2.3">Section 2.3</xref>
and the usage of a relational database, which allows efficient querying across events.</p>
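<p>Because the refined pairs live in a relational database, coregulation reduces to a self-join on the target column. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the table layout and gene names are hypothetical and do not reflect the EVEX MySQL schema.</p>

```python
import sqlite3


def coregulators(regulations, gene):
    """Map each coregulator of `gene` to the sorted targets they share.

    regulations: iterable of (regulator, target) gene pairs.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE regulation (regulator TEXT, target TEXT)")
    con.executemany("INSERT INTO regulation VALUES (?, ?)", regulations)
    rows = con.execute(
        """
        SELECT DISTINCT r2.regulator, r1.target
        FROM regulation AS r1
        JOIN regulation AS r2 ON r1.target = r2.target
        WHERE r1.regulator = ? AND r2.regulator != ?
        ORDER BY r2.regulator, r1.target
        """,
        (gene, gene),
    ).fetchall()
    con.close()
    shared = {}
    for other, target in rows:
        shared.setdefault(other, []).append(target)
    return shared


if __name__ == "__main__":
    pairs = [("geneA", "geneZ"), ("geneB", "geneZ"), ("geneB", "geneY"), ("geneC", "geneY")]
    print(coregulators(pairs, "geneB"))  # {'geneA': ['geneZ'], 'geneC': ['geneY']}
```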
<p>The indirect associations as implemented for the web application include coregulation and common binding partners (
<xref ref-type="table" rid="tab3">Table 3</xref>
). These links have been precalculated and stored in the database, enabling fast retrieval of, for example, coregulators or genes that are targeted by a common regulator, and thus facilitating the discovery of functional modules from text mining results. These associations should, however, be regarded as hypotheses: coregulators, for example, additionally require coexpression. Details on gene expression events can be found by browsing the sentences of specific genes as described in
<xref ref-type="sec" rid="sec4.1">Section 4.1</xref>
.</p>
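<p>Precalculating common binding partners can be sketched in the same spirit. The function below is hypothetical and loops over all gene pairs, which is adequate only for illustration; a database-backed implementation would instead join binding pairs on the shared partner.</p>

```python
from collections import defaultdict
from itertools import combinations


def common_binding_partners(binding_pairs):
    """Precompute, for every gene pair, the partners that bind both genes."""
    partners = defaultdict(set)
    for a, b in binding_pairs:
        partners[a].add(b)
        partners[b].add(a)
    shared = {}
    for g1, g2 in combinations(sorted(partners), 2):
        common = partners[g1].intersection(partners[g2]) - {g1, g2}
        if common:
            shared[(g1, g2)] = sorted(common)
    return shared


if __name__ == "__main__":
    pairs = [("geneA", "geneX"), ("geneB", "geneX"), ("geneB", "geneY")]
    print(common_binding_partners(pairs))
    # {('geneA', 'geneB'): ['geneX'], ('geneX', 'geneY'): ['geneB']}
```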
</sec>
</sec>
<sec id="sec3">
<title>3. Results and Performance Evaluation</title>
<p>In this section, we present the evaluation of the EVEX resource from several points of view. First, we discuss the performance of the event extraction system used to produce the core set of events in EVEX, reviewing a number of published evaluations both within the BioNLP Shared Task and in other domains. Second, we present several evaluations of the methods and data employed specifically in the EVEX resource in addition to the core event predictions: we review existing results as well as present new evaluations of the confidence scores and their correlation with event precision, the family-based generalization algorithms, and the novel event refinement algorithms introduced above. Finally, we discuss two biologically motivated applications of EVEX, demonstrating the usability of EVEX in real-world use cases.</p>
<sec sec-type="subsection" id="sec3.1">
<title>3.1. Core Event Predictions</title>
<p>The Turku Event Extraction System (TEES), the source of the core set of EVEX events, was extensively evaluated on the BioNLP Shared Tasks. It was the winning system of the ST'09, achieving 46.73% recall, 58.48% precision, and 51.95%
<italic>F</italic>
-score [
<xref ref-type="bibr" rid="B3">9</xref>
]. In the current study, the original set of event predictions extracted from the PubMed 2009 distribution has been brought up to date using an improved version of TEES. This updated system was recently shown to achieve state-of-the-art results in the ST'11, obtaining 50.06% recall, 59.48% precision, and 54.37%
<italic>F</italic>
-score on the corresponding abstract-only GENIA subchallenge [
<xref ref-type="bibr" rid="B5">12</xref>
].</p>
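<p>The reported F-scores are the harmonic mean of precision and recall, F = 2PR/(P + R). Recomputing them from the rounded precision and recall values above reproduces 51.95 for ST'09, and gives 54.36 for ST'11, one hundredth below the reported 54.37, a gap presumably due to rounding of the published precision and recall.</p>

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f_score(58.48, 46.73), 2))  # ST'09: 51.95
print(round(f_score(59.48, 50.06), 2))  # ST'11: 54.36
```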
<p>To assess how well the text mining results generalize from domain-specific datasets to the whole of PubMed, a previous study manually evaluated 100 random events, obtaining a precision rate of 64% [
<xref ref-type="bibr" rid="B4">19</xref>
]. In the same study, the named entities (i.e., gene and protein symbols) as extracted by BANNER were estimated to achieve a precision of 87%. These figures indicate that the performance of the various text mining components generalizes well from domain-specific training data to the entire PubMed.</p>
</sec>
<sec sec-type="subsection" id="sec3.2">
<title>3.2. Confidence Values</title>
<p>To investigate the correlation of the confidence values (
<xref ref-type="sec" rid="sec2.2">Section 2.2</xref>
) to the correctness of the extracted events, we have measured the precision and recall rates of binding events between two genes, simulating a use case that involves finding related binding partners for a certain query gene (
<xref ref-type="sec" rid="sec4.1">Section 4.1</xref>
). This experiment was conducted on the ST'09 development set, consisting of 150 PubMed abstracts with 94 gold-standard binding pairs. For this dataset, 67 interacting pairs were found in EVEX, with confidence values ranging between −1.7 and 1.3. When evaluated against the gold-standard data, the whole set of predictions achieves 59.7% precision and 42.6% recall.</p>
<p>Using the confidence values for ranking, we subsequently applied a cut-off threshold to the results, keeping only predictions with confidence values above the threshold. Thresholds were screened systematically over the interval from −1.7 to 1.3 with a step size of 0.05 (60 evaluations). The results have been aggregated and summarized in
<xref ref-type="fig" rid="fig2">Figure 2</xref>
, which depicts the average precision and recall values for each of five consecutive threshold intervals of length 0.6. For example, a cut-off value between 0.10 and 0.70 (fourth interval) would result in an average precision rate of 70.0% and recall of 14.4%. Only taking the top-ranked predictions, with a threshold above 0.7 (fifth interval), results in extremely high precision (91.9%) but only 4.8% recall. On the scale of EVEX, however, 4.8% recall would still translate to more than a million high-precision events.</p>
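<p>The threshold screening can be sketched as follows. With a step of 0.05, each interval of length 0.6 covers 12 consecutive thresholds, so 60 evaluations yield five averaged intervals. Predictions are assumed to be (confidence, is_correct) pairs already matched against the gold standard; the function names and the toy data are illustrative, not the ST'09 data.</p>

```python
def precision_recall(predictions, n_gold, threshold):
    """predictions: (confidence, is_correct) pairs; keep those at or above threshold."""
    kept = [correct for score, correct in predictions if score >= threshold]
    if not kept:
        return None, 0.0  # precision is undefined when nothing is kept
    true_pos = sum(kept)
    return true_pos / len(kept), true_pos / n_gold


def sweep(predictions, n_gold, start=-1.7, step=0.05, n_steps=60, per_interval=12):
    """Average precision and recall over consecutive blocks of thresholds.

    With the defaults, 60 thresholds form 5 intervals of 12 * 0.05 = 0.6 units.
    """
    summary = []
    for b in range(n_steps // per_interval):
        precisions, recalls = [], []
        for k in range(b * per_interval, (b + 1) * per_interval):
            p, r = precision_recall(predictions, n_gold, start + k * step)
            if p is not None:
                precisions.append(p)
            recalls.append(r)
        avg_p = sum(precisions) / len(precisions) if precisions else None
        lower = round(start + b * per_interval * step, 2)
        summary.append((lower, avg_p, sum(recalls) / len(recalls)))
    return summary


if __name__ == "__main__":
    toy = [(1.1, True), (0.4, True), (0.2, False), (-0.6, True), (-1.2, False)]
    for lower, p, r in sweep(toy, n_gold=6):
        print(lower, p, r)
```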
</sec>
<sec sec-type="subsection" id="sec3.3">
<title>3.3. EVEX Generalizations</title>
<p>As described in
<xref ref-type="sec" rid="sec2.1">Section 2.1</xref>
, the EVEX resource provides several algorithms to generalize gene symbols and their events, providing the opportunity to identify and aggregate equivalent events across various articles, accounting for lexical variants and synonymy. In a first step, a canonical form of the gene symbols is produced, increasing the proportion of symbols that can be matched to gene databases. This algorithm has previously been evaluated on the ST'09 training set, whose annotation specifically targets entities that are likely to match gene and protein symbol databases. By canonicalizing the symbols as predicted by BANNER, an increase of 11 percentage points in
<italic>F</italic>
-score was obtained [
<xref ref-type="bibr" rid="B28">10</xref>
].</p>
<p>The family-based generalizations have also been previously evaluated for both HomoloGene and Ensembl definitions. To expand the coverage of these generalizations, in this study, we have added definitions from Ensembl Genomes. The statistics on coverage of gene symbols, brought up to date by including the 2009–2011 abstracts, are depicted in
<xref ref-type="table" rid="tab4">Table 4</xref>
. While only a small fraction of all unique canonical symbols matches the gene families from HomoloGene or Ensembl (Genomes) (between 3 and 6%), this small fraction accounts for more than half of all occurrences (between 51 and 61%). The family disambiguation algorithm thus discards a long tail of very infrequent canonical symbols. These findings are similar to the previous statistics presented by Van Landeghem et al. [
<xref ref-type="bibr" rid="B28">10</xref>
]. Additionally, the newly introduced families of Ensembl Genomes provide clearly higher coverage, 8–9 percentage points above HomoloGene or Ensembl.</p>
</sec>
<sec sec-type="subsection" id="sec3.4">
<title>3.4. Event Refinement</title>
<p>By removing the chains of single-argument regulatory events, the refinement process simplifies event structures and greatly reduces their heterogeneity, facilitating semantic interpretation and the search for similar events. This process reduces the number of distinct event structures by more than 60%.</p>
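<p>Counting distinct event structures requires a notion of when two events share a structure. One plausible definition, assumed here for illustration and not taken from EVEX, renders each event with gene names abstracted away and argument order ignored. Events are plain (type, [(role, argument)]) tuples, and the gene names in the second event are placeholders.</p>

```python
def structure(arg):
    """Render an event with gene names abstracted away (argument order ignored)."""
    if isinstance(arg, str):
        return "gene"
    etype, args = arg
    inner = ", ".join(sorted(f"{role}:{structure(a)}" for role, a in args))
    return f"{etype}({inner})"


def count_structures(events):
    return len({structure(e) for e in events})


if __name__ == "__main__":
    before = [
        ("Positive-Regulation", [("C", "Ang II"), ("T", ("Positive-Regulation", [("T", "MAPK")]))]),
        ("Positive-Regulation", [("C", "geneA"), ("T", "geneB")]),
    ]
    after = [
        ("Positive-Regulation", [("C", "Ang II"), ("T", "MAPK")]),
        ("Positive-Regulation", [("C", "geneA"), ("T", "geneB")]),
    ]
    print(count_structures(before), count_structures(after))  # 2 1
```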
<p>The main purpose of the event refinement algorithm, in combination with the pairwise view of the events, is to increase the coverage of finding related genes for a certain input query gene. When applying the algorithm as detailed in
<xref ref-type="sec" rid="sec2.3">Section 2.3</xref>
, the number of events with more than one gene symbol as direct argument increases from 1471 K to 1588 K, yielding more than a hundred thousand (117 K) additional simplified events that can be parsed straightforwardly for pairwise relations.</p>
<p>It has to be noted, however, that the results of the refinement algorithm are merely used as an abstract layer to group similar events together and to offer quick access to relevant information. The original event structures as extracted by TEES are always presented to the user when detailed information is requested, allowing the user to reject or accept the inferences made by the refinement algorithm.</p>
</sec>
<sec sec-type="subsection" id="sec3.5">
<title>3.5. Biological Applications</title>
<p>The EVEX dataset and the associated web application have recently been applied in a focused study targeting the regulation of NADP(H) expression in
<italic>E. coli</italic>
, demonstrating the resource in a real-life biological use case, with encouraging results [
<xref ref-type="bibr" rid="B11">20</xref>
]. The Ensembl Genomes generalization was used to allow for homology-based inference, and the regulatory network extracted from EVEX was integrated with microarray coexpression data. As part of this study, 461 occurrences of two-argument events in the NADP(H) regulatory network were manually evaluated, yielding a precision of 53%. This figure compares favorably with the BioNLP'09 Shared Task official evaluation results of 50% for binding events and 46% for regulation events, the only event types that allow more than one argument. The event occurrences that were judged to be correctly extracted were further evaluated for the correctness of the assignment of their arguments to Ensembl Genomes families: 72% of event occurrences had both of their arguments assigned to the correct family.</p>
<p>In a separate study, the suitability of the EVEX dataset and web application to the task of pathway curation was analyzed with a particular focus on recall [
<xref ref-type="bibr" rid="B20">21</xref>
]. When analyzing three high-quality pathway models (TLR, mTOR, and yeast cell cycle), 60% of all interactions could be retrieved from EVEX using the canonical generalization. A thorough manual evaluation further suggested that, surprisingly, the most common reason for a pathway interaction not being extracted is not a failure of the event extraction pipeline, but rather a lack of semantic coverage. In these cases, the interaction corresponds to an event type not defined in the ST'09 task and thus out of scope for the event extraction system. Only 11% of interactions in the evaluated pathways were not recovered due to a failure of the event extraction system. This result shows that the recall in EVEX, at least in the pathways evaluated by Ohta et al., is clearly above the recall value published for the event extraction system in isolation. This increase can very likely be attributed to the volume of the event data in EVEX and the ability to aggregate several event occurrences into a single generalized event, where the failure to extract an individual event occurrence does not automatically mean the failure to extract the generalized event.</p>
</sec>
</sec>
<sec id="sec4">
<title>4. Web Application</title>
<p>To illustrate the functionality and features of the web application, we present a use case on a specific budding yeast gene,
<italic>Mec1</italic>
, which is conserved in
<italic>S. pombe</italic>
,
<italic>S. cerevisiae</italic>
,
<italic>K. lactis</italic>
,
<italic>E. gossypii</italic>
,
<italic>M. grisea,</italic>
and
<italic>N. crassa</italic>
.
<italic>Mec1</italic>
is required for meiosis and plays a critical role in the maintenance of genome stability. Furthermore, it is considered to be a homolog of the mammalian
<italic>ATR</italic>
/
<italic>ATM</italic>
, a signal transduction protein [
<xref ref-type="bibr" rid="B6">22</xref>
].</p>
<sec sec-type="subsection" id="sec4.1">
<title>4.1. Gene Overview</title>
<p>The main functionality of the EVEX resource is to provide fast access to relevant information and related biomolecular entities for a gene or pair of genes of interest. (Analysis of large gene lists is currently not supported, as such a bioinformatics use case is already covered by the publicly available MySQL database.) The most straightforward way to achieve this is through the canonical generalization, by searching for a gene symbol or for a pair of gene symbols separated by a comma.</p>
<p>When typing the first characters of a gene symbol, a list of candidate matches is proposed, guiding the user to likely gene symbols found in text. The search page then automatically generates a listing of relevant biomolecular events, grouped by event type. At the top of the page, an overview of all regulators, regulated genes, and binding partners is provided, each accompanied with an example sentence. Further, coregulators are listed together with the number of coregulated genes (
<xref ref-type="sec" rid="sec2.5">Section 2.5</xref>
).
<xref ref-type="fig" rid="fig3"> Figure 3</xref>
shows the results when searching for
<italic>Mec1</italic>
. This overview lists 21 regulation targets, 11 regulators, 27 binding partners, and 263 coregulators. Within each category, the events are ranked by confidence, ranging from (very) high to average and (very) low (
<xref ref-type="sec" rid="sec2.2">Section 2.2</xref>
). Further, example sentences are always chosen to be those associated with the highest confidence score.</p>
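<p>The overview page can be sketched as a grouping step. Following the text, each partner is ranked by confidence, taken here as the mean over its occurrences (the generalized score from Section 2.2), and is shown with its highest-confidence sentence. The input tuple format, the section names, and the placeholder genes geneX and geneY are assumptions made for this example.</p>

```python
from collections import defaultdict


def build_overview(query, associations):
    """Group a gene's partners by connection type, most confident first.

    associations: (gene_a, gene_b, category, confidence, sentence) tuples, where
    category is "regulation" (gene_a regulates gene_b) or "binding".
    """
    hits = defaultdict(list)
    for a, b, category, conf, sentence in associations:
        if category == "binding" and query in (a, b):
            key = ("binding partners", b if a == query else a)
        elif category == "regulation" and a == query:
            key = ("regulation targets", b)
        elif category == "regulation" and b == query:
            key = ("regulators", a)
        else:
            continue
        hits[key].append((conf, sentence))
    overview = defaultdict(list)
    for (section, partner), evidence in hits.items():
        mean_conf = sum(c for c, _ in evidence) / len(evidence)
        example = max(evidence, key=lambda e: e[0])[1]  # highest-confidence sentence
        overview[section].append((partner, mean_conf, example))
    for rows in overview.values():
        rows.sort(key=lambda row: row[1], reverse=True)
    return dict(overview)


if __name__ == "__main__":
    toy = [
        ("Mec1", "RAD9", "regulation", 1.0, "sentence 1"),
        ("Mec1", "RAD9", "regulation", 0.5, "sentence 2"),
        ("RAD9", "Mec1", "binding", 0.9, "sentence 3"),
        ("geneX", "Mec1", "regulation", -0.5, "sentence 4"),
    ]
    for section, rows in build_overview("Mec1", toy).items():
        print(section, rows)
```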
<p>Selecting the target
<italic>RAD9</italic>
, the web application visualizes all event structures expressing regulation of
<italic>RAD9</italic>
by
<italic>Mec1</italic>
(
<xref ref-type="fig" rid="fig4">Figure 4</xref>
). This enables a quick overview of the mechanisms through which the regulation is established, which can have a certain polarity (positive/negative) and may involve physical events such as phosphorylation or protein-DNA binding. The different types of event structures are coarsely grouped into categories of similar events and presented from most to least reliable using the confidence scores.</p>
<p>Exploring the relationship between
<italic>RAD9</italic>
and
<italic>Mec1</italic>
further, EVEX enables a search of all events linking these two genes through any direct or indirect association (
<xref ref-type="fig" rid="fig5">Figure 5</xref>
). This page provides conclusive evidence for a binding event between
<italic>RAD9</italic>
and
<italic>Mec1</italic>
. Further, both a
<italic>Mec1 regulates RAD9</italic>
and a
<italic>RAD9 regulates Mec1</italic>
event are presented. However, inspecting the sentences, the first one is obviously the only correct one. This illustrates the opportunity to use the large-scale event extraction results for pruning false positives of the text mining algorithm, as the false result only has 1 piece of evidence, and with a “very low” confidence, while the correct regulation is supported by 3 different evidence excerpts, two of which are of “high” confidence, and is thus displayed first.</p>
<p>Apart from the regulatory and binding mechanisms, the overview page also lists potential coregulations, enumerating targets that are regulated by both genes, such as
<italic>Rad53</italic>
. When accessing the details for this hypothesis, all evidence excerpts supporting both regulations are presented. Other indirect associations, such as common regulators and binding partners, can be retrieved equally fast.</p>
<p>Finally, the overview page of
<italic>Mec1</italic>
(
<xref ref-type="fig" rid="fig3">Figure 3</xref>
) contains additional relevant information including links to sentences stating events of
<italic>Mec1</italic>
without a second argument, grouped by event type. While these events incorporate only a single gene or protein and may not be very informative by themselves, they are highly relevant for information retrieval purposes, finding interesting sentences and articles describing specific processes such as protein catabolism or phosphorylation.</p>
<p>At the bottom of the overview page, a similar and even more general set of sentences can be found, providing pointers to relevant literature while still requiring manual analysis to determine the exact type of information. Such sentences, even though they contain no extracted events, may include useful background information on the gene such as relevant experimental studies, related diseases, or general functions and pathways.</p>
</sec>
<sec sec-type="subsection" id="sec4.2">
<title>4.2. Homology-Based Inference</title>
<p>In comparative genomics, it is common practice to transfer functional annotations between genes of related organisms that share sequence similarity [
<xref ref-type="bibr" rid="B17">23</xref>
,
<xref ref-type="bibr" rid="B21">24</xref>
]. The EVEX resource provides such functionality for inferring interactions and other biomolecular events based on homology, summarizing all events pertaining to a gene family whenever a user searches for one of its members (
<xref ref-type="sec" rid="sec2.1">Section 2.1</xref>
).</p>
<p>For example, instead of only looking at the information for one particular gene symbol as described previously, we can extend the search using the gene families defined by Ensembl Genomes and retrieve information on homologous genes and their synonyms. The resulting listings of regulators and binding partners are structured exactly as before, but each symbol now refers to a whole gene family rather than a single gene name.</p>
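<p>Family-level summarization can be thought of as re-keying each event by the families of its arguments. The sketch below assumes a hypothetical symbol-to-family table in which <monospace>Mec1</monospace> and its homolog <monospace>ATR</monospace> share the placeholder family <monospace>FAM_A</monospace>. <monospace>GENE_X</monospace> is a placeholder target. The actual family definitions in EVEX come from resources such as Ensembl Genomes and HomoloGene.</p>

```python
from collections import defaultdict

# Hypothetical symbol-to-family table; FAM_A and GENE_X are placeholders.
FAMILY_OF = {"Mec1": "FAM_A", "ATR": "FAM_A"}


def family(symbol):
    """Return the family id of a symbol, or the symbol itself if unmapped."""
    return FAMILY_OF.get(symbol, symbol)


def aggregate_by_family(events):
    """Group (regulator, target, event_type) triples under a family-level key."""
    grouped = defaultdict(list)
    for regulator, target, event_type in events:
        key = (family(regulator), family(target), event_type)
        grouped[key].append((regulator, target))
    return dict(grouped)


events = [
    ("Mec1", "GENE_X", "Regulation"),
    ("ATR", "GENE_X", "Regulation"),
]
print(aggregate_by_family(events))
# {('FAM_A', 'GENE_X', 'Regulation'): [('Mec1', 'GENE_X'), ('ATR', 'GENE_X')]}
```

<p>Keeping the original symbol pairs in each group preserves the link back to the gene-level evidence behind a family-level entry.</p>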
<p>Conducting such a generalized search for
<italic>Mec1</italic>
, EVEX retrieves interaction information for
<italic>Mec1</italic>
and its homologs. The resulting page presents not only results for the symbol
<italic>Mec1</italic>
, but also for common symbols that are considered synonyms at the gene-family level, such as
<italic>ATR</italic>
. This type of synonym expansion goes well beyond a simple keyword query.</p>
<p>For each gene family present in the text mining data, a family profile lists all of its genes and synonyms, linking to authoritative resources such as Entrez Gene and the NCBI Taxonomy database. While
<italic>ESR1</italic>
is a known but deprecated synonym of
<italic>Mec1</italic>
[
<xref ref-type="bibr" rid="B12">25</xref>
], it is not considered a viable synonym of
<italic>Mec1</italic>
, because
<italic>Esr1</italic>
generally refers to the family of estrogen receptors. The synonym disambiguation algorithm of Van Landeghem et al. [
<xref ref-type="bibr" rid="B28">10</xref>
], which is the basis of the gene family generalizations, will thus prevent
<italic>Esr1</italic>
from being used as a synonym for
<italic>Mec1</italic>
. Reliable synonyms found in text, however, include
<italic>ATR</italic>
and
<italic>SCKL</italic>
.</p>
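<p>The effect of such filtering can be illustrated with a deliberately simplified rule. This is not the disambiguation algorithm of Van Landeghem et al. [<xref ref-type="bibr" rid="B28">10</xref>]. The sketch assumes that a synonym is discarded when it is the primary symbol of a different family. The family identifiers and the lookup table are hypothetical.</p>

```python
def viable_synonyms(family, synonyms, primary_symbols):
    """Keep synonyms of `family` that do not primarily name another family.

    primary_symbols maps a lowercased symbol to the family it primarily
    names (a hypothetical table used only for illustration).
    """
    kept = []
    for synonym in synonyms:
        owner = primary_symbols.get(synonym.lower())
        if owner is None or owner == family:
            kept.append(synonym)
    return kept


primary = {"mec1": "FAM_MEC1", "esr1": "FAM_ESTROGEN_RECEPTOR"}
print(viable_synonyms("FAM_MEC1", ["ATR", "SCKL", "ESR1"], primary))
# ['ATR', 'SCKL']
```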
<p>The EVEX web application includes several distinct methods of defining gene families (
<xref ref-type="sec" rid="sec2.1">Section 2.1</xref>
), each suited to specific organisms and use cases. For example, Ensembl Genomes defines rather coarse-grained families, in this case yielding a family of 19 evolutionarily conserved genes, including the budding yeast gene
<italic>Mec1</italic>
, its mammalian
<italic>ATR</italic>
orthologs, and genes from green algae and Arabidopsis. In contrast, the corresponding family defined by HomoloGene only includes the 6 conserved
<italic>Mec1</italic>
genes in the Ascomycota.</p>
</sec>
<sec sec-type="subsection" id="sec4.3">
<title>4.3. Manual Inspection of Text Mining Results</title>
<p>An important aspect of the EVEX web application is the ability to retrieve the original sentences and articles for all claims extracted from literature. The previous sections described how EVEX can assist in retrieving directly and indirectly associated genes and proteins by generating summary overviews. However, to be applicable in real-life use cases and valuable to domain experts, the resource must distinguish trustworthy predictions from unreliable hypotheses. For this reason, automatically generated confidence values, ranging from (very) high through average to (very) low, are displayed for each extracted interaction. In addition, the site always provides the opportunity to inspect the textual evidence in detail.</p>
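<p>Turning numeric confidence scores into the displayed labels can be sketched as a simple binning step. The cut-offs below, and the assumption that scores are normalized to the range 0 to 1, are hypothetical. The actual thresholds are not given in this section.</p>

```python
import bisect

# Hypothetical cut-offs on a normalized score; not EVEX's actual thresholds.
CUTOFFS = [0.2, 0.4, 0.6, 0.8]
LABELS = ["very low", "low", "average", "high", "very high"]


def confidence_label(score: float) -> str:
    """Map a numeric confidence score to one of five ordinal labels."""
    # bisect_right counts how many cut-offs are at or below the score.
    return LABELS[bisect.bisect_right(CUTOFFS, score)]


print([confidence_label(s) for s in (0.1, 0.5, 0.95)])
# ['very low', 'average', 'very high']
```

<p>Because labels are derived from a sorted score, the same ordering also supports the precision/recall tradeoff obtained by sorting events on their confidence values.</p>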
<p>Consider, for example, the phosphorylation of
<italic>RAD9</italic>
, regulated by
<italic>Mec1</italic>
(
<xref ref-type="fig" rid="fig4">Figure 4</xref>
). To allow a detailed inspection of this event, the web application integrates the
<italic>stav</italic>
visualiser [
<xref ref-type="bibr" rid="B26">26</xref>
], which was developed as a supporting resource for the BioNLP'11 Shared Task (ST'11) [
<xref ref-type="bibr" rid="B15">2</xref>
] (
<xref ref-type="fig" rid="fig6">Figure 6</xref>
). This open-source tool provides a detailed and easily grasped presentation of the event structures and the associated text spans. For users interested in the text mining details, this visualization offers valuable insight into the automated event extraction process. Additionally, whole PubMed abstracts can be visualised with the
<italic>stav</italic>
visualiser, allowing a fast overview of event information contained within an abstract.</p>
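<p>The nested event discussed here can be represented as a small recursive data structure. The sketch below uses a hypothetical <monospace>Event</monospace> class, not the stav or EVEX data model. It renders the event in the bracketed notation used in the Figure 6 caption, with C for the Cause and T for the Theme argument.</p>

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Event:
    """A (possibly nested) event; args maps a role to its argument."""
    event_type: str
    # Role ("C" for Cause, "T" for Theme) -> gene symbol or nested Event.
    args: dict = field(default_factory=dict)


def bracketed(arg: Union[str, Event]) -> str:
    """Render a gene symbol or event in the bracketed notation."""
    if isinstance(arg, str):
        return arg
    inner = ", ".join(f"{role}:{bracketed(value)}" for role, value in arg.args.items())
    return f"{arg.event_type}({inner})"


phosphorylation = Event("Phosphorylation", {"T": "RAD9"})
regulation = Event("Positive-regulation", {"C": "Mec1", "T": phosphorylation})
print(bracketed(regulation))
# Positive-regulation(C:Mec1, T:Phosphorylation(T:RAD9))
```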
</sec>
<sec sec-type="subsection" id="sec4.4">
<title>4.4. Site Navigation</title>
<p>To easily trace back previously found results, a session-based search history on the right-hand side of the screen links to the latest searches issued on the site. Further, a box of related searches suggests queries relevant to the current page. Finally, the web application provides a powerful method to browse indirectly associated information by allowing the retrieval of nested and parent interactions of a specific event. For example, when accessing the details of
<italic>Mec1</italic>
's regulation of
<italic>RAD9</italic>
phosphorylation and selecting the phosphorylation event, the user is shown evidence for many parent events involving different regulation polarities and various genes causing this specific phosphorylation. As such, we quickly learn that
<italic>RAD9</italic>
phosphorylation has many different potential regulators, such as
<italic>Ad5</italic>
,
<italic>Ad12</italic>,
and
<italic>C-Abl</italic>
. This sort of explorative information retrieval and cross-article discovery is exactly the type of usage aimed at by the EVEX resource.</p>
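<p>Retrieving parent events amounts to a reverse lookup over event arguments. The sketch below assumes a toy event table in which an argument is either a gene symbol or the identifier of another event. The records, identifiers and the unspecified "Regulation" type for the Ad5 and C-Abl events are illustrative, not extracted results.</p>

```python
# Toy event table: id -> (event_type, {role: argument}). An argument is either
# a gene symbol or the id of another event in the table.
EVENTS = {
    "E1": ("Phosphorylation", {"T": "RAD9"}),
    "E2": ("Positive-regulation", {"C": "Mec1", "T": "E1"}),
    "E3": ("Regulation", {"C": "Ad5", "T": "E1"}),
    "E4": ("Regulation", {"C": "C-Abl", "T": "E1"}),
}


def parent_events(events, child_id):
    """Ids of all events that take child_id as one of their arguments."""
    return sorted(eid for eid, (_, args) in events.items() if child_id in args.values())


def regulators_of(events, child_id):
    """Cause arguments of all parent events of child_id."""
    causes = []
    for eid in parent_events(events, child_id):
        cause = events[eid][1].get("C")
        if cause is not None:
            causes.append(cause)
    return sorted(causes)


print(parent_events(EVENTS, "E1"))  # ['E2', 'E3', 'E4']
print(regulators_of(EVENTS, "E1"))  # ['Ad5', 'C-Abl', 'Mec1']
```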
</sec>
</sec>
<sec id="sec5">
<title>5. Conclusions and Future Work</title>
<p>This paper presents a publicly available web application providing access to over 21 million detailed events among more than 40 million identified gene/protein symbols in nearly 6 million PubMed titles and abstracts. This dataset is the result of processing the entire collection of PubMed titles and abstracts through a state-of-the-art event extraction system and is regularly updated as new citations are added to PubMed. The extracted events provide a detailed representation of the textual statements, allowing for recursively nested events and different event types ranging from phosphorylation to catabolism and regulation. The EVEX web application is the first publicly released resource that provides intuitive access to these detailed event-based text mining results.</p>
<p>As the application mainly targets manual explorative browsing for supporting research in the life sciences, several steps are taken to allow for efficient querying of the large-scale event dataset. First, events are assigned confidence scores and ranked according to their reliability. Further, the events are refined to unify different event structures that have a nearly identical interpretation. Additionally, the events are aggregated across articles, accounting for lexical variation and generalizing gene symbols with respect to their gene family. This aggregation allows for efficient access to relevant information across articles and species. Finally, the EVEX web application groups events with respect to the involvement of pairs of genes, providing users with a familiar gene-centric point of view without sacrificing the expressiveness of the events. This interpretation is also extended to combinations of events, identifying indirect associations such as common coregulators and common binding partners, as a form of literature-based hypothesis generation.</p>
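<p>Aggregation across lexical variants can be sketched with a simple canonical form. The normalization below (lowercasing and dropping non-alphanumeric characters) is an assumed simplification, not EVEX's exact canonical generalization. The symbol variants and the PMID placeholders are hypothetical.</p>

```python
import re
from collections import defaultdict


def canonical(symbol: str) -> str:
    """Simplified canonical form: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^0-9a-z]", "", symbol.lower())


def aggregate_pairs(mentions):
    """Merge (regulator, target, pmid) mentions whose symbols differ only lexically."""
    grouped = defaultdict(list)
    for regulator, target, pmid in mentions:
        grouped[(canonical(regulator), canonical(target))].append(pmid)
    return dict(grouped)


mentions = [
    ("Mec1", "RAD9", "PMID_A"),   # placeholder article identifiers
    ("MEC1", "Rad-9", "PMID_B"),
]
print(aggregate_pairs(mentions))  # {('mec1', 'rad9'): ['PMID_A', 'PMID_B']}
```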
<p>Several future directions could extend and further improve the EVEX web application. The core set of events can be expanded by also processing all full-text articles from the open-access section of PubMed Central. Further, as BioNLP methods keep evolving towards more detailed and accurate predictions, the dataset can be enriched with new information, for example, by including epigenetics data as recently introduced by the BioNLP'11 Shared Task [
<xref ref-type="bibr" rid="B15">2</xref>
,
<xref ref-type="bibr" rid="B2">27</xref>
] and integrating noncausal entity relations [
<xref ref-type="bibr" rid="B22">28</xref>
,
<xref ref-type="bibr" rid="B29">29</xref>
]. Additionally, gene normalization data can be incorporated, enabling queries using specific gene or protein identifiers [
<xref ref-type="bibr" rid="B18">30</xref>
]. Finally, a web service may be developed to allow programmatic access to the EVEX web application, allowing bulk queries and result export for further postprocessing in various bioinformatics applications.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>S. Van Landeghem would like to thank the Research Foundation Flanders (FWO) for funding her research and a travel grant to Turku. Y. Van de Peer wants to acknowledge support from Ghent University (Multidisciplinary Research Partnership Bioinformatics: from nucleotides to networks) and the Interuniversity Attraction Poles Programme (IUAP P6/25), initiated by the Belgian State, Science Policy Office (BioMaGNet). This work was partly funded by the Academy of Finland, and the computational resources were provided by CSC-IT Center for Science Ltd., Espoo, Finland and the Department of IT, University of Turku, Finland.</p>
</ack>
<ref-list>
<ref id="B14">
<label>1</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>J-D</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kano</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Overview of BioNLP'09 shared task on event extraction</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2009</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>1</fpage>
<lpage>9</lpage>
</element-citation>
</ref>
<ref id="B15">
<label>2</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>J-D</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bossy</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Overview of BioNLP shared task 2011</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2011</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>1</fpage>
<lpage>6</lpage>
</element-citation>
</ref>
<ref id="B10">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>A gene network for navigating the literature</article-title>
<source>
<italic>Nature Genetics</italic>
</source>
<year>2004</year>
<volume>36</volume>
<issue>7, article 664</issue>
</element-citation>
</ref>
<ref id="B19">
<label>4</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Miyao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Ninomiya</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An intelligent search engine and GUI-based efficient MEDLINE search tool based on deep syntactic parsing</article-title>
<conf-name>In: Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions</conf-name>
<conf-date>2006</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>17</fpage>
<lpage>20</lpage>
</element-citation>
</ref>
<ref id="B23">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rebholz-Schuhmann</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kirsch</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Arregui</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gaudan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Riethoven</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Stoehr</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>EBIMed—text crunching to gather facts for proteins from Medline</article-title>
<source>
<italic>Bioinformatics</italic>
</source>
<year>2007</year>
<volume>23</volume>
<issue>2</issue>
<fpage>e237</fpage>
<lpage>e244</lpage>
<pub-id pub-id-type="pmid">17237098</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hearst</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Divoli</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Guturu</surname>
<given-names>HH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>BioText search engine: beyond abstract search</article-title>
<source>
<italic>Bioinformatics</italic>
</source>
<year>2007</year>
<volume>23</volume>
<issue>16</issue>
<fpage>2196</fpage>
<lpage>2197</lpage>
<pub-id pub-id-type="pmid">17545178</pub-id>
</element-citation>
</ref>
<ref id="B30">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>McCusker</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Krauthammer</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Yale Image Finder (YIF): a new search engine for retrieving biomedical images</article-title>
<source>
<italic>Bioinformatics</italic>
</source>
<year>2008</year>
<volume>24</volume>
<issue>17</issue>
<fpage>1968</fpage>
<lpage>1970</lpage>
<pub-id pub-id-type="pmid">18614584</pub-id>
</element-citation>
</ref>
<ref id="B1">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kohane</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>BioNOT: a searchable database of biomedical negated sentences</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2011</year>
<volume>12</volume>
<comment>Article ID 420.</comment>
</element-citation>
</ref>
<ref id="B3">
<label>9</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Scaling up biomedical event extraction to the entire PubMed</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2010</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>28</fpage>
<lpage>36</lpage>
</element-citation>
</ref>
<ref id="B28">
<label>10</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Van Landeghem</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>EVEX: a PubMed-scale resource for homology-based generalization of text mining predictions</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2011</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>28</fpage>
<lpage>37</lpage>
</element-citation>
</ref>
<ref id="B16">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leaman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gonzalez</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>BANNER: an executable survey of advances in biomedical named entity recognition</article-title>
<source>
<italic>Pacific Symposium on Biocomputing</italic>
</source>
<year>2008</year>
<fpage>652</fpage>
<lpage>663</lpage>
</element-citation>
</ref>
<ref id="B5">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Generalizing biomedical event extraction</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2012</year>
<volume>13</volume>
<issue>supplement 8, article S4</issue>
</element-citation>
</ref>
<ref id="B27">
<label>13</label>
<element-citation publication-type="journal">
<collab>The UniProt Consortium</collab>
<article-title>Ongoing and future developments at the universal protein resource</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2011</year>
<volume>39</volume>
<issue>supplement 1</issue>
<fpage>D214</fpage>
<lpage>D219</lpage>
<pub-id pub-id-type="pmid">21051339</pub-id>
</element-citation>
</ref>
<ref id="B24">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sayers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Barrett</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Benson</surname>
<given-names>DA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Database resources of the National Center for Biotechnology Information</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2009</year>
<volume>38</volume>
<issue>supplement 1</issue>
<fpage>D5</fpage>
<lpage>D16</lpage>
<pub-id pub-id-type="pmid">19910364</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Flicek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Amode</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Barrell</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Ensembl 2011</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2011</year>
<volume>39</volume>
<issue>1</issue>
<fpage>D800</fpage>
<lpage>D806</lpage>
<pub-id pub-id-type="pmid">21045057</pub-id>
</element-citation>
</ref>
<ref id="B13">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kersey</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Lawson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Ensembl genomes: extending ensembl across the taxonomic space</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2009</year>
<volume>38</volume>
<issue>supplement 1</issue>
<fpage>D563</fpage>
<lpage>D569</lpage>
<pub-id pub-id-type="pmid">19884133</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crammer</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Singer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Ultraconservative online algorithms for multiclass problems</article-title>
<source>
<italic>Journal of Machine Learning Research</italic>
</source>
<year>2003</year>
<volume>3</volume>
<issue>4-5</issue>
<fpage>951</fpage>
<lpage>991</lpage>
</element-citation>
</ref>
<ref id="B25">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segal</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Shapira</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Regev</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data</article-title>
<source>
<italic>Nature Genetics</italic>
</source>
<year>2003</year>
<volume>34</volume>
<issue>2</issue>
<fpage>166</fpage>
<lpage>176</lpage>
<pub-id pub-id-type="pmid">12740579</pub-id>
</element-citation>
</ref>
<ref id="B4">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Complex event extraction at PubMed scale</article-title>
<source>
<italic>Bioinformatics</italic>
</source>
<year>2010</year>
<volume>26</volume>
<issue>12</issue>
<fpage>i382</fpage>
<lpage>i390</lpage>
<comment>Article ID btq180.</comment>
<pub-id pub-id-type="pmid">20529932</pub-id>
</element-citation>
</ref>
<ref id="B11">
<label>20</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Kaewphan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kreula</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Van Landeghem</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Integrating large-scale text mining and co-expression networks: targeting NADP(H) metabolism in
<italic>E. coli</italic>
with event extraction</article-title>
<conf-name>In: Proceedings of the 3rd Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM '12)</conf-name>
<conf-date>2012</conf-date>
</element-citation>
</ref>
<ref id="B20">
<label>21</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>From pathways to biomolecular events: opportunities and challenges</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2011</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>105</fpage>
<lpage>113</lpage>
</element-citation>
</ref>
<ref id="B6">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carballo</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Cha</surname>
<given-names>RS</given-names>
</name>
</person-group>
<article-title>Meiotic roles of Mec1, a budding yeast homolog of mammalian ATR/ATM</article-title>
<source>
<italic>Chromosome Research</italic>
</source>
<year>2007</year>
<volume>15</volume>
<issue>5</issue>
<fpage>539</fpage>
<lpage>550</lpage>
<pub-id pub-id-type="pmid">17674144</pub-id>
</element-citation>
</ref>
<ref id="B17">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Loewenstein</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Raimondo</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Redfern</surname>
<given-names>OC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Protein function annotation by homology-based inference</article-title>
<source>
<italic>Genome Biology</italic>
</source>
<year>2009</year>
<volume>10</volume>
<issue>2, article 207</issue>
</element-citation>
</ref>
<ref id="B21">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Proost</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Van Bel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sterck</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>PLAZA: a comparative genomics resource to study gene and genome evolution in plants</article-title>
<source>
<italic>Plant Cell</italic>
</source>
<year>2009</year>
<volume>21</volume>
<issue>12</issue>
<fpage>3718</fpage>
<lpage>3731</lpage>
<pub-id pub-id-type="pmid">20040540</pub-id>
</element-citation>
</ref>
<ref id="B12">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kato</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ogawa</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>An essential gene, ESR1, is required for mitotic cell growth, DNA repair and meiotic recombination in Saccharomyces cerevisiae</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>1994</year>
<volume>22</volume>
<issue>15</issue>
<fpage>3104</fpage>
<lpage>3112</lpage>
<pub-id pub-id-type="pmid">8065923</pub-id>
</element-citation>
</ref>
<ref id="B26">
<label>26</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Stenetorp</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Topić</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J-D</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>BioNLP Shared Task 2011: supporting resources</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2011</conf-date>
<conf-loc>Portland, Oregon, USA</conf-loc>
<fpage>112</fpage>
<lpage>120</lpage>
</element-citation>
</ref>
<ref id="B2">
<label>27</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Generalizing biomedical event extraction</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2011</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>183</fpage>
<lpage>191</lpage>
</element-citation>
</ref>
<ref id="B22">
<label>28</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Overview of the entity relations (REL) supporting task of BioNLP Shared Task 2011</article-title>
<conf-name>In: Proceedings of the BioNLP Workshop Companion Volume for Shared Task</conf-name>
<conf-date>2011</conf-date>
<publisher-name>Association for Computational Linguistics</publisher-name>
<fpage>83</fpage>
<lpage>88</lpage>
</element-citation>
</ref>
<ref id="B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Van Landeghem</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Abeel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>De Baets</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Semantically linking molecular entities in literature through entity relationships</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2012</year>
<volume>13</volume>
<issue>supplement 8, article S6</issue>
</element-citation>
</ref>
<ref id="B18">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Kao</surname>
<given-names>HY</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>CH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The gene normalization task in BioCreative III</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2011</year>
<volume>12</volume>
<issue>supplement 8, article S2</issue>
</element-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="fig1" position="float">
<label>Figure 1</label>
<caption>
<p>Event representation of the statement
<italic>IL-2 acts by enhancing binding activity of NF-κB to p55</italic>
, illustrating recursive nesting of events where the (T)heme of the
<italic>positive regulation</italic>
event is the
<italic>binding</italic>
event. The (C)ause argument is the gene symbol
<italic>IL-2</italic>
(figure adapted from [
<xref ref-type="bibr" rid="B28">10</xref>
]).</p>
</caption>
<graphic xlink:href="ABI2012-582765.001"></graphic>
</fig>
<fig id="fig2" position="float">
<label>Figure 2</label>
<caption>
<p>Evaluation of predicted binding events, measured against the gold-standard data of the ST'09 development set. By sorting the events according to their confidence values, a tradeoff between precision and recall is obtained.</p>
</caption>
<graphic xlink:href="ABI2012-582765.002"></graphic>
</fig>
<fig id="fig3" position="float">
<label>Figure 3</label>
<caption>
<p>Search results for
<italic>Mec1</italic>
on the canonical generalization. An overview of directly associated genes is presented, grouped by event type. In the screenshot, only the box with regulation targets is shown, but the other event types may also be expanded. At the bottom, relevant links to additional sentences and articles are provided.</p>
</caption>
<graphic xlink:href="ABI2012-582765.003"></graphic>
</fig>
<fig id="fig4" position="float">
<label>Figure 4</label>
<caption>
<p>Detailed representation of all evidence supporting the regulation of RAD9 by Mec1. Regulatory mechanisms can have a certain polarity (positive/negative) and may involve physical events such as phosphorylation or protein-DNA binding.</p>
</caption>
<graphic xlink:href="ABI2012-582765.004"></graphic>
</fig>
<fig id="fig5" position="float">
<label>Figure 5</label>
<caption>
<p>All events linking
<italic>Mec1</italic>
and
<italic>RAD9</italic>
through either direct or indirect associations. The screenshot shows only the regulation boxes in detail; the other event types can be expanded as well. This page gives a quick overview of the mechanisms through which two genes interact, and it also exposes false-positive text mining results, which can be identified by comparing the confidence values with the evidence found in the sentences.</p>
</caption>
<graphic xlink:href="ABI2012-582765.005"></graphic>
</fig>
<fig id="fig6" position="float">
<label>Figure 6</label>
<caption>
<p>Visualization of a specific event occurrence by the stav text annotation visualiser. Genes and gene products (GGPs) are marked, as are the trigger words that denote specific event types. Arrows denote the role of each argument in the event (e.g., Theme or Cause). This visualization corresponds to the formal bracketed format of the event:
<italic>Positive-regulation(C:Mec1, T:Phosphorylation(T:RAD9))</italic>
.</p>
</caption>
<graphic xlink:href="ABI2012-582765.006"></graphic>
</fig>
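The bracketed format in Figure 6 is regular enough to read back with a short recursive-descent parser. The sketch below is an illustration under assumptions: it accepts only the subset of the notation seen in these captions (NAME or NAME(ROLE:arg, ...)), does not handle symbols that contain spaces or commas, may raise IndexError on truncated input, and returns nested `(type, [(role, argument), ...])` tuples with gene symbols as plain strings. `tokenize`, `parse` and `genes` are hypothetical helper names.

```python
import re

# A token is either a name (event type, role or gene symbol) or one of ( ) , :
TOKEN = re.compile(r"\s*([^\s(),:]+|[(),:])")

def tokenize(text):
    """Split bracketed event notation into tokens, skipping whitespace."""
    text = text.strip()
    tokens, pos = [], 0
    while pos != len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Parse 'Type(Role:arg, ...)' into nested (type, [(role, arg), ...]) tuples."""
    tokens = tokenize(text)

    def node(i):
        name = tokens[i]
        i += 1
        if i == len(tokens) or tokens[i] != "(":
            return name, i                      # a bare gene/protein symbol
        args = []
        i += 1                                  # skip "("
        while True:
            role = tokens[i]
            if tokens[i + 1] != ":":
                raise ValueError(f"expected ':' after role {role!r}")
            child, i = node(i + 2)
            args.append((role, child))
            if tokens[i] == ")":
                return (name, args), i + 1
            if tokens[i] != ",":
                raise ValueError(f"expected ',' or ')', got {tokens[i]!r}")
            i += 1                              # skip ","

    tree, end = node(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree

def genes(tree):
    """All gene/protein symbols in the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [g for _, arg in tree[1] for g in genes(arg)]

event = parse("Positive-regulation(C:Mec1, T:Phosphorylation(T:RAD9))")
print(event)
print(genes(event))
```

Here `genes(event)` returns `['Mec1', 'RAD9']`, the gene pair that the Figure 6 sentence links.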
<table-wrap id="tab1" position="float">
<label>Table 1</label>
<caption>
<p>Refinement rules for any nested combination of the three types of regulation: positive regulation (Pos), negative regulation (Neg), and unspecified regulation (Reg). Each parent event has a regulatory (T)heme argument and an optional (C)ause. The nested regulations are all regulations without a Cause, and their detailed structure is omitted for brevity. Written out in full, the first rule reads
<italic>Pos(C:geneA, T:Pos(T:geneB))</italic>
, which is rewritten to
<italic>Pos(C:geneA, T:geneB)</italic>
, where
<italic>geneA</italic>
and
<italic>geneB</italic>
are any two genes.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"> Original</th>
<th align="center" rowspan="1" colspan="1"> Result</th>
<th align="center" rowspan="1" colspan="1"> Example</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Pos(C, T:Pos)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Pos(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> BRs induce accumulation of BZR1 protein</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Pos(C, T:Reg)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Pos(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> PKS5 mediates PM H+-ATPase regulation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Reg(C, T:Pos)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Pos(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> CaM regulates activation of HSFs</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Neg(C, T:Neg)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Pos(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> E2 prevented downregulation of p21 </td>
</tr>
<tr>
<td align="left" colspan="3" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Reg(C, T:Reg)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Reg(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> PDK1 is involved in the regulation of S6K </td>
</tr>
<tr>
<td align="left" colspan="3" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Neg(C, T:Reg)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Neg(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> GW5074 prevents this effect on ENT1 mRNA</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Neg(C, T:Pos)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Neg(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> BIN2 negatively regulates BZR1 accumulation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Reg(C, T:Neg)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Neg(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> The effect of hCG in downregulating ER beta</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Pos(C, T:Neg)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>Neg(C, T)</italic>
</td>
<td align="center" rowspan="1" colspan="1"> DtRE is required for repression of CAB2</td>
</tr>
</tbody>
</table>
</table-wrap>
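The rules of Table 1 can be applied bottom-up to an event tree. In this Python sketch (hypothetical, not the EVEX implementation), events are `(type, [(role, argument), ...])` tuples using the abbreviations Pos, Neg and Reg, gene symbols are strings, and a nested regulation is collapsed only when its sole argument is a Theme, i.e. it has no Cause, as the caption specifies. The `REFINE` table transcribes the nine rows of Table 1; `refine` is an illustrative name.

```python
# Table 1 as a lookup: (parent type, type of the nested Cause-less regulation) -> result
REFINE = {
    ("Pos", "Pos"): "Pos", ("Pos", "Reg"): "Pos", ("Reg", "Pos"): "Pos", ("Neg", "Neg"): "Pos",
    ("Reg", "Reg"): "Reg",
    ("Neg", "Reg"): "Neg", ("Neg", "Pos"): "Neg", ("Reg", "Neg"): "Neg", ("Pos", "Neg"): "Neg",
}
REGULATIONS = {"Pos", "Neg", "Reg"}

def refine(event):
    """Apply the Table 1 rules bottom-up to a (type, [(role, arg), ...]) event tree."""
    if isinstance(event, str):
        return event                                     # gene symbol: nothing to refine
    etype, args = event
    args = [(role, refine(arg)) for role, arg in args]   # refine nested events first
    theme = dict(args).get("T")
    if etype in REGULATIONS and isinstance(theme, tuple):
        inner_type, inner_args = theme
        # Collapse only a nested regulation whose sole argument is a Theme (no Cause).
        if inner_type in REGULATIONS and [r for r, _ in inner_args] == ["T"]:
            kept = [(r, a) for r, a in args if r != "T"]  # keep the parent's Cause, if any
            return refine((REFINE[(etype, inner_type)], kept + [("T", inner_args[0][1])]))
    return (etype, args)

# "E2 prevented downregulation of p21": Neg(C:E2, T:Neg(T:p21)) -> Pos(C:E2, T:p21)
print(refine(("Neg", [("C", "E2"), ("T", ("Neg", [("T", "p21")]))])))
```

Refining the inner events first means that a chain such as a positive regulation of a double negation collapses in a single call, while regulations of physical events (for example a phosphorylation) are left untouched.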
<table-wrap id="tab2" position="float">
<label>Table 2</label>
<caption>
<p>The most prevalent (refined) event patterns in the EVEX data, considering only events with at least one gene or protein symbol, and their recursively nested events. These aggregated patterns refer to any type of regulation (
<italic>*Reg</italic>
), to binding events between two genes (
<italic>Bind</italic>
), and to any physical event (
<italic>Phy</italic>
) concerning a single gene, such as protein-DNA binding, protein catabolism, transcription, localization, phosphorylation, and gene expression. The first two columns give the percentage of event occurrences matching the given pattern and the cumulative percentage of event occurrences up to and including that pattern. The right-most column shows the extracted gene pair and a coarse classification of its association type.
<italic>A</italic>
and
<italic>B</italic>
refer to gene symbols, and bindings are represented with ×. Further,
<italic>A</italic>
>
<italic>B</italic>
means
<italic>A regulates B,</italic>
while
<italic>A</italic>
<italic>B</italic>
expresses an indirect regulation. </p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"> Occ. [%]</th>
<th align="center" rowspan="1" colspan="1"> Cum. occ. [%]</th>
<th align="center" rowspan="1" colspan="1"> Event pattern</th>
<th align="center" rowspan="1" colspan="1"> Gene pair</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1"> 58.6</td>
<td align="center" rowspan="1" colspan="1"> 58.6</td>
<td align="center" rowspan="1" colspan="1">
<italic>Phy(T:A)</italic>
</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">15.0</td>
<td align="center" rowspan="1" colspan="1"> 73.6</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(T:A)</italic>
</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 8.4</td>
<td align="center" rowspan="1" colspan="1"> 82.0</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(T:Phy(T:A))</italic>
</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 8.0</td>
<td align="center" rowspan="1" colspan="1"> 90.0</td>
<td align="center" rowspan="1" colspan="1">
<italic>Bind(T:A, T:B)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
×
<italic>B</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 4.7</td>
<td align="center" rowspan="1" colspan="1"> 94.7</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(C:A, T:B)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
>
<italic>B</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 3.8</td>
<td align="center" rowspan="1" colspan="1"> 98.5</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(C:A, T:Phy(T:B))</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
>
<italic>B</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 0.2</td>
<td align="center" rowspan="1" colspan="1"> 98.7</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(C:*Reg(T:Phy(T:A)), T:Phy(T:B))</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
<italic>B</italic>
  </td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 0.2</td>
<td align="center" rowspan="1" colspan="1"> 98.9</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(C:Phy(T:A), T:B)</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
<italic>B</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> 0.2</td>
<td align="center" rowspan="1" colspan="1"> 99.1</td>
<td align="center" rowspan="1" colspan="1">
<italic>*Reg(C:Phy(T:A), T:Phy(T:B))</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
<italic>B</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
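Patterns like those in Table 2 can be obtained by abstracting away the gene symbols and the event subtypes, then counting. The Python sketch below is an assumption-laden illustration: the event type names (`Phosphorylation`, `Positive-regulation`, and so on), the treatment of single-gene bindings as Phy, and the helper names `abstract_type`, `pattern`, `gene_pair` and `pattern_table` are hypothetical, and the percentages it prints come from four toy events, not from EVEX.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical type names; every non-regulation, non-binding type counts as Phy.
REGULATIONS = {"Positive-regulation", "Negative-regulation", "Regulation"}

def abstract_type(etype, args):
    """Map a concrete event type onto the coarse classes used in Table 2."""
    if etype in REGULATIONS:
        return "*Reg"
    if etype == "Binding" and len(args) > 1:
        return "Bind"          # binding between two (or more) genes
    return "Phy"               # single-gene physical event, incl. single-gene binding

def pattern(event):
    """Abstract an event tree, replacing gene symbols by A, B, ... in order of appearance."""
    letters = {}

    def walk(node):
        if isinstance(node, str):
            if node not in letters:
                letters[node] = chr(ord("A") + len(letters))
            return letters[node]
        etype, args = node
        inner = ", ".join(f"{role}:{walk(arg)}" for role, arg in args)
        return f"{abstract_type(etype, args)}({inner})"

    return walk(event)

def gene_pair(event):
    """Coarse association type of a top-level event, following Table 2."""
    etype, args = event
    kind = abstract_type(etype, args)
    if kind == "Bind":
        return "A × B"
    cause = dict(args).get("C")
    if kind == "*Reg" and cause is not None:
        # A gene as Cause is a direct regulation; an event as Cause is indirect.
        return "A > B" if isinstance(cause, str) else "indirect"
    return None                # single-gene pattern: no gene pair

def pattern_table(events):
    """Rows of (pattern, occurrence %, cumulative %), most frequent first."""
    counts = Counter(pattern(e) for e in events).most_common()
    total = sum(n for _, n in counts)
    shares = [100 * n / total for _, n in counts]
    return [(p, round(s, 1), round(c, 1))
            for (p, _), s, c in zip(counts, shares, accumulate(shares))]

toy_events = [
    ("Phosphorylation", [("T", "RAD9")]),
    ("Gene-expression", [("T", "p21")]),
    ("Binding", [("T", "NF-κB"), ("T", "p55")]),
    ("Positive-regulation", [("C", "Mec1"), ("T", ("Phosphorylation", [("T", "RAD9")]))]),
]
for row in pattern_table(toy_events):
    print(row)
```

The cumulative column is a running sum over the rows sorted by frequency, which is how the second column of Table 2 relates to the first.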
<table-wrap id="tab3" position="float">
<label>Table 3</label>
<caption>
<p>Indirect associations between gene
<italic>A</italic>
and gene
<italic>B</italic>
, established by combining binding and regulatory events through a common interaction partner gene
<italic>Z</italic>
. Bindings are represented with ×; for regulations,
<italic>A</italic>
>
<italic>B</italic>
means
<italic>A</italic>
regulates
<italic>B</italic>
. </p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"> Association</th>
<th align="center" rowspan="1" colspan="1"> Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>A</italic>
>
<italic>Z</italic>
<
<italic>B</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
and
<italic>B</italic>
coregulate
<italic>Z</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>A</italic>
<
<italic>Z</italic>
>
<italic>B</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
and
<italic>B</italic>
are both regulated by
<italic>Z</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>A</italic>
×
<italic>Z</italic>
×
<italic>B</italic>
</td>
<td align="center" rowspan="1" colspan="1">
<italic>A</italic>
and
<italic>B</italic>
share a common binding partner
<italic>Z</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
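The associations in Table 3 can be derived by joining the direct events through a shared partner gene Z. A minimal Python sketch, assuming the regulations have already been reduced to (regulator, target) gene pairs and the bindings to unordered gene pairs; the function name, the relation labels and the gene names g1 to g6, y and z in the usage example are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def indirect_associations(regulations, bindings):
    """Derive Table 3-style associations between genes A and B through a shared gene Z.

    regulations: iterable of (regulator, target) pairs
    bindings: iterable of unordered (gene, gene) pairs
    """
    regulators_of = defaultdict(set)   # Z -> genes that regulate Z
    targets_of = defaultdict(set)      # Z -> genes regulated by Z
    partners_of = defaultdict(set)     # Z -> binding partners of Z
    for regulator, target in regulations:
        regulators_of[target].add(regulator)
        targets_of[regulator].add(target)
    for x, y in bindings:
        partners_of[x].add(y)
        partners_of[y].add(x)

    found = set()
    groups = [
        (regulators_of, "coregulate"),              # row 1: A and B both regulate Z
        (targets_of, "regulated by"),               # row 2: Z regulates both A and B
        (partners_of, "common binding partner"),    # row 3: A and B both bind Z
    ]
    for neighbours, label in groups:
        for z, genes in neighbours.items():
            for a, b in combinations(sorted(genes), 2):
                if z not in (a, b):                 # ignore self-interactions of Z
                    found.add((a, b, label, z))
    return found

regs = [("g1", "z"), ("g2", "z"), ("z", "g3"), ("z", "g4")]
binds = [("g5", "y"), ("g6", "y")]
for association in sorted(indirect_associations(regs, binds)):
    print(association)
```

Each pair is reported once per intermediate gene Z; a gene with many partners yields many pairs, and this sketch does not rank or filter them.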
<table-wrap id="tab4" position="float">
<label>Table 4</label>
<caption>
<p>Gene symbol coverage comparison, showing for each generalization the number of distinct canonical symbols covered and the number of symbol occurrences covered, out of a total of 1833.1 K distinct canonical symbols and 40.3 M extracted gene symbol occurrences.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"> Generalization</th>
<th align="center" colspan="2" rowspan="1"> Distinct symbols</th>
<th align="center" colspan="2" rowspan="1"> Occurrences</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1"> Canonical</td>
<td align="center" rowspan="1" colspan="1"> 1833.1 K</td>
<td align="center" rowspan="1" colspan="1"> 100.0%</td>
<td align="center" rowspan="1" colspan="1"> 40.3 M</td>
<td align="center" rowspan="1" colspan="1"> 100.0%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> HomoloGene</td>
<td align="center" rowspan="1" colspan="1"> 68.2 K</td>
<td align="center" rowspan="1" colspan="1"> 3.7%</td>
<td align="center" rowspan="1" colspan="1"> 21.1 M</td>
<td align="center" rowspan="1" colspan="1"> 52.3%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Ensembl</td>
<td align="center" rowspan="1" colspan="1"> 60.0 K</td>
<td align="center" rowspan="1" colspan="1"> 3.3%</td>
<td align="center" rowspan="1" colspan="1"> 20.9 M</td>
<td align="center" rowspan="1" colspan="1"> 51.8%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Ensembl Genomes</td>
<td align="center" rowspan="1" colspan="1"> 100.6 K</td>
<td align="center" rowspan="1" colspan="1"> 5.5%</td>
<td align="center" rowspan="1" colspan="1"> 24.3 M</td>
<td align="center" rowspan="1" colspan="1"> 60.1%</td>
</tr>
</tbody>
</table>
</table-wrap>
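The two percentage columns in Table 4 are different ratios: covered distinct symbols over all distinct canonical symbols, and covered occurrences over all extracted occurrences. Since 3.7% of the distinct symbols account for 52.3% of the occurrences in the HomoloGene row, the symbols a gene family resource covers must on average be far more frequent than the rest. The Python sketch below, with a hypothetical `coverage` function and made-up toy counts, computes both ratios.

```python
def coverage(occurrences, covered):
    """Return (% of distinct symbols covered, % of occurrences covered).

    occurrences: canonical symbol -> number of extracted occurrences (toy data here)
    covered: set of canonical symbols that a gene family resource can map
    """
    hit = [s for s in occurrences if s in covered]
    distinct_pct = 100 * len(hit) / len(occurrences)
    occurrence_pct = 100 * sum(occurrences[s] for s in hit) / sum(occurrences.values())
    return distinct_pct, occurrence_pct

toy_counts = {"g1": 60, "g2": 30, "g3": 5, "g4": 3, "g5": 2}
print(coverage(toy_counts, covered={"g1", "g2"}))   # prints (40.0, 90.0)
```

In the toy data, covering two frequent symbols out of five gives 40% of the distinct symbols but 90% of the occurrences, the same effect that Table 4 shows at scale.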
</floats-group>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Belgique/explor/OpenAccessBelV2/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000308 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000308 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Belgique
   |area=    OpenAccessBelV2
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3375141
   |texte=   Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:22719757" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OpenAccessBelV2 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Dec 1 00:43:49 2016. Site generation: Wed Mar 6 14:51:30 2024