SGML Exploration Server

Internal identifier: 000183 (Pmc/Corpus); previous: 0001829; next: 0001840

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">How to link ontologies and protein–protein interactions to literature: text-mining approaches and the BioCreative experience</title>
<author>
<name sortKey="Krallinger, Martin" sort="Krallinger, Martin" uniqKey="Krallinger M" first="Martin" last="Krallinger">Martin Krallinger</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Leitner, Florian" sort="Leitner, Florian" uniqKey="Leitner F" first="Florian" last="Leitner">Florian Leitner</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vazquez, Miguel" sort="Vazquez, Miguel" uniqKey="Vazquez M" first="Miguel" last="Vazquez">Miguel Vazquez</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Salgado, David" sort="Salgado, David" uniqKey="Salgado D" first="David" last="Salgado">David Salgado</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Australian Regenerative Medicine Institute, Monash University, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Marcelle, Christophe" sort="Marcelle, Christophe" uniqKey="Marcelle C" first="Christophe" last="Marcelle">Christophe Marcelle</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Australian Regenerative Medicine Institute, Monash University, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tyers, Mike" sort="Tyers, Mike" uniqKey="Tyers M" first="Mike" last="Tyers">Mike Tyers</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="bas017-AFF1">School of Biological Sciences, University of Edinburgh, Edinburgh, UK</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="bas017-AFF1">Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, QC, Canada H3C 3J7</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Valencia, Alfonso" sort="Valencia, Alfonso" uniqKey="Valencia A" first="Alfonso" last="Valencia">Alfonso Valencia</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chatr Aryamontri, Andrew" sort="Chatr Aryamontri, Andrew" uniqKey="Chatr Aryamontri A" first="Andrew" last="Chatr-Aryamontri">Andrew Chatr-Aryamontri</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="bas017-AFF1">School of Biological Sciences, University of Edinburgh, Edinburgh, UK</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="bas017-AFF1">Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, QC, Canada H3C 3J7</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22438567</idno>
<idno type="pmc">3309177</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3309177</idno>
<idno type="RBID">PMC:3309177</idno>
<idno type="doi">10.1093/database/bas017</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000183</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000183</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">How to link ontologies and protein–protein interactions to literature: text-mining approaches and the BioCreative experience</title>
<author>
<name sortKey="Krallinger, Martin" sort="Krallinger, Martin" uniqKey="Krallinger M" first="Martin" last="Krallinger">Martin Krallinger</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Leitner, Florian" sort="Leitner, Florian" uniqKey="Leitner F" first="Florian" last="Leitner">Florian Leitner</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vazquez, Miguel" sort="Vazquez, Miguel" uniqKey="Vazquez M" first="Miguel" last="Vazquez">Miguel Vazquez</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Salgado, David" sort="Salgado, David" uniqKey="Salgado D" first="David" last="Salgado">David Salgado</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Australian Regenerative Medicine Institute, Monash University, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Marcelle, Christophe" sort="Marcelle, Christophe" uniqKey="Marcelle C" first="Christophe" last="Marcelle">Christophe Marcelle</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Australian Regenerative Medicine Institute, Monash University, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tyers, Mike" sort="Tyers, Mike" uniqKey="Tyers M" first="Mike" last="Tyers">Mike Tyers</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="bas017-AFF1">School of Biological Sciences, University of Edinburgh, Edinburgh, UK</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="bas017-AFF1">Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, QC, Canada H3C 3J7</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Valencia, Alfonso" sort="Valencia, Alfonso" uniqKey="Valencia A" first="Alfonso" last="Valencia">Alfonso Valencia</name>
<affiliation>
<nlm:aff id="bas017-AFF1">Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chatr Aryamontri, Andrew" sort="Chatr Aryamontri, Andrew" uniqKey="Chatr Aryamontri A" first="Andrew" last="Chatr-Aryamontri">Andrew Chatr-Aryamontri</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="bas017-AFF1">School of Biological Sciences, University of Edinburgh, Edinburgh, UK</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="bas017-AFF1">Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, QC, Canada H3C 3J7</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Database: The Journal of Biological Databases and Curation</title>
<idno type="eISSN">1758-0463</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>There is increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein–protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text-mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein–protein interaction data and PSI-MI terms referring to interaction detection methods.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Neumann, B" uniqKey="Neumann B">B Neumann</name>
</author>
<author>
<name sortKey="Walter, T" uniqKey="Walter T">T Walter</name>
</author>
<author>
<name sortKey="Heriche, Jk" uniqKey="Heriche J">JK Heriche</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smogorzewska, A" uniqKey="Smogorzewska A">A Smogorzewska</name>
</author>
<author>
<name sortKey="Desetty, R" uniqKey="Desetty R">R Desetty</name>
</author>
<author>
<name sortKey="Saito, Tt" uniqKey="Saito T">TT Saito</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
<author>
<name sortKey="Stamatoyannopoulos, Ja" uniqKey="Stamatoyannopoulos J">JA Stamatoyannopoulos</name>
</author>
<author>
<name sortKey="Dutta, A" uniqKey="Dutta A">A Dutta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Seringhaus, M" uniqKey="Seringhaus M">M Seringhaus</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Galperin, My" uniqKey="Galperin M">MY Galperin</name>
</author>
<author>
<name sortKey="Cochrane, Gr" uniqKey="Cochrane G">GR Cochrane</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, L" uniqKey="Stein L">L Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elsik, Cg" uniqKey="Elsik C">CG Elsik</name>
</author>
<author>
<name sortKey="Worley, Kc" uniqKey="Worley K">KC Worley</name>
</author>
<author>
<name sortKey="Zhang, L" uniqKey="Zhang L">L Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huss, Jw" uniqKey="Huss J">JW Huss</name>
</author>
<author>
<name sortKey="Lindenbaum, P" uniqKey="Lindenbaum P">P Lindenbaum</name>
</author>
<author>
<name sortKey="Martone, M" uniqKey="Martone M">M Martone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leitner, F" uniqKey="Leitner F">F Leitner</name>
</author>
<author>
<name sortKey="Chatr Aryamontri, A" uniqKey="Chatr Aryamontri A">A Chatr-aryamontri</name>
</author>
<author>
<name sortKey="Mardis, Sa" uniqKey="Mardis S">SA Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Superti Furga, G" uniqKey="Superti Furga G">G Superti-Furga</name>
</author>
<author>
<name sortKey="Wieland, F" uniqKey="Wieland F">F Wieland</name>
</author>
<author>
<name sortKey="Cesareni, G" uniqKey="Cesareni G">G Cesareni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baumgartner, Wa" uniqKey="Baumgartner W">WA Baumgartner</name>
</author>
<author>
<name sortKey="Cohen, Kb" uniqKey="Cohen K">KB Cohen</name>
</author>
<author>
<name sortKey="Fox, Lm" uniqKey="Fox L">LM Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rebholz Schuhmann, D" uniqKey="Rebholz Schuhmann D">D Rebholz-Schuhmann</name>
</author>
<author>
<name sortKey="Kirsch, H" uniqKey="Kirsch H">H Kirsch</name>
</author>
<author>
<name sortKey="Arregui, M" uniqKey="Arregui M">M Arregui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Couto, Fm" uniqKey="Couto F">FM Couto</name>
</author>
<author>
<name sortKey="Silva, Mj" uniqKey="Silva M">MJ Silva</name>
</author>
<author>
<name sortKey="Lee, V" uniqKey="Lee V">V Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dowell, Kg" uniqKey="Dowell K">KG Dowell</name>
</author>
<author>
<name sortKey="Mcandrews Hill, Ms" uniqKey="Mcandrews Hill M">MS McAndrews-Hill</name>
</author>
<author>
<name sortKey="Hill, Dp" uniqKey="Hill D">DP Hill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wiegers, Tc" uniqKey="Wiegers T">TC Wiegers</name>
</author>
<author>
<name sortKey="Davis, Ap" uniqKey="Davis A">AP Davis</name>
</author>
<author>
<name sortKey="Cohen, Kb" uniqKey="Cohen K">KB Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alterovitz, G" uniqKey="Alterovitz G">G Alterovitz</name>
</author>
<author>
<name sortKey="Xiang, M" uniqKey="Xiang M">M Xiang</name>
</author>
<author>
<name sortKey="Hill, Dp" uniqKey="Hill D">DP Hill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L Hirschman</name>
</author>
<author>
<name sortKey="Yeh, A" uniqKey="Yeh A">A Yeh</name>
</author>
<author>
<name sortKey="Blaschke, C" uniqKey="Blaschke C">C Blaschke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leitner, F" uniqKey="Leitner F">F Leitner</name>
</author>
<author>
<name sortKey="Mardis, Sa" uniqKey="Mardis S">SA Mardis</name>
</author>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aranda, B" uniqKey="Aranda B">B Aranda</name>
</author>
<author>
<name sortKey="Achuthan, P" uniqKey="Achuthan P">P Achuthan</name>
</author>
<author>
<name sortKey="Alam Faruque, Y" uniqKey="Alam Faruque Y">Y Alam-Faruque</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ceol, A" uniqKey="Ceol A">A Ceol</name>
</author>
<author>
<name sortKey="Chatr Aryamontri, A" uniqKey="Chatr Aryamontri A">A Chatr-Aryamontri</name>
</author>
<author>
<name sortKey="Licata, L" uniqKey="Licata L">L Licata</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salwinski, L" uniqKey="Salwinski L">L Salwinski</name>
</author>
<author>
<name sortKey="Miller, Cs" uniqKey="Miller C">CS Miller</name>
</author>
<author>
<name sortKey="Smith, Aj" uniqKey="Smith A">AJ Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stark, C" uniqKey="Stark C">C Stark</name>
</author>
<author>
<name sortKey="Breitkreutz, Bj" uniqKey="Breitkreutz B">BJ Breitkreutz</name>
</author>
<author>
<name sortKey="Chatr Aryamontri, A" uniqKey="Chatr Aryamontri A">A Chatr-Aryamontri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mewes, Hw" uniqKey="Mewes H">HW Mewes</name>
</author>
<author>
<name sortKey="Ruepp, A" uniqKey="Ruepp A">A Ruepp</name>
</author>
<author>
<name sortKey="Theis, F" uniqKey="Theis F">F Theis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chautard, E" uniqKey="Chautard E">E Chautard</name>
</author>
<author>
<name sortKey="Fatoux Ardore, M" uniqKey="Fatoux Ardore M">M Fatoux-Ardore</name>
</author>
<author>
<name sortKey="Ballut, L" uniqKey="Ballut L">L Ballut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goll, J" uniqKey="Goll J">J Goll</name>
</author>
<author>
<name sortKey="Rajagopala, Sv" uniqKey="Rajagopala S">SV Rajagopala</name>
</author>
<author>
<name sortKey="Shiau, Sc" uniqKey="Shiau S">SC Shiau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kerrien, S" uniqKey="Kerrien S">S Kerrien</name>
</author>
<author>
<name sortKey="Orchard, S" uniqKey="Orchard S">S Orchard</name>
</author>
<author>
<name sortKey="Montecchi Palazzi, L" uniqKey="Montecchi Palazzi L">L Montecchi-Palazzi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cote, Rg" uniqKey="Cote R">RG Cote</name>
</author>
<author>
<name sortKey="Jones, P" uniqKey="Jones P">P Jones</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Rosse, C" uniqKey="Rosse C">C Rosse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spasic, I" uniqKey="Spasic I">I Spasic</name>
</author>
<author>
<name sortKey="Schober, D" uniqKey="Schober D">D Schober</name>
</author>
<author>
<name sortKey="Sansone, Sa" uniqKey="Sansone S">SA Sansone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bodenreider, O" uniqKey="Bodenreider O">O Bodenreider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tirmizi, Sh" uniqKey="Tirmizi S">SH Tirmizi</name>
</author>
<author>
<name sortKey="Aitken, S" uniqKey="Aitken S">S Aitken</name>
</author>
<author>
<name sortKey="Moreira, Da" uniqKey="Moreira D">DA Moreira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hill, Dp" uniqKey="Hill D">DP Hill</name>
</author>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
<author>
<name sortKey="Mcandrews Hill, Ms" uniqKey="Mcandrews Hill M">MS McAndrews-Hill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccray, At" uniqKey="Mccray A">AT McCray</name>
</author>
<author>
<name sortKey="Browne, Ac" uniqKey="Browne A">AC Browne</name>
</author>
<author>
<name sortKey="Bodenreider, O" uniqKey="Bodenreider O">O Bodenreider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beisswanger, E" uniqKey="Beisswanger E">E Beisswanger</name>
</author>
<author>
<name sortKey="Poprat, M" uniqKey="Poprat M">M Poprat</name>
</author>
<author>
<name sortKey="Hahn, U" uniqKey="Hahn U">U Hahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blaschke, C" uniqKey="Blaschke C">C Blaschke</name>
</author>
<author>
<name sortKey="Leon, Ea" uniqKey="Leon E">EA Leon</name>
</author>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Muller, Hm" uniqKey="Muller H">HM Muller</name>
</author>
<author>
<name sortKey="Kenny, Ee" uniqKey="Kenny E">EE Kenny</name>
</author>
<author>
<name sortKey="Sternberg, Pw" uniqKey="Sternberg P">PW Sternberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jonquet, C" uniqKey="Jonquet C">C Jonquet</name>
</author>
<author>
<name sortKey="Shah, Nh" uniqKey="Shah N">NH Shah</name>
</author>
<author>
<name sortKey="Musen, Ma" uniqKey="Musen M">MA Musen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rzhetsky, A" uniqKey="Rzhetsky A">A Rzhetsky</name>
</author>
<author>
<name sortKey="Iossifov, I" uniqKey="Iossifov I">I Iossifov</name>
</author>
<author>
<name sortKey="Koike, T" uniqKey="Koike T">T Koike</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xuan, W" uniqKey="Xuan W">W Xuan</name>
</author>
<author>
<name sortKey="Dai, M" uniqKey="Dai M">M Dai</name>
</author>
<author>
<name sortKey="Mirel, B" uniqKey="Mirel B">B Mirel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yeh, A" uniqKey="Yeh A">A Yeh</name>
</author>
<author>
<name sortKey="Morgan, A" uniqKey="Morgan A">A Morgan</name>
</author>
<author>
<name sortKey="Colosimo, M" uniqKey="Colosimo M">M Colosimo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L Hirschman</name>
</author>
<author>
<name sortKey="Colosimo, M" uniqKey="Colosimo M">M Colosimo</name>
</author>
<author>
<name sortKey="Morgan, A" uniqKey="Morgan A">A Morgan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magrane, M" uniqKey="Magrane M">M Magrane</name>
</author>
<author>
<name sortKey="Consortium, U" uniqKey="Consortium U">U Consortium</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
<author>
<name sortKey="Leitner, F" uniqKey="Leitner F">F Leitner</name>
</author>
<author>
<name sortKey="Rodriguez Penagos, C" uniqKey="Rodriguez Penagos C">C Rodriguez-Penagos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chatr Aryamontri, A" uniqKey="Chatr Aryamontri A">A Chatr-aryamontri</name>
</author>
<author>
<name sortKey="Kerrien, S" uniqKey="Kerrien S">S Kerrien</name>
</author>
<author>
<name sortKey="Khadake, J" uniqKey="Khadake J">J Khadake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piao, S" uniqKey="Piao S">S Piao</name>
</author>
<author>
<name sortKey="Mcnaught, J" uniqKey="Mcnaught J">J McNaught</name>
</author>
<author>
<name sortKey="Ananiadou, S" uniqKey="Ananiadou S">S Ananiadou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Rak, R" uniqKey="Rak R">R Rak</name>
</author>
<author>
<name sortKey="Restificar, A" uniqKey="Restificar A">A Restificar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pesquita, C" uniqKey="Pesquita C">C Pesquita</name>
</author>
<author>
<name sortKey="Faria, D" uniqKey="Faria D">D Faria</name>
</author>
<author>
<name sortKey="Falcao, Ao" uniqKey="Falcao A">AO Falcao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fellbaum, C" uniqKey="Fellbaum C">C Fellbaum</name>
</author>
<author>
<name sortKey="Hahn, U" uniqKey="Hahn U">U Hahn</name>
</author>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arighi, Cn" uniqKey="Arighi C">CN Arighi</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
<author>
<name sortKey="Vazquez, M" uniqKey="Vazquez M">M Vazquez</name>
</author>
<author>
<name sortKey="Leitner, F" uniqKey="Leitner F">F Leitner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chatr Aryamontri, A" uniqKey="Chatr Aryamontri A">A Chatr-Aryamontri</name>
</author>
<author>
<name sortKey="Winter, A" uniqKey="Winter A">A Winter</name>
</author>
<author>
<name sortKey="Perfetto, L" uniqKey="Perfetto L">L Perfetto</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schneider, G" uniqKey="Schneider G">G Schneider</name>
</author>
<author>
<name sortKey="Clematide, S" uniqKey="Clematide S">S Clematide</name>
</author>
<author>
<name sortKey="Rinaldi, F" uniqKey="Rinaldi F">F Rinaldi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lourenco, A" uniqKey="Lourenco A">A Lourenco</name>
</author>
<author>
<name sortKey="Conover, M" uniqKey="Conover M">M Conover</name>
</author>
<author>
<name sortKey="Wong, A" uniqKey="Wong A">A Wong</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Agarwal, S" uniqKey="Agarwal S">S Agarwal</name>
</author>
<author>
<name sortKey="Liu, F" uniqKey="Liu F">F Liu</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Jd" uniqKey="Kim J">JD Kim</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Tateisi, Y" uniqKey="Tateisi Y">Y Tateisi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bada, M" uniqKey="Bada M">M Bada</name>
</author>
<author>
<name sortKey="Hunter, Le" uniqKey="Hunter L">LE Hunter</name>
</author>
<author>
<name sortKey="Eckert, M" uniqKey="Eckert M">M Eckert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brewster, C" uniqKey="Brewster C">C Brewster</name>
</author>
<author>
<name sortKey="Jupp, S" uniqKey="Jupp S">S Jupp</name>
</author>
<author>
<name sortKey="Luciano, J" uniqKey="Luciano J">J Luciano</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Database (Oxford)</journal-id>
<journal-id journal-id-type="iso-abbrev">Database (Oxford)</journal-id>
<journal-id journal-id-type="publisher-id">database</journal-id>
<journal-id journal-id-type="hwp">databa</journal-id>
<journal-title-group>
<journal-title>Database: The Journal of Biological Databases and Curation</journal-title>
</journal-title-group>
<issn pub-type="epub">1758-0463</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22438567</article-id>
<article-id pub-id-type="pmc">3309177</article-id>
<article-id pub-id-type="doi">10.1093/database/bas017</article-id>
<article-id pub-id-type="publisher-id">bas017</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>How to link ontologies and protein–protein interactions to literature: text-mining approaches and the BioCreative experience</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Krallinger</surname>
<given-names>Martin</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Leitner</surname>
<given-names>Florian</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vazquez</surname>
<given-names>Miguel</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Salgado</surname>
<given-names>David</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Marcelle</surname>
<given-names>Christophe</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tyers</surname>
<given-names>Mike</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Valencia</surname>
<given-names>Alfonso</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chatr-aryamontri</surname>
<given-names>Andrew</given-names>
</name>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="bas017-AFF1">
<sup>4</sup>
</xref>
<xref ref-type="corresp" rid="bas017-COR1">*</xref>
</contrib>
</contrib-group>
<aff id="bas017-AFF1">
<sup>1</sup>
Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Spain,
<sup>2</sup>
Australian Regenerative Medicine Institute, Monash University, Australia,
<sup>3</sup>
School of Biological Sciences, University of Edinburgh, Edinburgh, UK and
<sup>4</sup>
Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, QC, Canada H3C 3J7</aff>
<author-notes>
<corresp id="bas017-COR1">*
<bold>Corresponding author:</bold>
Tel: +1 514 343 111 ext. 44668; Fax: +1 514 343 5839; Email:
<email>andrew.chatr-aryamontri@umontreal.ca</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>21</day>
<month>3</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>21</day>
<month>3</month>
<year>2012</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>2012</volume>
<elocation-id>bas017</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>10</month>
<year>2011</year>
</date>
<date date-type="rev-recd">
<day>14</day>
<month>2</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>2</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2012. Published by Oxford University Press.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">http://creativecommons.org/licenses/by-nc/3.0</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>There is increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein–protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text-mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein–protein interaction data and PSI-MI terms referring to interaction detection methods.</p>
</abstract>
<counts>
<page-count count="12"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>Advances in laboratory technologies and data analysis methodologies are permitting the exploitation of complex experimental data sets in ways that were unthinkable just a few years ago (
<xref ref-type="bibr" rid="bas017-B1 bas017-B2 bas017-B3">1–3</xref>
). However, although the number of scientific articles containing relevant data is steadily increasing, the majority of published data is still not easily accessible for automated text processing systems. In fact, the information is still buried within the articles rather than being summarized in computer readable formats (
<xref ref-type="bibr" rid="bas017-B4">4</xref>
). Therefore, it is necessary to perform the additional step of annotating the experimental data in formats suitable for systematic consultation or computation. This task is performed manually by curators of databases specialized in diverse biological domains, ranging from cellular phenotypes and tissue anatomy to gene function. The importance and the critical role played by such themed biocuration efforts are evident from the multitude of databases reported over the years in the NAR Database special issue (
<xref ref-type="bibr" rid="bas017-B5">5</xref>
) and by the birth of dedicated journals such as Database.</p>
<p>Different models have been followed to generate annotations from the literature (
<xref ref-type="bibr" rid="bas017-B6">6</xref>
,
<xref ref-type="bibr" rid="bas017-B7">7</xref>
). In the museum model, a relatively small group of specialized curators performs a particular literature curation effort, while in the jamboree model a group of experts meets for a short, intensive annotation workshop. In the so-called cottage industry model, research groups at different locations that share common research interests organize a collaborative, decentralized annotation effort, each working from its own laboratory. Devoted expert curators produce high-quality annotations, but because manual curation is time-consuming and the number of curators is limited, it is difficult to keep current with the literature. Potential alternatives inspired by successful efforts such as Wikipedia are the open community model (
<xref ref-type="bibr" rid="bas017-B8">8</xref>
) and the author-based annotations model (
<xref ref-type="bibr" rid="bas017-B9">9</xref>
,
<xref ref-type="bibr" rid="bas017-B10">10</xref>
). The first does not have major restrictions on the actual annotators, as the whole community can contribute to generate annotations. In some cases, qualified roles for the contributors have been proposed to guarantee a certain level of confidence in the annotations. The idea behind author-based annotations is that the authors themselves provide minimal annotations of their own article during the writing or submission process, going beyond author-provided keywords for indexing purposes.</p>
<p>Each of the manual literature curation models previously introduced here still faces the problem of the increasing volume of literature (
<xref ref-type="bibr" rid="bas017-B11">11</xref>
). Therefore, some attempts have been made to generate annotations automatically using text mining. Databases constructed according to the automated text-mining model are limited by performance issues but can generate valuable results where manual annotations are lacking (
<xref ref-type="bibr" rid="bas017-B12">12</xref>
,
<xref ref-type="bibr" rid="bas017-B13">13</xref>
). A hybrid approach, namely text-mining-assisted manual curation, wherein semi-automated literature mining tools are integrated into the biocuration workflow, represents a more promising solution (
<xref ref-type="bibr" rid="bas017-B14">14</xref>
,
<xref ref-type="bibr" rid="bas017-B15">15</xref>
).</p>
<p>Controlled vocabularies have been fundamental for all of these diverse annotation types, from the purely manual ones to totally automatic annotations. Key tools in the annotation of experimental data are bio-ontologies, a well-defined set of logic relations and controlled vocabularies that permit an accurate description of the experimental findings (
<xref ref-type="bibr" rid="bas017-B16">16</xref>
).</p>
<p>The BioCreative initiative (Critical Assessment of Information Extraction systems in Biology) (
<xref ref-type="bibr" rid="bas017-B17">17</xref>
,
<xref ref-type="bibr" rid="bas017-B18">18</xref>
) is a community-wide effort for the evaluation of text mining and information extraction systems applied to the biological domain. Its major purpose is to stimulate the development of software that can assist the biological databases in coping with the deluge of data generated by the ‘omics’ era. We provide here a general overview of the BioCreative experience with biomedical ontologies. For the BioCreative initiatives, it was of particular importance that annotations chosen as part of a challenge task had been generated through a model followed by research groups employing expert curators using well-established biocuration workflows refined over years of manual literature curation.</p>
<p>In particular, we will focus on the attempts that have been made to automatically extract protein–protein interaction (PPI) data taking advantage of ontologies, and to associate ontology terms to the interactions.</p>
</sec>
<sec>
<title>Protein interaction biocuration</title>
<p>The opportunity to decipher the mechanisms underlying cellular physiology from the analysis of molecular interaction networks has prompted the establishment of databases devoted to the collection of such data, with great attention to protein and genetic interactions (
<xref ref-type="bibr" rid="bas017-B19 bas017-B20 bas017-B21 bas017-B22">19–22</xref>
). Some of the major protein interaction databases (
<xref ref-type="bibr" rid="bas017-B19 bas017-B20 bas017-B21 bas017-B22 bas017-B23 bas017-B24 bas017-B25">19–25</xref>
) are now federated in the International Molecular Exchange (IMEx) consortium, whose primary goals are to minimize curation redundancy and to share the data in a common format. All active IMEx members share the same data representation standard, the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) (
<xref ref-type="bibr" rid="bas017-B26">26</xref>
). The PSI-MI provides the logic model and the controlled vocabulary for representation of molecular interactions. Not surprisingly, the members of the IMEx consortium themselves are the main contributors to the development and maintenance of the PSI-MI ontology.</p>
<p>The PSI-MI was introduced with the intent to facilitate data integration among databases specifically for the representation of binary or
<italic>n</italic>
-ary interactions. It also allows in-depth annotation of the experimental set-up, such as the experimental or biological role of the interactors, the experimental method employed for the detection of the interaction, the binding domain of the interactors, and the kinetics of the binding reaction, among other attributes (the PSI-MI ontology can be explored at the EBI ontology look-up service) (
<xref ref-type="bibr" rid="bas017-B27">27</xref>
). The PSI-MI is not restricted to the representation of physical interactions but permits the thorough annotation of genetic interactions and even experimental evidence of co-localization among molecules. Each attribute of the interaction is described by a rich controlled vocabulary, which is organized in a well-defined hierarchy and continuously updated and maintained by the PSI-MI workgroup. Regrettably, despite the cooperative efforts of the IMEx databases, the annotation of interaction data from the biomedical literature, and in particular of the subset of interactions involving human genes and their products, remains far from complete. The time-consuming nature of manual curation severely hampers the achievement of an exhaustive collection of molecular interactions. The thorough annotation of the experimental data contained in a single scientific article can take anywhere from minutes to hours. Hence, any automated support that assists the database curators, be it in selecting the relevant literature or in identifying and annotating the interactions, is more than welcome in the database community.
<xref ref-type="fig" rid="bas017-F1">Figure 1</xref>
provides a schematic representation of the manual literature curation of PSI-MI concepts for protein interaction annotation.
<fig id="bas017-F1" position="float">
<label>Figure 1.</label>
<caption>
<p>This figure shows schematically how protein interaction data are annotated and/or marked up using ontologies. Systems such as MyMiner (myminer.armi.monash.edu.au/links.php) have been used for text labeling and highlighting purposes in the context of the BioCreative competition. The main steps illustrated in this figure have been addressed in the BioCreative challenges. Finding associations between textual expressions referring to experimental techniques used to characterize protein interactions and their equivalent concepts in the MI ontology is cumbersome in cases where deep domain inference is required. Experienced curators are able to quickly navigate the term hierarchy to find the appropriate terms, while novice annotators often need to search the ontology using method keywords as queries and consult associated descriptive information for potential candidate terms.</p>
</caption>
<graphic xlink:href="bas017f1"></graphic>
</fig>
</p>
<p>A number of initiatives have been started in order to facilitate the automated extraction of information from the biomedical literature and of PPI data in particular. The Structured Digital Abstracts developed by
<italic>FEBS Letters</italic>
in collaboration with the MINT database (
<xref ref-type="bibr" rid="bas017-B20">20</xref>
), for instance, is a structured text appended to the classical abstract that can be easily parsed by text-mining tools. Each biological entity (protein) and each relationship between these entities is tagged with appropriate database identifiers, thus permitting an unambiguous interpretation of the data.</p>
</sec>
<sec>
<title>Natural language processing and ontologies</title>
<p>In recent years, we have witnessed a flourishing of ontologies that attempt to accurately represent the complexity of the biological sciences (
<xref ref-type="bibr" rid="bas017-B28">28</xref>
). Hence, we now have ontologies describing a wide variety of biological concepts, spanning from clinical symptoms to molecular interactions. They not only attempt to capture in a more formal way the meaning (semantics) of a particular domain based on community consensus (
<xref ref-type="bibr" rid="bas017-B29">29</xref>
) but are also a key element for database interoperability and querying, as well as knowledge management and data integration (
<xref ref-type="bibr" rid="bas017-B30">30</xref>
).</p>
<p>Some of these ontologies can now be integrated with other ontologies, broadening their descriptive potential (
<xref ref-type="bibr" rid="bas017-B31">31</xref>
). Furthermore, the Gene Ontology (GO) (
<xref ref-type="bibr" rid="bas017-B32">32</xref>
) has grown considerably over 10 years and now counts almost 35 000 terms, compared with the initial 5000 [for a general introduction to the GO annotation process, refer to Hill
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="bas017-B33">33</xref>
)].</p>
<p>The increasing number of biological terms and concepts covered by these ontologies has prompted a growing interest in their potential for use in the development of methods for automatic data extraction from the biomedical literature.</p>
<p>However, while biomedical ontologies are indispensable in the daily practice of database curators, it remains to be established if text mining can really benefit from well-established ontologies. In fact, while an analysis of the lexical properties of the GO indicates that a large percentage of GO terms are potentially useful for text mining tools (
<xref ref-type="bibr" rid="bas017-B34">34</xref>
), other evidence suggests that many of the Open Biomedical Ontologies (
<xref ref-type="bibr" rid="bas017-B28">28</xref>
) are not suitable for effective natural language processing applications (
<xref ref-type="bibr" rid="bas017-B35">35</xref>
).</p>
<p>This discrepancy arises because the information is often not stated explicitly in natural language but must be interpreted from images or inferred from the data reported in the articles. As a consequence, not every piece of information is unambiguously linked to a continuous passage of text, and hence not every piece is detectable by text-parsing systems.</p>
<p>The results of the first BioCreative challenge suggest that a combination of several factors can influence the performance of text mining systems in the extraction of GO terms associated with defined genes, including the specificity of the terms and their GO branch membership (
<xref ref-type="bibr" rid="bas017-B36">36</xref>
).</p>
<p>Ontologies that benefit from an iterative process of expansion and restructuring, based on direct observations (analysis of the scientific literature) made by communities of active users, are more likely to become useful resources for text mining. Inclusion of such observations in the ontologies will dramatically increase their potential in the context of text mining.</p>
<p>Nevertheless, some popular text-mining-based applications, such as Textpresso (
<xref ref-type="bibr" rid="bas017-B37">37</xref>
), NCBO Annotator (
<xref ref-type="bibr" rid="bas017-B38">38</xref>
), Geneways (
<xref ref-type="bibr" rid="bas017-B39">39</xref>
), Domeo (
<xref ref-type="bibr" rid="bas017-B40">40</xref>
) or PubOnto (
<xref ref-type="bibr" rid="bas017-B41">41</xref>
), rely on the usage of ontologies. These kinds of systems are currently exploring ontologies mainly as lexical resources of controlled vocabulary terms for text indexing or markup purposes. They assist the end users in improving the detection of annotation-relevant information at a very general level. Efficiently handling complex terms and annotation types is thus still a challenge for such approaches, making the results of the BioCreative tasks particularly interesting to better understand the comparison between manual and automated extractions. Adapting some of the methodologies that participated in BioCreative into such technical frameworks could potentially capture previously missing annotation types or concepts.</p>
</sec>
<sec>
<title>BioCreative</title>
<p>The BioCreative challenge was established in 2004 with the purpose of assessing the state of the art of text-mining technologies applied to biological problems. Although it is called a challenge, the primary aim of BioCreative is not to identify a contest winner. Instead, its ambition is manifold: (i) to benchmark the performance of text mining applications, (ii) to promote communication between bioinformaticians, text miners, and database curators, (iii) to define shared training and ‘gold standard’ test data and (iv) to spur the development of high-performance suites. To date, four editions of BioCreative have been organized, each consisting of two or more specific tasks (
<xref ref-type="table" rid="bas017-T1">Table 1</xref>
). Each task was designed to test the ability of the systems to detect biological entities (genes or proteins) and/or to link them to stable database identifiers, and to evaluate how efficiently facts or functional relations can be associated with the biological entities (e.g. protein function and PPI).
<xref ref-type="fig" rid="bas017-F2">Figure 2</xref>
shows how these BioCreative challenges have evolved over time in the context of related community efforts, resources and applications.
<fig id="bas017-F2" position="float">
<label>Figure 2.</label>
<caption>
<p>Historical view and timeline of the BioCreative challenges in the context of other community efforts, textual resources (corpora) and applications developed in the area of biomedical text mining. The upper bar shows the number of new records added to PubMed each year, expressed in thousands (K). The lower bar refers to the corresponding year timeline. Pink squares, appearance of biomedical text mining methods; green octagons, relevant ontologies, lexical resources and corpora; yellow boxes, community challenges; blue ovals, biomedical text mining applications.</p>
</caption>
<graphic xlink:href="bas017f2"></graphic>
</fig>
<table-wrap id="bas017-T1" position="float">
<label>Table 1.</label>
<caption>
<p>Summary of the BioCreative editions related to the identification of ontology terms in articles</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Information</th>
<th rowspan="1" colspan="1">BioCreative I, task 1</th>
<th rowspan="1" colspan="1">BioCreative I, task 2</th>
<th rowspan="1" colspan="1">BioCreative II—IMS</th>
<th rowspan="1" colspan="1">BioCreative III—IMS</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Description</td>
<td rowspan="1" colspan="1">Return evidence text fragments for protein–GO–document triplets</td>
<td rowspan="1" colspan="1">Predict GO annotations derivable from a given protein–article pair</td>
<td rowspan="1" colspan="1">Prediction of MI annotations from PPI-relevant articles</td>
<td rowspan="1" colspan="1">Prediction of MI annotations from PPI-relevant articles (ranked with evidence passages)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ontologies</td>
<td rowspan="1" colspan="1">GO</td>
<td rowspan="1" colspan="1">GO</td>
<td rowspan="1" colspan="1">MI ontology</td>
<td rowspan="1" colspan="1">MI ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Curators/databases</td>
<td rowspan="1" colspan="1">GOA-EBI</td>
<td rowspan="1" colspan="1">GOA-EBI</td>
<td rowspan="1" colspan="1">MINT and IntAct</td>
<td rowspan="1" colspan="1">BioGRID and MINT</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Participants</td>
<td rowspan="1" colspan="1">9</td>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">8</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Data/format</td>
<td rowspan="1" colspan="1">Full-text articles, SGML format</td>
<td rowspan="1" colspan="1">Full-text articles, SGML format</td>
<td rowspan="1" colspan="1">Full-text articles, PDF and HTML format</td>
<td rowspan="1" colspan="1">Full-text articles, PDF format</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Training</td>
<td rowspan="1" colspan="1">803 articles</td>
<td rowspan="1" colspan="1">803 articles</td>
<td rowspan="1" colspan="1">740 articles</td>
<td rowspan="1" colspan="1">2003 training articles and 587 development set articles</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Test</td>
<td rowspan="1" colspan="1">113 articles</td>
<td rowspan="1" colspan="1">99 articles</td>
<td rowspan="1" colspan="1">358 articles</td>
<td rowspan="1" colspan="1">223 articles</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Evaluation</td>
<td rowspan="1" colspan="1">Three labels (correct, general, wrong), % correct cases</td>
<td rowspan="1" colspan="1">Three labels (correct, general, wrong), % correct cases</td>
<td rowspan="1" colspan="1">Precision, recall and F-score; mapping to the parent terms</td>
<td rowspan="1" colspan="1">Precision, recall, F-score, ranked predictions (AUC iP/R)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Methods</td>
<td rowspan="1" colspan="1">Term lookup, pattern matching/template extraction, term tokens (information content of GO words,
<italic>n</italic>
-gram models), part-of-speech of GO words and machine learning</td>
<td rowspan="1" colspan="1">Term lookup, pattern matching/template extraction, term tokens (information content of GO words,
<italic>n</italic>
-gram models), part-of-speech of GO words and machine learning</td>
<td rowspan="1" colspan="1">Pattern matching, automatically generating variants of MI terms, handcrafted patterns</td>
<td rowspan="1" colspan="1">Cross-ontology mapping, manual and automatic extension of method names, statistics of word tokens building terms (mutual information, chi-square), machine learning on training set articles</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Result highlights</td>
<td rowspan="1" colspan="1">Precisions from 46% to 80%, accuracy of ∼30%</td>
<td rowspan="1" colspan="1">Precisions from 9% to 35%</td>
<td rowspan="1" colspan="1">Precision from 32% to 67%, best
<italic>F</italic>
-score of 48</td>
<td rowspan="1" colspan="1">Most between 30% and 80%, best
<italic>F</italic>
-score of 55</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Observation</td>
<td rowspan="1" colspan="1">Limited recall, effect of GO term length</td>
<td rowspan="1" colspan="1">Limited recall, difference in performance depending on GO categories, cellular component terms are easier</td>
<td rowspan="1" colspan="1">Difficulties with very general method terms</td>
<td rowspan="1" colspan="1">Difficulties in case of methods not specific to PPIs, problems with recall</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>The first edition of the BioCreative challenge (
<xref ref-type="bibr" rid="bas017-B17">17</xref>
) was geared to the needs of model organism database curators. It consisted of two main tasks. The first task was further divided into two subtasks: the recognition of gene mentions in the text (
<xref ref-type="bibr" rid="bas017-B42">42</xref>
) and the linking of identified proteins from yeast, fly and mouse in abstracts to model organism database identifiers (
<xref ref-type="bibr" rid="bas017-B43">43</xref>
). The second task challenged the participants to annotate human gene products, defined by their UniProtKB/Swiss-Prot accession codes (
<xref ref-type="bibr" rid="bas017-B44">44</xref>
), with the corresponding GO codes by mining full-text articles (
<xref ref-type="bibr" rid="bas017-B36">36</xref>
). In particular, teams were asked to return the textual evidence for the GO term assigned to a defined set of proteins.
<xref ref-type="fig" rid="bas017-F3">Figure 3</xref>
illustrates schematically the idea behind the associated annotation process where for proteins described in a given paper, GO annotation evidence had to be extracted.
<fig id="bas017-F3" position="float">
<label>Figure 3.</label>
<caption>
<p>Schematic overview of the extraction of GO annotations from the literature. The process illustrates the individual steps of the annotation process, covering the initial selection of relevant documents for GO annotation of proteins, identification of proteins and their corresponding database identifiers followed by the extraction of associations to GO terms and the retrieval of evidence sentences/passages. The participating teams had to provide the evidence passages for a given document–protein–GO term triplet for one subtask, and to actually detect GO–protein associations (together with evidence passages) for the other subtask.</p>
</caption>
<graphic xlink:href="bas017f3"></graphic>
</fig>
</p>
<p>Precision and recall were the basic metrics employed to evaluate the performance of the systems during this BioCreative challenge. Precision is the number of true positive (TP) cases, i.e. correct results, divided by the sum of TP and false positive (FP) cases. Recall is the number of TP cases divided by the sum of TP and false negative (FN) cases, the latter being relevant cases missed by the system. To account for both of these measures, the
<italic>F</italic>
-measure, i.e. the harmonic mean of precision and recall, was used. For the GO task, database curators had to manually evaluate the automatically extracted evidence passages to determine if they correctly supported the annotations, as exemplified in
<xref ref-type="fig" rid="bas017-F4">Figure 4</xref>
(
<xref ref-type="bibr" rid="bas017-B36">36</xref>
).
<fig id="bas017-F4" position="float">
<label>Figure 4.</label>
<caption>
<p>Example predictions of the GO task of BioCreative I. (
<bold>A</bold>
) Here a correct prediction is shown, containing the information on the corresponding document, protein and GO term as well as the supporting evidence text passages extracted automatically from the full-text article. (
<bold>B</bold>
) Example prediction (wrong) showing a screen shot of the original evaluation interface developed at the time for this task (based on Apache/PHP). The original evaluation application is not functional anymore and was implemented specifically for this task. Proteins and GO terms were defined unambiguously through corresponding standard identifiers. The database curators manually evaluated both the correctness of the protein as well as the GO terms.</p>
</caption>
<graphic xlink:href="bas017f4"></graphic>
</fig>
</p>
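<p>As a minimal sketch of the three measures defined above, the following Python function computes precision, recall and the F-measure from raw counts. The function name and the example counts are illustrative assumptions and do not correspond to any BioCreative evaluation script or result.</p>
<preformat>
```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) computed from raw counts."""
    # Precision: correct results among everything the system returned.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    # Recall: correct results among everything that should have been returned.
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, chosen only for illustration.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```
</preformat>
<p>Because the harmonic mean is dominated by the smaller of the two values, a system with high precision but limited recall still receives a modest F-measure.</p>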
<p>The first BioCreative competition saw the participation of 27 teams and some of the text mining algorithms yielded encouraging results in the identification of the gene names and in linking them to database identifiers (80% precision/recall) (
<xref ref-type="bibr" rid="bas017-B43">43</xref>
).</p>
<p>The identification of gene mentions in sentences was addressed using machine-learning and natural language processing techniques and benefited from training and test data in the form of labeled text prepared by biologists.</p>
<p>For linking (normalizing) genes mentioned in abstracts, performance varied considerably depending on the model organism. In the case of yeast, an
<italic>F</italic>
-score of 0.92 could be reached, while in the case of fly (
<italic>F</italic>
-score of 0.82) and mouse (
<italic>F</italic>
-score of 0.79) the performance was considerably lower owing to less consistent use of naming nomenclature and a high degree of gene name ambiguity.</p>
<p>Conversely, the results of the functional annotation task showed that the interpretation of complex biological data, and thus linking text to GO, is extremely challenging for text mining tools. The results indicated that some categories of GO, in particular the terms expressing sub-cellular location provided by the cellular component (CC) branch, were more amenable to text-mining strategies.</p>
</sec>
<sec>
<title>Outcomes of the BioCreative challenge for PPIs</title>
<p>The task of extracting PPI data was introduced in the second edition of BioCreative (
<xref ref-type="bibr" rid="bas017-B45">45</xref>
). Several subtasks were defined: detecting the literature containing protein interaction data (Interaction Article Subtask, IAS), identifying the interaction pairs and linking the interacting partners to UniProtKB/Swiss-Prot identifiers (Interaction Pair Subtask, IPS), identifying the experimental methods employed to detect the interaction (Interaction Method Subtask, IMS) and retrieving the textual evidence of the interaction (Interaction Sentences Subtask, ISS). The PPI task was a collaborative effort with IntAct and MINT, databases whose curators annotated the training and test sets used in the various tasks (
<xref ref-type="bibr" rid="bas017-B46">46</xref>
).</p>
<p>The experimental methods are important to infer how likely it is that a given protein interaction actually occurs
<italic>in vivo</italic>
, and it is usually the cumulative evidence rather than a single experiment that defines the reliability of the interaction. At a practical level, for curators, it is fundamental to identify in the article if there are experimental techniques usually associated with the detection of protein interactions (e.g. two hybrid, affinity purification technologies). These facts motivated the introduction of the IMS (
<xref ref-type="bibr" rid="bas017-B45">45</xref>
).</p>
<p>For the IMS subtask, the two participating teams were asked to identify from the text the list of the experimental techniques employed for the detection of PPIs, and their results were compared with a reference list generated by manual annotation. The experimental interaction detection techniques allowed for this task consisted of a sub-graph specified in the PSI-MI ontology. The highest score for exact match precision was 48%, but if matching to parent terms in the ontology was allowed, the score rose to an encouraging 65% (
<xref ref-type="bibr" rid="bas017-B45">45</xref>
). This improved performance was obtained by considering as correct those predicted terms that, when compared to the manually annotated terms, were either an exact match or a direct parent concept based on the PSI-MI ontology graph structure.</p>
<p>This result is due to the fact that some ontology terms are far too specific to match the vocabulary routinely used in the biomedical literature. For instance, while ‘coimmunoprecipitation’ (MI:0019) is widely used in the scientific literature, its child terms ‘anti bait coimmunoprecipitation’ (MI:0006) and ‘anti tag coimmunoprecipitation’ (MI:0007) are not. The two child terms are used for annotation by database curators to further indicate if the experiment has been conducted with an antibody recognizing the protein or a tag fused to the target protein, respectively. The use of these terms is therefore largely limited to human curator interpretation of the literature rather than explicit text mentions of these terms.</p>
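<p>The difference between exact matching and the relaxed parent-term matching described above can be sketched as follows. This is an illustrative approximation only: the hierarchy is reduced to the three MI terms named in the text, and the gold annotation and the prediction are hypothetical. It is not the evaluation code used in BioCreative II.</p>
<preformat>
```python
# Toy fragment of the MI hierarchy: child identifier -> direct parent identifier.
# Only the three terms named in the text are included.
PARENT = {
    "MI:0006": "MI:0019",  # anti bait coimmunoprecipitation -> coimmunoprecipitation
    "MI:0007": "MI:0019",  # anti tag coimmunoprecipitation -> coimmunoprecipitation
}


def is_match(predicted: str, gold: set[str], allow_parent: bool) -> bool:
    """True if the prediction is a gold term or, optionally, a direct parent of one."""
    if predicted in gold:
        return True
    if allow_parent:
        return any(PARENT.get(term) == predicted for term in gold)
    return False


def precision(predictions: list[str], gold: set[str], allow_parent: bool) -> float:
    """Fraction of predictions counted as correct under the chosen matching rule."""
    if not predictions:
        return 0.0
    hits = sum(is_match(p, gold, allow_parent) for p in predictions)
    return hits / len(predictions)


gold_terms = {"MI:0006"}       # hypothetical curator annotation for one article
predicted_terms = ["MI:0019"]  # hypothetical system output for the same article

print("exact:", precision(predicted_terms, gold_terms, allow_parent=False))
print("with parents:", precision(predicted_terms, gold_terms, allow_parent=True))
```
</preformat>
<p>In this toy case the prediction of the general term is rejected under exact matching but accepted once direct parents are allowed, which mirrors the situation in which a system finds ‘coimmunoprecipitation’ in the text while the curator annotated one of its child terms.</p>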
<p>A strategy that might be particularly promising for lengthy terms representing complex concepts is the use of term definitions. In this respect, GO term definitions have been exploited by Piao
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="bas017-B47">47</xref>
) for identifying and analyzing relations between terms. The definitions of PSI-MI terms have also been used for linking PSI-MI terms to full-text articles by analyzing unigrams and character
<italic>n</italic>
-grams from the PSI-MI definition and synonyms (
<xref ref-type="bibr" rid="bas017-B48">48</xref>
).</p>
<p>Several studies in the biomedical domain have proposed metrics to quantify how closely related two terms are in meaning, i.e. their semantic similarity (
<xref ref-type="bibr" rid="bas017-B49">49</xref>
). This is an important issue not only for comparing text-mining results to manual annotations, but also for measuring the consistency of manual annotations themselves in inter-annotator agreement studies and for determining the functional similarity between genes annotated with those terms. A simple approach to measuring semantic similarity is to calculate the path distance between two terms in the graph underlying the ontology. Semantic similarity calculations have been promising for resources like WordNet (
<xref ref-type="bibr" rid="bas017-B50">50</xref>
,
<xref ref-type="bibr" rid="bas017-B51">51</xref>
), which is essentially a lexical database of English words together with their semantic relation types with practical usage for text analysis. This resource therefore differs in scope from GO or the PSI-MI ontology, whose primary use is for annotation of gene products. Semantic similarity calculations have proven useful for quantifying functional similarity between gene products based on their GO annotations (
<xref ref-type="bibr" rid="bas017-B49">49</xref>
), but using them for directly quantifying the similarity between predicted and manually annotated terms in the context of BioCreative remained problematic.</p>
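<p>The simple path-distance approach mentioned above can be sketched as a breadth-first search over the ontology graph. The graph below is a toy example restricted to three MI terms, and treating edges as undirected is an assumption made for illustration; published semantic similarity measures are generally more elaborate.</p>
<preformat>
```python
from collections import deque

# Hypothetical toy hierarchy given as (child, parent) pairs.
EDGES = [("MI:0006", "MI:0019"), ("MI:0007", "MI:0019")]


def build_graph(edges):
    """Build an undirected adjacency map from (child, parent) pairs."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_distance(graph, start, goal):
    """Number of edges on the shortest path between two terms, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


GRAPH = build_graph(EDGES)
print(path_distance(GRAPH, "MI:0006", "MI:0019"))  # child to its parent
print(path_distance(GRAPH, "MI:0006", "MI:0007"))  # two sibling terms
```
</preformat>
<p>A shorter path is read as a closer meaning. One known limitation of this sketch is that it weights every edge equally, regardless of how deep in the hierarchy the terms lie.</p>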
<p>The IMS task was replicated in the BioCreative III edition (
<xref ref-type="bibr" rid="bas017-B52 bas017-B53 bas017-B54">52–54</xref>
) and saw increased participation, with eight teams. The difference from the previous edition was that participants were asked to provide a list of interaction detection method identifiers for a set of full-text articles, ordered by their likelihood of having been used to detect the PPIs described in each article, and also to provide a text evidence passage for the interaction method.
<xref ref-type="fig" rid="bas017-F5">Figure 5</xref>
shows a set of example predictions of varying difficulty taken from BioCreative III submissions. The training and development sets were derived from annotations provided by databases compliant with the PSI-MI annotation standards, while the BioGRID and MINT database curators carefully prepared the test set. Participating teams went beyond simple term look-up, and many of them treated this task as a multi-class classification problem. The best precision obtained by a submission for this task was 80.00% at a recall of 41.50% (
<italic>F</italic>
-score of 51.508) (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
). The highest
<italic>F</italic>
-score was 55.06 (62.46% precision with 55.17% recall) (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
).
<fig id="bas017-F5" position="float">
<label>Figure 5.</label>
<caption>
<p>Representative predictions of varying difficulty for automated systems, submitted for the MI task of BioCreative III. The examples correspond to submissions from various teams. Participating teams had to return the article identifier, the concept identifier for the interaction detection method according to the MI ontology, a rank, a confidence score and supporting text evidence passages extracted from the full-text article. Submissions were plain text files in which the fields were separated by a tab character. This figure adds colored highlights to the original predictions to make the output easier to read. In red, the original term from the MI ontology and its synonyms have been added to facilitate the interpretation of the results. As can be seen, some cases are rather straightforward and could be detected by direct term lookup, while others require generating lexical variants or even more sophisticated machine learning and statistical word analysis.</p>
</caption>
<graphic xlink:href="bas017f5"></graphic>
</fig>
</p>
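<p>As a rough illustration of the tab-separated submission layout described in the caption of Figure 5, the following sketch parses such lines and scores one article against a gold-standard set. The sample lines (article identifiers, confidence values and passages) are invented, the function names are hypothetical, and the set-based precision, recall and balanced F-score shown here are a common formulation assumed for this sketch rather than the official evaluation procedure.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical submission lines: article id, MI concept id, rank,
# confidence, evidence passage (tab-separated). All values are invented.
SAMPLE = (
    "10000001\tMI:0018\t1\t0.92\tinteraction was detected by yeast two-hybrid\n"
    "10000001\tMI:0096\t2\t0.40\tGST pull-down experiments showed binding\n"
    "10000002\tMI:0019\t1\t0.75\tproteins were co-immunoprecipitated\n"
)


def parse_submission(text):
    """Return {article_id: [(rank, method_id, confidence, passage), ...]}."""
    by_article = defaultdict(list)
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for article_id, method_id, rank, confidence, passage in reader:
        by_article[article_id].append(
            (int(rank), method_id, float(confidence), passage)
        )
    for predictions in by_article.values():
        predictions.sort()  # tuples sort by rank first
    return dict(by_article)


def precision_recall_f(predicted, gold):
    """Set-based precision, recall and balanced F-score for one article."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted.intersection(gold))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


PARSED = parse_submission(SAMPLE)
```

<p>Note that when precision, recall and F-score are each averaged over articles, the averaged F-score generally differs from the harmonic mean of the averaged precision and recall.</p>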
<p>A common approach followed by participating teams was, in addition to pattern matching techniques, the use of various kinds of supervised machine learning techniques that explored a range of different features. Machine-learning methods tested included Naïve Bayes multiclass classifiers [team 65, (
<xref ref-type="bibr" rid="bas017-B55">55</xref>
)], support vector machines [SVMs; teams 81 (
<xref ref-type="bibr" rid="bas017-B56">56</xref>
) and 90 (
<xref ref-type="bibr" rid="bas017-B48">48</xref>
)], logistic regression [LR; team 69, (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
)] and nearest neighbors [team 100, (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
)].</p>
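<p>To make the classification view of the task concrete, here is a minimal nearest-neighbor sketch over bags of words. It is a generic illustration under stated assumptions (invented training sentences, Jaccard overlap as the similarity, a single neighbor) and does not reproduce the features or systems of any participating team.</p>

```python
# Invented training sentences, each labelled with a method name.
TRAINING = [
    ("yeast two-hybrid screen identified the interaction", "two hybrid"),
    ("two-hybrid assay in yeast confirmed binding", "two hybrid"),
    ("GST pull-down assay showed direct binding", "pull down"),
    ("proteins were co-immunoprecipitated from cell lysates",
     "coimmunoprecipitation"),
]


def tokens(sentence):
    """Lower-cased word set; hyphens are treated as spaces."""
    return set(sentence.lower().replace("-", " ").split())


def jaccard(left, right):
    """Size of the intersection divided by size of the union."""
    union = left.union(right)
    return len(left.intersection(right)) / len(union) if union else 0.0


def nearest_label(sentence, training=TRAINING):
    """Label of the training sentence most similar to the input."""
    query = tokens(sentence)
    best_sentence, best_label = max(
        training, key=lambda pair: jaccard(query, tokens(pair[0]))
    )
    return best_label
```

<p>For example, <monospace>nearest_label("binding was confirmed by a pull-down assay")</monospace> shares the most words with the pull-down training sentence and therefore receives that label. Real systems would use far richer features and many more training examples per class.</p>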
<p>Another common practice was dictionary extension, using terms added manually after inspecting the training data, cross-ontology mapping based on Medical Subject Headings (MeSH) and Unified Medical Language System (UMLS) terms, and rule-based expansion of the original dictionary of method terms. Most participating teams explored statistical analysis of words, bigrams and collocations present in the training and development set articles. Exact and partial word tokens making up the original method term lists were also exploited. Finally, pattern-matching techniques together with rule-based approaches combined with machine-learning classifiers could be successfully adapted for this task.</p>
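<p>Rule-based expansion of a term dictionary can be sketched as follows. The rules (lower-casing, alternating hyphenated, spaced and closed-up forms, and a naive plural), the function names and the seed identifiers are all assumptions chosen for illustration; the rules actually used by participating teams are not described here.</p>

```python
import re


def lexical_variants(term):
    """Generate simple spelling variants of a method term.

    Rules (all assumptions made for this sketch): lower-casing,
    hyphen/space/closed-up alternation, and a naive plural.
    """
    base = term.lower().strip()
    words = re.split(r"[\s-]+", base)
    variants = {
        " ".join(words),
        "-".join(words),
        "".join(words),
    }
    # Naive plural of the space-separated form; no irregular handling.
    variants.add(" ".join(words) + "s")
    return variants


def expand_dictionary(dictionary):
    """Map every generated variant back to its concept identifier."""
    expanded = {}
    for concept_id, names in dictionary.items():
        for name in names:
            for variant in lexical_variants(name):
                expanded.setdefault(variant, concept_id)
    return expanded


# Hypothetical seed dictionary: identifier -> names and synonyms.
SEED = {"METHOD:1": ["pull down"], "METHOD:2": ["two hybrid", "2 hybrid"]}
EXPANDED = expand_dictionary(SEED)
```

<p>With this seed, a look-up of "pulldown" or "two-hybrid" resolves to the corresponding hypothetical identifier even though neither spelling was listed explicitly. Such rules raise recall at the risk of generating ambiguous variants, so the expanded list usually needs filtering.</p>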
<p>Team 88 of BioCreative III (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
) used a dictionary-based strategy to recover mentions of interaction method terms. As exact matching of method terms generally yields limited recall, team 70 (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
) used approximate string searches for finding method mentions. Another option to boost recall was followed by team 65 (
<xref ref-type="bibr" rid="bas017-B55">55</xref>
), which considered sub-matches at the level of words and applied pattern-matching techniques. Such methods are suitable for handling multi-word terms, which make up an important fraction of the PSI-MI terms. This team used a corpus-driven approach to derive conditional probabilities of terms. Team 81 (
<xref ref-type="bibr" rid="bas017-B56">56</xref>
) complemented pattern matching with a sentence classification method relying on SVMs. This type of machine learning method together with logistic regression was also tested by team 90 (
<xref ref-type="bibr" rid="bas017-B48">48</xref>
), which tried out many features, such as the type and text of named entities, word proximity to the entities and information on where in a document these entities were mentioned. Team 69 (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
) also applied logistic regression in their system. They included features that covered term and lexicon membership properties and carried out a global analysis at the level of whole documents as well as at the level of individual sentences. A software tool that resulted directly from participation in the IMT is the OntoNorm framework (
<xref ref-type="bibr" rid="bas017-B57">57</xref>
) from team 89 (
<xref ref-type="bibr" rid="bas017-B58">58</xref>
), which integrated dictionary-based pattern matching with a binary machine-learning classification system and the calculation of mutual information and chi-squared scores of unigrams and bigrams relevant for method terms.</p>
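<p>Approximate string search for method mentions, as opposed to exact look-up, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the similarity threshold is arbitrary, hyphens are normalised to spaces, and the function name is hypothetical; it is not the matching procedure of any participating team.</p>

```python
from difflib import SequenceMatcher


def approximate_mentions(text, terms, threshold=0.85):
    """Find word windows in `text` that closely resemble a dictionary term.

    Returns (term, matched_window, similarity) tuples. The threshold is an
    arbitrary value chosen for this sketch.
    """
    # Normalise case and treat hyphens as spaces before tokenising.
    words = text.lower().replace("-", " ").split()
    found = []
    for term in terms:
        size = len(term.split())
        # Slide a window with as many words as the dictionary term.
        for start in range(len(words) - size + 1):
            window = " ".join(words[start:start + size])
            score = SequenceMatcher(None, term.lower(), window).ratio()
            if score >= threshold:
                found.append((term, window, round(score, 2)))
    return found
```

<p>Applied to a sentence containing the misspelling "coimmunoprecipitaton" and the hyphenated form "pull-down", this sketch recovers both "coimmunoprecipitation" and "pull down", which an exact dictionary match on the raw text would miss. Lowering the threshold trades precision for recall.</p>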
<p>According to an observation of team 100 (
<xref ref-type="bibr" rid="bas017-B53">53</xref>
), how competitive a given strategy was depended heavily on the actual PSI-MI term. They therefore used a knowledge-based approach specific to each PSI-MI term, applying for instance pattern-matching approaches for some terms, while detecting others through a nearest neighbors method.</p>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>Text-mining tools can assist scientific curation in many ways, from selecting the relevant literature to greatly facilitating the completion of a database entry (saving a considerable amount of time). Furthermore, there is considerable activity in the area of ontology-driven annotation of biomedical literature, as witnessed by the ‘Beyond the PDF’ initiative (
<xref ref-type="bibr" rid="bas017-B59">59</xref>
).</p>
<p>The BioCreative experience as a whole highlighted that substantial advances in text-mining methodologies require close collaboration among different communities: text miners, database curators and ontology developers. In particular, this proximity gave the text-mining community a more mature understanding of crucial biological questions (e.g. gene species annotation) and of the need to make methods and results more easily accessible to biologists and database annotators (e.g. through user-friendly visualization tools).</p>
<p>What is crucial for text miners in the development of more efficient predictive algorithms is the availability of a large corpus of manually annotated training data. Ideally, such text-bound annotations should cover a variety of representative text phrases mapped to the same concept. How feasible it is to generate large enough annotated text data sets for complex annotation types at various levels of granularity is still unclear.</p>
<p>This necessity prompted various initiatives to compile
<italic>ad hoc</italic>
curated data sets [e.g. the GENIA corpus (
<xref ref-type="bibr" rid="bas017-B60">60</xref>
)]. Unfortunately, such collections are usually created as a specific resource for natural language processing research and are not suitable for all applications. Furthermore, their creation is extremely laborious, resulting in relatively small collections. Another effort to provide syntactic and semantic text annotations of biomedical articles using various ontologies is the CRAFT corpus initiative, which aims to provide concept annotations from six different ontologies including GO and the Cell Type Ontology (CL) (
<xref ref-type="bibr" rid="bas017-B61">61</xref>
). One of the merits of BioCreative has been to permit the public deposition of annotated corpora. BioCreative has also been very effective in identifying the main areas of application, limitations and goals of text mining in the area of protein/gene function and interactions.</p>
<p>Data sets routinely annotated by databases are ideal candidates for the compilation of large reference data sets. Unfortunately, databases do not capture the textual passages linked to the experimental evidence, and this represents a significant hurdle to the development of text-mining suites. In addition, it is still very hard to convince databases and publishers to provide access to text-bound annotations (manual text labelling), which also raises technical and organizational difficulties.</p>
<p>In this respect, biological ontologies may represent a powerful tool to overcome these limitations. The identification of the experimental methods (as described by PSI-MI) linked to protein interactions can be an important resource facilitating the retrieval of protein interactions, but this requires an extra effort to extend the dictionary with additional aliases and/or to identify the critical textual passages.</p>
<p>Ideally, an effective strategy for employing bio-ontologies in text-mining technologies would consist of an in-depth annotation of text passages associated with the ontology terms, thus creating a rich dictionary. This could serve as valuable data for machine learning approaches and for automatic term extraction techniques that iteratively enrich the lexical resources behind the original ontologies. On the other hand, there is a need to consider more closely the use of text-mining methods for the actual development and expansion of controlled vocabularies and ontologies, relying for instance on corpus-based term acquisition. Such an approach has shown promising results for the metabolomics (
<xref ref-type="bibr" rid="bas017-B29">29</xref>
) and animal behavior (
<xref ref-type="bibr" rid="bas017-B62">62</xref>
) domains, where term recognition and filtering methods using generic software tools have been explored. At the current stage, it is possible to say that the BioCreative effort has successfully promoted the exploration of a set of sophisticated methods for the automatic detection of ontology concepts in the literature, some of which can generate promising results. What is still missing is a more systematic determination of which methods are more robust or competitive for particular types of concepts or terms, as well as more granular annotations that label the textual evidence for terms. Ultimately, the incorporation of concept recognition systems into text-mining tools will greatly depend on their availability and flexibility to handle more customized term lists and ontology relation types.</p>
</sec>
<sec>
<title>Funding</title>
<p>This work was supported by
<funding-source>the National Center for Research Resources (NCRR)</funding-source>
and
<funding-source>the Office of Research Infrastructure Programs (ORIP) of the National Institutes of Health (NIH)</funding-source>
(
<award-id>1R01RR024031</award-id>
to M.T.) (
<award-id>R24RR032659</award-id>
to M.T.);
<funding-source>the Biotechnology and Biological Sciences Research Council</funding-source>
(
<award-id>BB/F010486/1</award-id>
to M.T.);
<funding-source>the Canadian Institutes of Health Research</funding-source>
(
<award-id>FRN 82940</award-id>
to M.T.);
<funding-source>the European Commission FP7 Program</funding-source>
(
<award-id>2007-223411</award-id>
to M.T.);
<funding-source>a Royal Society Wolfson Research Merit Award</funding-source>
(to M.T.);
<funding-source>the Scottish Universities Life Sciences Alliance</funding-source>
(to M.T.);
<funding-source>Projects BIO2007</funding-source>
(
<award-id>BIO2007-666855</award-id>
) (to M.K. and A.V.),
<funding-source>CONSOLIDER</funding-source>
(
<award-id>CSD2007-00050</award-id>
) (to M.K. and A.V.),
<funding-source>MICROME</funding-source>
(Grant Agreement Number
<award-id>222886-2</award-id>
) (to M.K. and A.V.). Funding for open access charges:
<funding-source>National Institutes of Health</funding-source>
(
<award-id>1R01RR024031</award-id>
).</p>
<p>
<italic>Conflict of interest</italic>
. None declared.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>We would like to thank Lynette Hirschman and Christian Blaschke for their active feedback in the BioCreative tasks described in this article.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="bas017-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neumann</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Walter</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Heriche</surname>
<given-names>JK</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes</article-title>
<source>Nature</source>
<year>2010</year>
<volume>464</volume>
<fpage>721</fpage>
<lpage>727</lpage>
<pub-id pub-id-type="pmid">20360735</pub-id>
</element-citation>
</ref>
<ref id="bas017-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smogorzewska</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Desetty</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Saito</surname>
<given-names>TT</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A genetic screen identifies FAN1, a Fanconi anemia-associated nuclease necessary for DNA interstrand crosslink repair</article-title>
<source>Mol. Cell</source>
<year>2010</year>
<volume>39</volume>
<fpage>36</fpage>
<lpage>47</lpage>
<pub-id pub-id-type="pmid">20603073</pub-id>
</element-citation>
</ref>
<ref id="bas017-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stamatoyannopoulos</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Dutta</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project</article-title>
<source>Nature</source>
<year>2007</year>
<volume>447</volume>
<fpage>799</fpage>
<lpage>816</lpage>
<pub-id pub-id-type="pmid">17571346</pub-id>
</element-citation>
</ref>
<ref id="bas017-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seringhaus</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Manually structured digital abstracts: a scaffold for automatic text mining</article-title>
<source>FEBS Lett.</source>
<year>2008</year>
<volume>582</volume>
<fpage>1170</fpage>
<pub-id pub-id-type="pmid">18328823</pub-id>
</element-citation>
</ref>
<ref id="bas017-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Galperin</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Cochrane</surname>
<given-names>GR</given-names>
</name>
</person-group>
<article-title>The 2011 Nucleic acids research database issue and the online molecular biology database collection</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D1</fpage>
<lpage>D6</lpage>
<pub-id pub-id-type="pmid">21177655</pub-id>
</element-citation>
</ref>
<ref id="bas017-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Genome annotation: from sequence to biology</article-title>
<source>Nat. Rev. Genet.</source>
<year>2001</year>
<volume>2</volume>
<fpage>493</fpage>
<lpage>503</lpage>
<pub-id pub-id-type="pmid">11433356</pub-id>
</element-citation>
</ref>
<ref id="bas017-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elsik</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Worley</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Community annotation: procedures, protocols, and supporting tools</article-title>
<source>Genome Res.</source>
<year>2006</year>
<volume>16</volume>
<fpage>1329</fpage>
<lpage>1333</lpage>
<pub-id pub-id-type="pmid">17065605</pub-id>
</element-citation>
</ref>
<ref id="bas017-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huss</surname>
<given-names>JW</given-names>
<suffix>III</suffix>
</name>
<name>
<surname>Lindenbaum</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Martone</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Gene Wiki: community intelligence applied to human gene annotation</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D633</fpage>
<lpage>D639</lpage>
<pub-id pub-id-type="pmid">19755503</pub-id>
</element-citation>
</ref>
<ref id="bas017-B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leitner</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Chatr-aryamontri</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>SA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The FEBS Letters/BioCreative II.5 experiment: making biological information accessible</article-title>
<source>Nat. Biotechnol.</source>
<year>2010</year>
<volume>28</volume>
<fpage>897</fpage>
<lpage>899</lpage>
<pub-id pub-id-type="pmid">20829821</pub-id>
</element-citation>
</ref>
<ref id="bas017-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Superti-Furga</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Wieland</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Cesareni</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Finally: the digital, democratic age of scientific abstracts</article-title>
<source>FEBS Lett.</source>
<year>2008</year>
<volume>582</volume>
<fpage>1169</fpage>
<pub-id pub-id-type="pmid">18328821</pub-id>
</element-citation>
</ref>
<ref id="bas017-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baumgartner</surname>
<given-names>WA</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Cohen</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>LM</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Manual curation is not sufficient for annotation of genomic databases</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>i41</fpage>
<lpage>i48</lpage>
<pub-id pub-id-type="pmid">17646325</pub-id>
</element-citation>
</ref>
<ref id="bas017-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rebholz-Schuhmann</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kirsch</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Arregui</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Protein annotation by EBIMed</article-title>
<source>Nat. Biotechnol.</source>
<year>2006</year>
<volume>24</volume>
<fpage>902</fpage>
<lpage>903</lpage>
<pub-id pub-id-type="pmid">16900125</pub-id>
</element-citation>
</ref>
<ref id="bas017-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Couto</surname>
<given-names>FM</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>V</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GOAnnotator: linking protein GO annotations to evidence text</article-title>
<source>J. Biomed. Discov. Collab.</source>
<year>2006</year>
<volume>1</volume>
<fpage>19</fpage>
<pub-id pub-id-type="pmid">17181854</pub-id>
</element-citation>
</ref>
<ref id="bas017-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dowell</surname>
<given-names>KG</given-names>
</name>
<name>
<surname>McAndrews-Hill</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Hill</surname>
<given-names>DP</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Integrating text mining into the MGI biocuration workflow</article-title>
<source>Database</source>
<year>2009</year>
<comment>Vol. 2009, Article ID bap019, doi:10.1093/database/bap019</comment>
</element-citation>
</ref>
<ref id="bas017-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wiegers</surname>
<given-names>TC</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>KB</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Text mining and manual curation of chemical-gene-disease networks for the comparative toxicogenomics database (CTD)</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>326</fpage>
<pub-id pub-id-type="pmid">19814812</pub-id>
</element-citation>
</ref>
<ref id="bas017-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alterovitz</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Xiang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hill</surname>
<given-names>DP</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Ontology engineering</article-title>
<source>Nat. Biotechnol.</source>
<year>2010</year>
<volume>28</volume>
<fpage>128</fpage>
<lpage>130</lpage>
<pub-id pub-id-type="pmid">20139945</pub-id>
</element-citation>
</ref>
<ref id="bas017-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hirschman</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yeh</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Blaschke</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Overview of BioCreAtIvE: critical assessment of information extraction for biology</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<issue>Suppl 1</issue>
<fpage>S1</fpage>
<pub-id pub-id-type="pmid">15960821</pub-id>
</element-citation>
</ref>
<ref id="bas017-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leitner</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An Overview of BioCreative II.5</article-title>
<source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source>
<year>2010</year>
<volume>7</volume>
<fpage>385</fpage>
<lpage>399</lpage>
<pub-id pub-id-type="pmid">20704011</pub-id>
</element-citation>
</ref>
<ref id="bas017-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aranda</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Achuthan</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Alam-Faruque</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The IntAct molecular interaction database in 2010</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D525</fpage>
<lpage>D531</lpage>
<pub-id pub-id-type="pmid">19850723</pub-id>
</element-citation>
</ref>
<ref id="bas017-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ceol</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Chatr-Aryamontri</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Licata</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MINT, the molecular interaction database: 2009 update</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D532</fpage>
<lpage>D539</lpage>
<pub-id pub-id-type="pmid">19897547</pub-id>
</element-citation>
</ref>
<ref id="bas017-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salwinski</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>CS</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>AJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The database of interacting proteins: 2004 update</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>D449</fpage>
<lpage>D451</lpage>
<pub-id pub-id-type="pmid">14681454</pub-id>
</element-citation>
</ref>
<ref id="bas017-B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stark</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Breitkreutz</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Chatr-Aryamontri</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The BioGRID interaction database: 2011 update</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D698</fpage>
<lpage>D704</lpage>
<pub-id pub-id-type="pmid">21071413</pub-id>
</element-citation>
</ref>
<ref id="bas017-B23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mewes</surname>
<given-names>HW</given-names>
</name>
<name>
<surname>Ruepp</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Theis</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MIPS: curated databases and comprehensive secondary data resources in 2010</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D220</fpage>
<lpage>D224</lpage>
<pub-id pub-id-type="pmid">21109531</pub-id>
</element-citation>
</ref>
<ref id="bas017-B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chautard</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Fatoux-Ardore</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ballut</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MatrixDB, the extracellular matrix interaction database</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D235</fpage>
<lpage>D240</lpage>
<pub-id pub-id-type="pmid">20852260</pub-id>
</element-citation>
</ref>
<ref id="bas017-B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goll</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rajagopala</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Shiau</surname>
<given-names>SC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MPIDB: the microbial protein interaction database</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>1743</fpage>
<lpage>1744</lpage>
<pub-id pub-id-type="pmid">18556668</pub-id>
</element-citation>
</ref>
<ref id="bas017-B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kerrien</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Orchard</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Montecchi-Palazzi</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Broadening the horizon–level 2.5 of the HUPO-PSI format for molecular interactions</article-title>
<source>BMC Biol.</source>
<year>2007</year>
<volume>5</volume>
<fpage>44</fpage>
<pub-id pub-id-type="pmid">17925023</pub-id>
</element-citation>
</ref>
<ref id="bas017-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cote</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<fpage>97</fpage>
<pub-id pub-id-type="pmid">16507094</pub-id>
</element-citation>
</ref>
<ref id="bas017-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rosse</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
<source>Nat. Biotechnol.</source>
<year>2007</year>
<volume>25</volume>
<fpage>1251</fpage>
<lpage>1255</lpage>
<pub-id pub-id-type="pmid">17989687</pub-id>
</element-citation>
</ref>
<ref id="bas017-B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spasic</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Schober</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sansone</surname>
<given-names>SA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Facilitating the development of controlled vocabularies for metabolomics technologies with text mining</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl. 5</issue>
<fpage>S5</fpage>
<pub-id pub-id-type="pmid">18460187</pub-id>
</element-citation>
</ref>
<ref id="bas017-B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bodenreider</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Biomedical ontologies in action: role in knowledge management, data integration and decision support</article-title>
<source>Yearb. Med. Inform.</source>
<year>2008</year>
<fpage>67</fpage>
<lpage>79</lpage>
<pub-id pub-id-type="pmid">18660879</pub-id>
</element-citation>
</ref>
<ref id="bas017-B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tirmizi</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Aitken</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Moreira</surname>
<given-names>DA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Mapping between the OBO and OWL ontology languages</article-title>
<source>J. Biomed. Semantics</source>
<year>2011</year>
<volume>2</volume>
<issue>Suppl. 1</issue>
<fpage>S3</fpage>
<pub-id pub-id-type="pmid">21388572</pub-id>
</element-citation>
</ref>
<ref id="bas017-B32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium</article-title>
<source>Nat. Genet.</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
</element-citation>
</ref>
<ref id="bas017-B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hill</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>McAndrews-Hill</surname>
<given-names>MS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene ontology annotations: what they mean and where they come from</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl. 5</issue>
<fpage>S2</fpage>
<pub-id pub-id-type="pmid">18460184</pub-id>
</element-citation>
</ref>
<ref id="bas017-B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCray</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Browne</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Bodenreider</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>The lexical properties of the gene ontology</article-title>
<source>Proc. AMIA Symp.</source>
<year>2002</year>
<fpage>504</fpage>
<lpage>508</lpage>
<pub-id pub-id-type="pmid">12463875</pub-id>
</element-citation>
</ref>
<ref id="bas017-B35">
<label>35</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Beisswanger</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Poprat</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>U</given-names>
</name>
</person-group>
<article-title>Lexical properties of OBO ontology class names and synonyms</article-title>
<source>Proceedings of the Third International Symposium on Semantic Mining in Biomedicine</source>
<year>2008</year>
<publisher-loc>Turku, Finland</publisher-loc>
<fpage>13</fpage>
<lpage>20</lpage>
</element-citation>
</ref>
<ref id="bas017-B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blaschke</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Leon</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evaluation of BioCreAtIvE assessment of task 2</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<issue>Suppl. 1</issue>
<fpage>S16</fpage>
<pub-id pub-id-type="pmid">15960828</pub-id>
</element-citation>
</ref>
<ref id="bas017-B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Muller</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Kenny</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Sternberg</surname>
<given-names>PW</given-names>
</name>
</person-group>
<article-title>Textpresso: an ontology-based information retrieval and extraction system for biological literature</article-title>
<source>PLoS Biol.</source>
<year>2004</year>
<volume>2</volume>
<fpage>e309</fpage>
<pub-id pub-id-type="pmid">15383839</pub-id>
</element-citation>
</ref>
<ref id="bas017-B38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jonquet</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Musen</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>The open biomedical annotator</article-title>
<source>Summit on Translat Bioinforma</source>
<year>2009</year>
<volume>2009</volume>
<fpage>56</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="pmid">21347171</pub-id>
</element-citation>
</ref>
<ref id="bas017-B39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rzhetsky</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Iossifov</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Koike</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data</article-title>
<source>J. Biomed. Inform.</source>
<year>2004</year>
<volume>37</volume>
<fpage>43</fpage>
<lpage>53</lpage>
<pub-id pub-id-type="pmid">15016385</pub-id>
</element-citation>
</ref>
<ref id="bas017-B40">
<label>40</label>
<element-citation publication-type="webpage">
<collab>Domeo</collab>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://annotationframework.org/">http://annotationframework.org/</ext-link>
(14 March 2012, date last accessed)</comment>
</element-citation>
</ref>
<ref id="bas017-B41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xuan</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Dai</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mirel</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Open biomedical ontology-based Medline exploration</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<issue>Suppl. 5</issue>
<fpage>S6</fpage>
<pub-id pub-id-type="pmid">19426463</pub-id>
</element-citation>
</ref>
<ref id="bas017-B42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yeh</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Morgan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Colosimo</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>BioCreAtIvE task 1A: gene mention finding evaluation</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<issue>Suppl. 1</issue>
<fpage>S2</fpage>
<pub-id pub-id-type="pmid">15960832</pub-id>
</element-citation>
</ref>
<ref id="bas017-B43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hirschman</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Colosimo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Morgan</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Overview of BioCreAtIvE task 1B: normalized gene lists</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<issue>Suppl. 1</issue>
<fpage>S11</fpage>
<pub-id pub-id-type="pmid">15960823</pub-id>
</element-citation>
</ref>
<ref id="bas017-B44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magrane</surname>
<given-names>M</given-names>
</name>
<collab>UniProt Consortium</collab>
</person-group>
<article-title>UniProt Knowledgebase: a hub of integrated protein data</article-title>
<source>Database</source>
<year>2011</year>
<comment>Vol. 2011, Article ID bar009, doi:10.1093/database/bar009</comment>
</element-citation>
</ref>
<ref id="bas017-B45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Leitner</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Rodriguez-Penagos</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Overview of the protein-protein interaction annotation extraction task of BioCreative II</article-title>
<source>Genome Biol.</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl. 2</issue>
<fpage>S4</fpage>
<pub-id pub-id-type="pmid">18834495</pub-id>
</element-citation>
</ref>
<ref id="bas017-B46">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chatr-aryamontri</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kerrien</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Khadake</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data</article-title>
<source>Genome Biol.</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl. 2</issue>
<fpage>S5</fpage>
<pub-id pub-id-type="pmid">18834496</pub-id>
</element-citation>
</ref>
<ref id="bas017-B47">
<label>47</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Piao</surname>
<given-names>S</given-names>
</name>
<name>
<surname>McNaught</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ananiadou</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Clustering related terms with definitions</article-title>
<source>Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)</source>
<year>2008</year>
<publisher-loc>Marrakech, Morocco</publisher-loc>
<fpage>2013</fpage>
<lpage>2019</lpage>
</element-citation>
</ref>
<ref id="bas017-B48">
<label>48</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Rak</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Restificar</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S11</fpage>
<pub-id pub-id-type="pmid">22151769</pub-id>
</element-citation>
</ref>
<ref id="bas017-B49">
<label>49</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pesquita</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Faria</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Falcao</surname>
<given-names>AO</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Semantic similarity in biomedical ontologies</article-title>
<source>PLoS Comput. Biol.</source>
<year>2009</year>
<volume>5</volume>
<fpage>e1000443</fpage>
<pub-id pub-id-type="pmid">19649320</pub-id>
</element-citation>
</ref>
<ref id="bas017-B50">
<label>50</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fellbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Towards new information resources for public health–from WordNet to MedicalWordNet</article-title>
<source>J. Biomed. Inform.</source>
<year>2006</year>
<volume>39</volume>
<fpage>321</fpage>
<lpage>332</lpage>
<pub-id pub-id-type="pmid">16298171</pub-id>
</element-citation>
</ref>
<ref id="bas017-B51">
<label>51</label>
<element-citation publication-type="other">
<comment>Resnik,P. (1995) Using information content to evaluate semantic similarity in a taxonomy. In:
<italic>Proceedings of the 14th International Joint Conference on Artificial Intelligence</italic>
. Montréal, Canada, Vol. I, pp. 448–453</comment>
</element-citation>
</ref>
<ref id="bas017-B52">
<label>52</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arighi</surname>
<given-names>CN</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Overview of the BioCreative III Workshop</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S1</fpage>
<pub-id pub-id-type="pmid">22151647</pub-id>
</element-citation>
</ref>
<ref id="bas017-B53">
<label>53</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vazquez</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Leitner</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The protein-protein interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S3</fpage>
<pub-id pub-id-type="pmid">22151929</pub-id>
</element-citation>
</ref>
<ref id="bas017-B54">
<label>54</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chatr-aryamontri</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Winter</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Perfetto</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S8</fpage>
<pub-id pub-id-type="pmid">22151178</pub-id>
</element-citation>
</ref>
<ref id="bas017-B55">
<label>55</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schneider</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Clematide</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rinaldi</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Detection of interaction articles and experimental methods in biomedical literature</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S13</fpage>
<pub-id pub-id-type="pmid">22151872</pub-id>
</element-citation>
</ref>
<ref id="bas017-B56">
<label>56</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lourenco</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Conover</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S12</fpage>
<pub-id pub-id-type="pmid">22151823</pub-id>
</element-citation>
</ref>
<ref id="bas017-B57">
<label>57</label>
<element-citation publication-type="webpage">
<collab>OntoNorm framework</collab>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/ontonorm">https://sourceforge.net/projects/ontonorm</ext-link>
(14 March 2012, date last accessed)</comment>
</element-citation>
</ref>
<ref id="bas017-B58">
<label>58</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl. 8</issue>
<fpage>S10</fpage>
<pub-id pub-id-type="pmid">22151701</pub-id>
</element-citation>
</ref>
<ref id="bas017-B59">
<label>59</label>
<element-citation publication-type="webpage">
<comment>Beyond the PDF.
<ext-link ext-link-type="uri" xlink:href="http://sites.google.com/site/beyondthepdf/">http://sites.google.com/site/beyondthepdf/</ext-link>
(14 March 2012, date last accessed)</comment>
</element-citation>
</ref>
<ref id="bas017-B60">
<label>60</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tateisi</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GENIA corpus–semantically annotated corpus for bio-textmining</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<issue>Suppl. 1</issue>
<fpage>i180</fpage>
<lpage>i182</lpage>
<pub-id pub-id-type="pmid">12855455</pub-id>
</element-citation>
</ref>
<ref id="bas017-B61">
<label>61</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bada</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hunter</surname>
<given-names>LE</given-names>
</name>
<name>
<surname>Eckert</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An overview of the CRAFT concept annotation guidelines</article-title>
<source>Proceedings of the Fourth Linguistic Annotation Workshop</source>
<year>2010</year>
<publisher-loc>Uppsala, Sweden</publisher-loc>
<fpage>207</fpage>
<lpage>211</lpage>
</element-citation>
</ref>
<ref id="bas017-B62">
<label>62</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brewster</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jupp</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Luciano</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Issues in learning an ontology from text</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<issue>Suppl. 5</issue>
<fpage>S1</fpage>
<pub-id pub-id-type="pmid">19426458</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000183  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000183  | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021