Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Ontologies for Bioinformatics

Internal identifier: 000696 (Pmc/Corpus); previous: 000695; next: 000697

Authors: Nadine Schuurman; Agnieszka Leszczynski

Source:

RBID : PMC:2735951

Abstract

The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.


Url:
PubMed: 19812775
PubMed Central: 2735951

Links to Exploration step

PMC:2735951

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Ontologies for Bioinformatics</title>
<author>
<name sortKey="Schuurman, Nadine" sort="Schuurman, Nadine" uniqKey="Schuurman N" first="Nadine" last="Schuurman">Nadine Schuurman</name>
</author>
<author>
<name sortKey="Leszczynski, Agnieszka" sort="Leszczynski, Agnieszka" uniqKey="Leszczynski A" first="Agnieszka" last="Leszczynski">Agnieszka Leszczynski</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19812775</idno>
<idno type="pmc">2735951</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735951</idno>
<idno type="RBID">PMC:2735951</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000696</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Ontologies for Bioinformatics</title>
<author>
<name sortKey="Schuurman, Nadine" sort="Schuurman, Nadine" uniqKey="Schuurman N" first="Nadine" last="Schuurman">Nadine Schuurman</name>
</author>
<author>
<name sortKey="Leszczynski, Agnieszka" sort="Leszczynski, Agnieszka" uniqKey="Leszczynski A" first="Agnieszka" last="Leszczynski">Agnieszka Leszczynski</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics and Biology Insights</title>
<idno type="eISSN">1177-9322</idno>
<imprint>
<date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Agarwal, P" uniqKey="Agarwal P">P Agarwal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ahlqvist, O" uniqKey="Ahlqvist O">O Ahlqvist</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ahlqvist, O" uniqKey="Ahlqvist O">O Ahlqvist</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aranguren, Me" uniqKey="Aranguren M">ME Aranguren</name>
</author>
<author>
<name sortKey="Bechhofer, S" uniqKey="Bechhofer S">S Bechhofer</name>
</author>
<author>
<name sortKey="Lord, P" uniqKey="Lord P">P Lord</name>
</author>
<author>
<name sortKey="Sattler, U" uniqKey="Sattler U">U Sattler</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baader, F" uniqKey="Baader F">F Baader</name>
</author>
<author>
<name sortKey="Horrocks, I" uniqKey="Horrocks I">I Horrocks</name>
</author>
<author>
<name sortKey="Sattler, U" uniqKey="Sattler U">U Sattler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berman, Jj" uniqKey="Berman J">JJ Berman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berman, Jj" uniqKey="Berman J">JJ Berman</name>
</author>
<author>
<name sortKey="Bhatia, K" uniqKey="Bhatia K">K Bhatia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bisby, Fa" uniqKey="Bisby F">FA Bisby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blackmore, S" uniqKey="Blackmore S">S Blackmore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blake, J" uniqKey="Blake J">J Blake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
<author>
<name sortKey="Bult, Cj" uniqKey="Bult C">CJ Bult</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boguski, Ms" uniqKey="Boguski M">MS Boguski</name>
</author>
<author>
<name sortKey="Mcintosh, Mw" uniqKey="Mcintosh M">MW McIntosh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bowker, Gc" uniqKey="Bowker G">GC Bowker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buckingham, S" uniqKey="Buckingham S">S Buckingham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buckingham, S" uniqKey="Buckingham S">S Buckingham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buckingham, S" uniqKey="Buckingham S">S Buckingham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buetow, Kh" uniqKey="Buetow K">KH Buetow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camon, E" uniqKey="Camon E">E Camon</name>
</author>
<author>
<name sortKey="Magrane, M" uniqKey="Magrane M">M Magrane</name>
</author>
<author>
<name sortKey="Barrell, D" uniqKey="Barrell D">D Barrell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carroll, S" uniqKey="Carroll S">S Carroll</name>
</author>
<author>
<name sortKey="Pavlovic, V" uniqKey="Pavlovic V">V Pavlovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Castro, Ag" uniqKey="Castro A">AG Castro</name>
</author>
<author>
<name sortKey="Rocca Serra, P" uniqKey="Rocca Serra P">P Rocca-Serra</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chicurel, M" uniqKey="Chicurel M">M Chicurel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chicurel, M" uniqKey="Chicurel M">M Chicurel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Choi, N" uniqKey="Choi N">N Choi</name>
</author>
<author>
<name sortKey="Song, Iy" uniqKey="Song I">IY Song</name>
</author>
<author>
<name sortKey="Hyoil, H" uniqKey="Hyoil H">H Hyoil</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Galperin, My" uniqKey="Galperin M">MY Galperin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gardner, Sp" uniqKey="Gardner S">SP Gardner</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giles, J" uniqKey="Giles J">J Giles</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hill, Dp" uniqKey="Hill D">DP Hill</name>
</author>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
<author>
<name sortKey="Richardson, Je" uniqKey="Richardson J">JE Richardson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kohler, J" uniqKey="Kohler J">J Kohler</name>
</author>
<author>
<name sortKey="Philippi, S" uniqKey="Philippi S">S Philippi</name>
</author>
<author>
<name sortKey="Lange, M" uniqKey="Lange M">M Lange</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kohler, J" uniqKey="Kohler J">J Kohler</name>
</author>
<author>
<name sortKey="Schulze Kremer, S" uniqKey="Schulze Kremer S">S Schulze-Kremer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lewis, Se" uniqKey="Lewis S">SE Lewis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lord, Pw" uniqKey="Lord P">PW Lord</name>
</author>
<author>
<name sortKey="Stevens, Rd" uniqKey="Stevens R">RD Stevens</name>
</author>
<author>
<name sortKey="Brass, A" uniqKey="Brass A">A Brass</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pennisi, E" uniqKey="Pennisi E">E Pennisi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peters, B" uniqKey="Peters B">B Peters</name>
</author>
<author>
<name sortKey="Sette, A" uniqKey="Sette A">A Sette</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rector, Al" uniqKey="Rector A">AL Rector</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saeys, Y" uniqKey="Saeys Y">Y Saeys</name>
</author>
<author>
<name sortKey="Rouze, P" uniqKey="Rouze P">P Rouze</name>
</author>
<author>
<name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sauer, U" uniqKey="Sauer U">U Sauer</name>
</author>
<author>
<name sortKey="Heinemann, M" uniqKey="Heinemann M">M Heinemann</name>
</author>
<author>
<name sortKey="Zamboni, N" uniqKey="Zamboni N">N Zamboni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schulze Kremer, S" uniqKey="Schulze Kremer S">S Schulze-Kremer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schuurman, N" uniqKey="Schuurman N">N Schuurman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schuurman, N" uniqKey="Schuurman N">N Schuurman</name>
</author>
<author>
<name sortKey="Leszczynski, A" uniqKey="Leszczynski A">A Leszczynski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schuurman, N" uniqKey="Schuurman N">N Schuurman</name>
</author>
<author>
<name sortKey="Leszczynski, A" uniqKey="Leszczynski A">A Leszczynski</name>
</author>
<author>
<name sortKey="Fiedler, R" uniqKey="Fiedler R">R Fiedler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Searls, Db" uniqKey="Searls D">DB Searls</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Rosse, C" uniqKey="Rosse C">C Rosse</name>
</author>
<author>
<name sortKey="Bard, J" uniqKey="Bard J">J Bard</name>
</author>
<author>
<name sortKey="Bug, W" uniqKey="Bug W">W Bug</name>
</author>
<author>
<name sortKey="Seusters, W" uniqKey="Seusters W">W Ceusters</name>
</author>
<author>
<name sortKey="Goldberg, Lj" uniqKey="Goldberg L">LJ Goldberg</name>
</author>
<author>
<name sortKey="Eilbeck, K" uniqKey="Eilbeck K">K Eilbeck</name>
</author>
<author>
<name sortKey="Ireland, A" uniqKey="Ireland A">A Ireland</name>
</author>
<author>
<name sortKey="Mungall, Cj" uniqKey="Mungall C">CJ Mungall</name>
</author>
<author>
<name sortKey="Leontis, N" uniqKey="Leontis N">N Leontis</name>
</author>
<author>
<name sortKey="Rocca Serra, P" uniqKey="Rocca Serra P">P Rocca-Serra</name>
</author>
<author>
<name sortKey="Ruttenber, A" uniqKey="Ruttenber A">A Ruttenberg</name>
</author>
<author>
<name sortKey="Sansone, S A" uniqKey="Sansone S">S-A Sansone</name>
</author>
<author>
<name sortKey="Sheuermann, Rh" uniqKey="Sheuermann R">RH Scheuermann</name>
</author>
<author>
<name sortKey="Shah, N" uniqKey="Shah N">N Shah</name>
</author>
<author>
<name sortKey="Whetzel, Pl" uniqKey="Whetzel P">PL Whetzel</name>
</author>
<author>
<name sortKey="Lewis, S" uniqKey="Lewis S">S Lewis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
<author>
<name sortKey="Ceusters, W" uniqKey="Ceusters W">W Ceusters</name>
</author>
<author>
<name sortKey="Klagges, B" uniqKey="Klagges B">B Klagges</name>
</author>
<author>
<name sortKey="Kohler, J" uniqKey="Kohler J">J Kohler</name>
</author>
<author>
<name sortKey="Kumar, A" uniqKey="Kumar A">A Kumar</name>
</author>
<author>
<name sortKey="Lomax, J" uniqKey="Lomax J">J Lomax</name>
</author>
<author>
<name sortKey="Mungall, C" uniqKey="Mungall C">C Mungall</name>
</author>
<author>
<name sortKey="Neuhaus, F" uniqKey="Neuhaus F">F Neuhaus</name>
</author>
<author>
<name sortKey="Rector, Al" uniqKey="Rector A">AL Rector</name>
</author>
<author>
<name sortKey="Rosse, C" uniqKey="Rosse C">C Rosse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
<author>
<name sortKey="Williams, J" uniqKey="Williams J">J Williams</name>
</author>
<author>
<name sortKey="Schulze Kremer, S" uniqKey="Schulze Kremer S">S Schulze-Kremer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sowa, Jf" uniqKey="Sowa J">JF Sowa</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sugden, A" uniqKey="Sugden A">A Sugden</name>
</author>
<author>
<name sortKey="Pennisi, E" uniqKey="Pennisi E">E Pennisi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thomas, Ce" uniqKey="Thomas C">CE Thomas</name>
</author>
<author>
<name sortKey="Ganji, G" uniqKey="Ganji G">G Ganji</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, K" uniqKey="Wang K">K Wang</name>
</author>
<author>
<name sortKey="Tarczy Hornoch, P" uniqKey="Tarczy Hornoch P">P Tarczy-Hornoch</name>
</author>
<author>
<name sortKey="Shaker, R" uniqKey="Shaker R">R Shaker</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wolstencroft, K" uniqKey="Wolstencroft K">K Wolstencroft</name>
</author>
<author>
<name sortKey="Mcentire, R" uniqKey="Mcentire R">R McEntire</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="review-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinform Biol Insights</journal-id>
<journal-id journal-id-type="publisher-id">Bioinformatics and Biology Insights</journal-id>
<journal-title-group>
<journal-title>Bioinformatics and Biology Insights</journal-title>
</journal-title-group>
<issn pub-type="epub">1177-9322</issn>
<publisher>
<publisher-name>Libertas Academica</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19812775</article-id>
<article-id pub-id-type="pmc">2735951</article-id>
<article-id pub-id-type="publisher-id">bbi-2008-187</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Review</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Ontologies for Bioinformatics</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Schuurman</surname>
<given-names>Nadine</given-names>
</name>
<xref ref-type="corresp" rid="c1-bbi-2008-187"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Leszczynski</surname>
<given-names>Agnieszka</given-names>
</name>
</contrib>
<aff id="af1-bbi-2008-187">Department of Geography, Simon Fraser University RCB 7123, 8888 University Drive, Burnaby, British Columbia, Canada, V5A 1S6</aff>
</contrib-group>
<author-notes>
<corresp id="c1-bbi-2008-187">Correspondence: Nadine Schuurman, Department of Geography, Simon Fraser University RCB 7123, 8888 University Drive, Burnaby, British Columbia, Canada, V5A 1S6. Tel: 778-782-3320; Email:
<email>nadine@sfu.ca</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2008</year>
</pub-date>
<pub-date pub-type="epub">
<day>12</day>
<month>3</month>
<year>2008</year>
</pub-date>
<volume>2</volume>
<fpage>187</fpage>
<lpage>200</lpage>
<permissions>
<copyright-statement>Copyright © 2008 The authors.</copyright-statement>
<copyright-year>2008</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link>
).</license-p>
</license>
</permissions>
<abstract>
<p>The past twenty years have witnessed an explosion of biological data in diverse database formats governed by heterogeneous infrastructures. Not only are semantics (attribute terms) different in meaning across databases, but their organization varies widely. Ontologies are a concept imported from computing science to describe different conceptual frameworks that guide the collection, organization and publication of biological data. An ontology is similar to a paradigm but has very strict implications for formatting and meaning in a computational context. The use of ontologies is a means of communicating and resolving semantic and organizational differences between biological databases in order to enhance their integration. The purpose of interoperability (or sharing between divergent storage and semantic protocols) is to allow scientists from around the world to share and communicate with each other. This paper describes the rapid accumulation of biological data, its various organizational structures, and the role that ontologies play in interoperability.</p>
</abstract>
<kwd-group>
<kwd>ontologies</kwd>
<kwd>semantics</kwd>
<kwd>biological databases</kwd>
<kwd>bioinformatics</kwd>
<kwd>Gene Ontology</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>During the 1960s, digital protein inventories and taxonomic inventories evolved simultaneously. By the 1980s, these had matured and were institutionalized with an attendant proliferation of biological data. These datasets were, however, maintained in closely guarded proprietary repositories or ‘silos’ with little or no communication between them (
<xref ref-type="bibr" rid="b9-bbi-2008-187">Bisby, 2000</xref>
;
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
;
<xref ref-type="bibr" rid="b37-bbi-2008-187">Pennisi, 2000</xref>
). The 1990s were marked by a shift in emphasis from accumulating vast volumes of data to reducing overlap between databases and making use of extant data across various repository locations (
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
). This process of increasing communication between databases is known as interoperability—the focus of which is to enable data sharing and comparison.</p>
<p>As the cumulative body of biological knowledge increases, generating a comprehensive and consistent account of biology hinges upon the ability of scientists to draw upon and synthesize vast datasets across distributed digital resources. The ultimate objective of biodiversity informatics is to generate a “global inventory of [all] life on Earth” (
<xref ref-type="bibr" rid="b10-bbi-2008-187">Blackmore, 2002</xref>
, p 365), and is premised on the seamless digital accumulation of distributed taxonomies. Because contemporary biological—particularly ‘omics’ and model organism—databases stress data at the molecular scale, they do not adequately represent the physiology they describe (
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
). There is thus a need to compile the cellular features of those organisms into discernible representations of those organisms themselves.</p>
<p>The rise of ‘omics’ science—genomics, proteomics, and metabolomics for the identification and prediction of genetic product components, signatures, and processes (
<xref ref-type="bibr" rid="b41-bbi-2008-187">Sauer et al. 2007</xref>
, p 550)—has contributed the molecular-level information upon which a systems view of biology is predicated. Certainly the complexity of biology resides at the level of gene products (
<xref ref-type="bibr" rid="b41-bbi-2008-187">Sauer et al. 2007</xref>
). In this way biodiversity can be understood as the compendium of the biology of organisms (
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
). In a computational environment, biodiversity is the hereditary information encapsulated within genetic products and identified via the collective of mappings of several model organism genomes. While the maturation of ‘omics’ has been facilitated in large part by the capacity to seamlessly make divergent data sources interoperable, it has presented a new set of engineering challenges. These include the need to integrate diverse and remote data sources as well as to extract knowledge from digital information post-integration (
<xref ref-type="bibr" rid="b53-bbi-2008-187">Thomas and Ganji, 2006</xref>
).</p>
<p>The paradigm shift the ‘omics’ revolution has created within biology is best exemplified by gene prediction (also known as gene finding) and functional prediction tasks. New technologies such as microarrays generate huge and ever-changing volumes of data (
<xref ref-type="bibr" rid="b18-bbi-2008-187">Buetow, 2005</xref>
). The rapid growth of genome mapping necessitates the ability to automate gene-calling, or the identification of the individual genes of a genome. Gene finding involves algorithms for the identification of biologically functional regions—or exons—of sequences which explicitly code for proteins (
<xref ref-type="bibr" rid="b40-bbi-2008-187">Saeys et al. 2007</xref>
). These are referred to as
<italic>coding regions</italic>
. The objective of automated gene prediction is thus to determine the “coding potential” of genetic sequences (
<xref ref-type="bibr" rid="b40-bbi-2008-187">Saeys et al. 2007</xref>
, p 414). This process uses self-learning algorithms which predict unique signatures of the genetic spectrum that indicate distinct clusters of material (
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
;
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
). Where genes have been located, the biological functions of many protein sequences are as yet undetermined (
<xref ref-type="bibr" rid="b20-bbi-2008-187">Carroll and Pavlovic, 2006</xref>
). Gene finding and functional prediction go hand in hand and are rarely treated separately as researchers often desire to discern the roles of newly identified gene products (
<xref ref-type="bibr" rid="b19-bbi-2008-187">Camon et al. 2004</xref>
). The potential for predicting protein function similarly rests on its inference over incompletely annotated sequences on the basis of homologues in other species (
<xref ref-type="bibr" rid="b20-bbi-2008-187">Carroll and Pavlovic, 2006</xref>
). However, neither is an easy feat as the coding regions of eukaryotic organisms are both sparse and small, making the identification of exon/intron boundaries—and thereby the identification of protein function—difficult, resulting in erroneous gene annotation (
<xref ref-type="bibr" rid="b40-bbi-2008-187">Saeys et al. 2007</xref>
). In the present era of functional genomics, knowledge production is, however, dependent on the ability to recover genes and proteins on the basis of their (correctly) annotated functionality, pathways, and/or protein-protein interactions (
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
). This is no trivial task; indeed it necessitates the resolution of
<italic>semantics</italic>
, or differences in meaning and naming conventions between distributed data resources (
<xref ref-type="bibr" rid="b14-bbi-2008-187">Bowker, 2000</xref>
).</p>
<p>Unlike systems architectures, the integration of which constitutes an ‘IT problem’ (
<xref ref-type="bibr" rid="b46-bbi-2008-187">Searls, 2005</xref>
), data are not semantically transparent. Although a structural linkage can now be easily defined between data sources such that a user can retrieve data on the basis of standardized queries across data sources with conflicting database schemas, this does not render the results of those queries meaningful. A prime example is the notion of ‘gene’—the primitive of modern biology. While the concept of ‘gene’ is still evolving, two dominant concepts exist: the Human Genome Database defines a gene as a DNA fragment that can be interpreted as (analogous to) a protein; whereas GenBank and the Genome Sequence Database (GSDB) consider a gene to be a “ ‘region of biological interest with a name and that carries a genetic trait’ ” (
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
, p 180). Two databases can be developed based on different understandings of ‘gene.’ As a result, retrieving data from semantically orthogonal databases on the basis of a ‘gene’ keyword search can initiate error propagation—in this case in the form of false analogues—in the analysis and subsequent results (
<xref ref-type="bibr" rid="b46-bbi-2008-187">Searls, 2005</xref>
). The complexity of biological terms exacerbates this problem. Even where two variables in disparate databases are semantically equivalent, their relations to other knowledge objects in the data repository may not be. This is known as schematic incompatibility: the same term occupies a different relative position in the taxonomic hierarchy of each database.</p>
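<p>The distinction between semantic and schematic heterogeneity can be made concrete with a minimal sketch. It assumes two hypothetical toy databases: the record identifiers, field names, and parent terms are invented for illustration and are not drawn from the Human Genome Database, GenBank, or GSDB. Only the two paraphrased definitions of ‘gene’ follow the text above.</p>
<preformat>
```python
# Toy illustration of semantic vs. schematic heterogeneity.
# All records, field names, and hierarchy labels below are hypothetical.

# Database A treats a "gene" as a DNA fragment interpretable as a protein.
DB_A = {
    "definition": "DNA fragment that can be interpreted as a protein",
    "records": [{"id": "A1", "type": "gene", "parent": "coding sequence"}],
}

# Database B treats a "gene" as a named region carrying a genetic trait.
DB_B = {
    "definition": "named region of biological interest carrying a genetic trait",
    "records": [
        {"id": "B1", "type": "gene", "parent": "genomic region"},
        {"id": "B2", "type": "gene", "parent": "genomic region"},
    ],
}


def keyword_search(databases, keyword):
    """Return every record whose type matches the keyword, tagged by source."""
    hits = []
    for name, db in databases.items():
        for record in db["records"]:
            if record["type"] == keyword:
                hits.append({"source": name, **record})
    return hits


def heterogeneity_report(databases, keyword):
    """Summarize how many definitions and parent terms one keyword spans."""
    hits = keyword_search(databases, keyword)
    sources = {hit["source"] for hit in hits}
    definitions = {databases[s]["definition"] for s in sources}
    parents = {hit["parent"] for hit in hits}
    return {
        "hits": len(hits),
        # More than one definition behind the same word: semantic conflict.
        "semantic_conflict": len(definitions) > 1,
        # Same word placed under different parents: schematic conflict.
        "schematic_conflict": len(parents) > 1,
    }


databases = {"A": DB_A, "B": DB_B}
report = heterogeneity_report(databases, "gene")
print(report)
```
</preformat>
<p>In this sketch the keyword search succeeds structurally, since it returns three records, yet the report flags that those records rest on two definitions and sit under two different parent terms. A real integration layer would need far richer metadata than these toy dictionaries carry.</p>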
<p>In order to accommodate both semantic and schematic differences between biological databases, ‘omics’ research requires a method of expressing the
<italic>contexts</italic>
from which biological concepts emerge—at the database level. This is because functional prediction hinges upon the identification not just of sequence homologues but similar cellular components participating in a similar biological process. The component cellular, molecular, and biological details are often located in separate data sources, a function of the narrow scope of biological information produced by any given laboratory (
<xref ref-type="bibr" rid="b33-bbi-2008-187">Lewis, 2005</xref>
). Exploiting the vast digital resources of biological data for prediction services requires that the cellular, molecular, and biological contexts of proteins be adequately encoded and furthermore machine-readable.</p>
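<p>What a machine-readable encoding of these three contexts might look like can be sketched as follows. This is an illustrative assumption rather than the format of any real resource: the protein names, the term labels, and the <italic>Context</italic> record type are all hypothetical.</p>
<preformat>
```python
# Hypothetical, hand-made annotations: the protein names and term labels are
# invented for illustration and do not come from any real database.
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """The three contexts named in the text, stored as plain labels."""
    cellular_component: str
    molecular_function: str
    biological_process: str


ANNOTATIONS = {
    "proteinX": Context("nucleus", "DNA binding", "transcription regulation"),
    "proteinY": Context("nucleus", "DNA binding", "transcription regulation"),
    "proteinZ": Context("membrane", "ion transport", "signal transduction"),
}


def shared_aspects(a, b):
    """Count the context aspects on which two annotated proteins agree."""
    ctx_a, ctx_b = ANNOTATIONS[a], ANNOTATIONS[b]
    pairs = zip(
        (ctx_a.cellular_component, ctx_a.molecular_function, ctx_a.biological_process),
        (ctx_b.cellular_component, ctx_b.molecular_function, ctx_b.biological_process),
    )
    return sum(1 for left, right in pairs if left == right)


print(shared_aspects("proteinX", "proteinY"))
print(shared_aspects("proteinX", "proteinZ"))
```
</preformat>
<p>Comparing plain labels only works here because both annotations use the same vocabulary. When sources name the same context differently, the comparison fails silently, which is the gap a shared ontology is meant to close.</p>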
<p>
<italic>Ontologies</italic>
(the use of a single taxonomic and knowledge representation schema) are a way of resolving these semantic issues between databases. The bioinformatics literature has heavily promoted ontologies as an operational solution for biological interoperability since the turn of the millennium (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
, p 25–9;
<xref ref-type="bibr" rid="b11-bbi-2008-187">Blake, 2004</xref>
;
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
;
<xref ref-type="bibr" rid="b15-bbi-2008-187">Buckingham, 2004a</xref>
;
<xref ref-type="bibr" rid="b16-bbi-2008-187">Buckingham, 2004b</xref>
; Buetow, 2007;
<xref ref-type="bibr" rid="b18-bbi-2008-187">Buetow, 2005</xref>
;
<xref ref-type="bibr" rid="b19-bbi-2008-187">Camon et al. 2004</xref>
;
<xref ref-type="bibr" rid="b20-bbi-2008-187">Carroll and Pavlovic, 2006</xref>
;
<xref ref-type="bibr" rid="b21-bbi-2008-187">Castro et al. 2006</xref>
;
<xref ref-type="bibr" rid="b22-bbi-2008-187">Chicurel, 2002a</xref>
;
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
;
<xref ref-type="bibr" rid="b26-bbi-2008-187">Galperin, 2006</xref>
;
<xref ref-type="bibr" rid="b29-bbi-2008-187">Giles, 2007</xref>
;
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
;
<xref ref-type="bibr" rid="b31-bbi-2008-187">Kohler et al. 2003</xref>
;
<xref ref-type="bibr" rid="b32-bbi-2008-187">Kohler and Schulze-Kremer, 2002</xref>
;
<xref ref-type="bibr" rid="b33-bbi-2008-187">Lewis, 2005</xref>
;
<xref ref-type="bibr" rid="b38-bbi-2008-187">Peters and Sette, 2007</xref>
;
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
;
<xref ref-type="bibr" rid="b46-bbi-2008-187">Searls, 2005</xref>
;
<xref ref-type="bibr" rid="b56-bbi-2008-187">Wolstencroft et al. 2005</xref>
). Much of this literature assumes prior knowledge of computing, is written in dense technical language, or emphasizes a single aspect of ontologies in biology.</p>
<p>The power of ontologies lies in their capacity to provide context for biological semantics. This paper presents the molecular biologist, rather than the computing scientist, with a detailed, comprehensive review of ontologies in biology. We begin with a definition of formal ontology in order to clarify the role that ontologies play with respect to interoperability (or the exchange of data). We describe ontological concepts and their role in bioinformatics using two preeminent ontological efforts in biology as examples: the Gene Ontology (GO) and the umbrella Open Biomedical Ontologies (OBO) initiative of which it is part. Subsequently we explain how ontologies can be exploited to facilitate information sharing and data integration efforts for bioinformatics with reference to real-world, large-scale biological information portals, namely the cancer Biomedical Informatics Grid (caBIG) and WikiProteins, a proprietary knowledge commons for proteins.</p>
<p>Furthermore, we describe a methodology for using ontologies as a basis for comparing semantics across health registries in order to illustrate how medical informaticians have imposed interoperability on disjunct datasets. We then explain how, once semantic and schematic heterogeneity between datasets is resolved, ontologies can be used to facilitate knowledge creation tasks in biology, such as automating gene/protein annotation and functional prediction.</p>
<p>To provide a global overview of ontologies for biology, we also draw upon a related community of research—health/medical informatics—which uses and shares with bioinformatics a series of knowledge representation constructs for the capture of biological information. Both use genetic information in the era of “ ‘post-genome’ science” (
<xref ref-type="bibr" rid="b13-bbi-2008-187">Boguski and McIntosh, 2003</xref>
, p 233). For instance, knowledge sharing protocols developed in the field of health informatics are shared by bioinformatics researchers for resolving semantic heterogeneity in databases.
<xref ref-type="bibr" rid="b54-bbi-2008-187">Wang et al. (2005)</xref>
for example use Protégé—an open-source ontology editing and knowledge acquisition software authored by
<xref ref-type="bibr" rid="b51-bbi-2008-187">Stanford Medical Informatics (2005)</xref>
—as the knowledge representation platform for their mediation architecture. There are similarly links between bioinformatics and biodiversity communities. The tools of bioinformatics—many of which emerge from health informatics—are ideally suited to the objectives of biodiversity research, particularly conservation science (
<xref ref-type="bibr" rid="b52-bbi-2008-187">Sugden and Pennisi, 2000</xref>
). This paper nevertheless emphasizes bioinformatics.</p>
</sec>
<sec>
<title>Ontologies</title>
<p>In philosophy, ontology has traditionally been understood to be the essence of being—or what something really is (
<xref ref-type="bibr" rid="b44-bbi-2008-187">Schuurman, 2006</xref>
). In the information sciences, an ontology is a fixed universe of discourse in which each element (e.g. field name or column in a database) is precisely defined (Gruber, 1993). In addition, each possible relationship between data elements is parameterized or constrained. For example, DNA may make up chromosomes but not the reverse. In an ontology, these relationships are made formally explicit.</p>
<p>The prefix ‘formal’ refers to the property of machine-readability (
<xref ref-type="bibr" rid="b1-bbi-2008-187">Agarwal, 2005</xref>
). In other words, a
<italic>formal ontology</italic>
is a machine-readable model of the objects allowed into a formal universe and the associations or relationships between them, upon which some automated reasoning tasks can be performed. In a formal environment, an ontology constitutes a surrogate of knowledge abstracted from the real world (in this case, the cumulative body of biological science) in a coded form that can be translated into a programming language (
<xref ref-type="bibr" rid="b49-bbi-2008-187">Smith et al. 2003</xref>
;
<xref ref-type="bibr" rid="b50-bbi-2008-187">Sowa, 2000</xref>
).</p>
<p>Scientific or systems ontologies contain three levels of formalization. The first is the conceptual, which is then translated into a formal model of the data elements in the ontology (e.g. proteins) and the possible relationships between them. The final stage or level is the development of code that can be run by computers (
<xref ref-type="bibr" rid="b44-bbi-2008-187">Schuurman, 2006</xref>
). Ontologies are structured much like a biological taxonomy, with general concepts appearing at the top of the tree and becoming more specific as one traverses down. The hierarchical schema, however, is only a ‘shell’ that can accommodate the concepts and their relations particular to a domain (
<xref ref-type="bibr" rid="b39-bbi-2008-187">Rector, 1999</xref>
;
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
). It must be populated by domain knowledge expressed in a formal semantics—a computing syntax such as a markup language—that allows all entities declared into the ontology to be precisely defined and their interrelationships given strict parameters with the goal of enabling realistic biological models.</p>
<p>Formal semantics permit the distinction of concepts declared into the model (
<xref ref-type="bibr" rid="b50-bbi-2008-187">Sowa, 2000</xref>
). To satisfy the strict criteria of formal ontology building, the formal semantics used to instantiate an ontology should be premised on a formal logic particular to some logical algebra (
<xref ref-type="bibr" rid="b48-bbi-2008-187">Smith et al. 2005</xref>
)—such as description logics (DL)—which contain predetermined rules for “when two concepts are the same, when one is a kind another, or how they differ” (
<xref ref-type="bibr" rid="b39-bbi-2008-187">Rector, 1999</xref>
, p 239–52, p. 10). These rules must furthermore be expressed in some machine-readable syntax—in this case, a knowledge representation language such as the Web Ontology Language (OWL). Such rules govern the expression and processing of
<italic>relations</italic>
between concepts in the hierarchy. Relational expressions are the implementation basis for all subsequent computing and modeling tasks in a software environment.
<xref ref-type="fig" rid="f1-bbi-2008-187">Figure 1</xref>
illustrates the progress from concept to code. Formalization is the basis for the transition from a conceptual entity to a machine-readable form.</p>
<p>The ability to define relationships between concepts distinguishes formal ontologies from earlier integration and interoperability approaches. How they are expressed is detailed in the subsequent sections on the GO and OBO efforts. Relationships are an expression of the
<italic>context</italic>
(akin to usage in natural language) in which concepts are used or from which they emerge. The utility of capturing relationships between concepts is thus that they convey semantics; content semantics are expressed by identifying how concepts relate to each other in the hierarchical knowledge space. This hierarchical knowledge space is a parent-child structure that conveys the semantic granularity of the relation between any two concepts by rendering entities to be either more specific or more general than each other (
<xref ref-type="bibr" rid="b49-bbi-2008-187">Smith et al. 2003</xref>
). This formal ontological structure implies at least one kind of relation: a hyponymic (
<italic>is-a</italic>
) relationship is implied by the hierarchical nesting of terms and denoted by their position relative to each other in the family tree on the basis of their subsumption (where a concept is a subclass or member of another) and specialization (where one concept is the superclass of or contains another) (
<xref ref-type="bibr" rid="b31-bbi-2008-187">Kohler et al. 2003</xref>
). Additional relationships can be asserted between concepts as a directional association (i.e. the relationship proceeds from one concept to another). Relationships—referred to as
<italic>properties</italic>
—are akin to ‘semantic edges’ which depict the meaning of data elements by providing the
<italic>context</italic>
of their usage (where context is analogous to how concepts participate in class membership).</p>
<p>Formal ontological expressions are stated as propositional triplets consisting of
<italic>concepts</italic>
(real-world entities that populate the model), their
<italic>properties</italic>
(or relationships between said entities), and
<italic>instances</italic>
(particular occurrences of a concept; for example, a particular gene with its own unique identifier in a database) in a hierarchical model (
<xref ref-type="bibr" rid="b27-bbi-2008-187">Gardner, 2005</xref>
;
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
;
<xref ref-type="bibr" rid="b50-bbi-2008-187">Sowa, 2000</xref>
). A triplet (concept + property + instance) constitutes a proposition, or “[definitive] statement about (part of) the world” (
<xref ref-type="bibr" rid="b27-bbi-2008-187">Gardner, 2005</xref>
;
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
, p 187). Where an ontology is formal in the sense that it is underwritten by an axiomatic logic such as Description Logics (DL), the axioms of the logic can be applied to impose
<italic>restrictions</italic>
that define conditions under which concepts in a domain logically participate in relationships with each other (
<xref ref-type="bibr" rid="b27-bbi-2008-187">Gardner, 2005</xref>
). For example, we can impose a cardinality restriction to specify that, following the series of generic examples provided by
<xref ref-type="bibr" rid="b4-bbi-2008-187">Aranguren et al. (2007)</xref>
, a “man” must have at least one testis.</p>
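<p>The triplet structure and the cardinality restriction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an OWL or DL implementation: the identifiers (man_001, has_testis, instance_of) and the helper function violations are hypothetical names introduced only for this example.</p>

```python
from collections import defaultdict

# Hypothetical toy triples of the form (subject, property, object).
# All identifiers are illustrative and do not come from a real ontology.
TRIPLES = [
    ("man_001", "instance_of", "Man"),
    ("man_001", "has_testis", "testis_A"),
    ("man_002", "instance_of", "Man"),
]


def violations(triples, concept, prop, min_count):
    """Return instances of `concept` with fewer than `min_count` values for `prop`."""
    counts = defaultdict(int)
    members = []
    for subject, predicate, obj in triples:
        if predicate == "instance_of" and obj == concept:
            members.append(subject)
        elif predicate == prop:
            counts[subject] += 1
    # An instance violates the restriction when its count is below the minimum.
    return [m for m in members if min_count > counts[m]]


print(violations(TRIPLES, "Man", "has_testis", 1))
```

<p>In this sketch the second instance is flagged because no has_testis statement is asserted for it. A DL reasoner would treat the same restriction differently, for example by inferring that an unnamed testis must exist rather than reporting an error; the closed-world check here is a simplifying assumption.</p>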
<p>In a strict definition of formal ontology, the axiomatic logics serve to underwrite a formal notation for content specification. For example, DL comprise the logical
<italic>semantics</italic>
for knowledge representation which constitute the basis of ontological encoding specifically designed for a group of knowledge representation languages which include OWL, the standard language for ontologies over the Web. The eXtensible Markup Language (XML) provides the tag-based syntax for OWL, whereas its schema is defined by the Resource Description Framework (RDF), which specifies the ‘triplet’ structure (
<italic>concepts</italic>
+
<italic>properties</italic>
+
<italic>instances</italic>
described above) of ontological expression. A standard schema ensures that when OWL statements are
<italic>parsed</italic>
or transformed into the component data structure of the target formal ontology, the parser knows which part of the expression constitutes the concept, which section the relation, and which the instance. It is this structure which makes the grammar of an ontology
<italic>meaningful</italic>
—in the case of bioinformatics, for example, it anchors annotations to the gene products they characterize (
<xref ref-type="bibr" rid="b8-bbi-2008-187">Berman, 2005</xref>
;
<xref ref-type="bibr" rid="b7-bbi-2008-187">Berman and Bhatia, 2005</xref>
).</p>
<p>This structure moreover makes the ontological model amenable to implementation in a software environment (
<xref ref-type="bibr" rid="b49-bbi-2008-187">Smith et al. 2003</xref>
) in order to allow for the kind of intelligence described using the example of a cardinality restriction in the instance, ‘a man must have at least one testis’. The taxonomic structure of formal ontologies captured using logical notation and expressed in a knowledge representation language allows the semantics of concepts to be computed on the basis of concept inheritance. This is known as
<italic>reasoning</italic>
, where an application infers non-explicit (not directly stated) relationships between concepts (
<xref ref-type="bibr" rid="b39-bbi-2008-187">Rector, 1999</xref>
). For example, where two proteins identified using different unique identifiers in disparate databases are described as participating in the same biological function, being part of the same sequence, having the same cellular location, etc., they can be recognized as referring to the same concept and can thus be extracted from separate databases on the basis of these functional characteristics rather than nominal IDs. The ability for each term to relate to every other term in the hierarchy is a way of capturing—and expressing—the complexity of biology (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
). Reasoning can therefore be thought of as supporting both inference and query (
<xref ref-type="bibr" rid="b4-bbi-2008-187">Aranguren et al. 2007</xref>
). Inference consists of computing the hierarchy; for example, it will reveal multiple inheritance amongst classes as mentioned above. Query consists of the ability to interrogate the concept hierarchy on the basis of object associations or, conversely, to reveal object associations amongst selected concepts or classes. Imposing the above cardinality restriction therefore has two implications. The first is that any data object labeled or identified “man” in a data repository mapped to an ontology with the above property restriction imposed upon the man-testis relation will be recognized as (likely) a person with at least one testis. Conversely, the execution of reasoning tasks on the ontology or any data structure mapped to it will compute whether all instances of man are consistent with (a person) having at least one testis.</p>
<p>Ontologies, with their hierarchical structures, capture the semantic granularity of biological databases. The property of inheritance allows the computer to process, for example, that the concepts used to annotate two respective sequences are both ‘children’ of the same meta-concepts (i.e. they are a kind or part of the same overarching concept; alternatively, members of the same class) (
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). This permits researchers to locate regions of exact correspondence as well as those with a high degree of similarity. Entities may relate but are not synonymous—for example, where ‘protein’ is a subclass of another concept, ‘gene products’ (
<xref ref-type="bibr" rid="b2-bbi-2008-187">Ahlqvist, 2004</xref>
;
<xref ref-type="bibr" rid="b3-bbi-2008-187">Ahlqvist, 2005</xref>
). This does not dictate that proteins and genetic products are one and the same, but rather allows the expression of a membership relation at a much finer semantic resolution such that proteins can be understood as one, but not the sole, kind of gene product (which also includes RNA).</p>
<p>Thus far, we have described the problem of semantic and schematic heterogeneity and introduced ontologies as a means of mitigating the problem. We have also discussed the formal implementation of ontologies, the necessary conditions for formality, and the advantages of formality for promoting computer reasoning. In the next section, we describe in detail the genesis and development of a bioinformatics portal, the GO, and its role in biological data interoperability. In addition, we briefly illustrate the implementation of ontologies in two databases as well as an ontology-based method for comparing data from different registries or jurisdictions.</p>
</sec>
<sec>
<title>GO: Ontology in Practice</title>
<p>The use of ontologies for bioinformatics is being driven by the proliferation of genome-scale data-sets and the diffusion of the Internet and its protocols for data sharing and exchange (
<xref ref-type="bibr" rid="b11-bbi-2008-187">Blake, 2004</xref>
). Bio-ontologies fulfill two central functions for the biological domain—first, they “clarify scientific discussions” by providing the vocabulary and terms under—and with which—such discussions take place, and second, they enable data discovery across distributed data resources (
<xref ref-type="bibr" rid="b11-bbi-2008-187">Blake, 2004</xref>
, p 773). The pre-eminent bio-ontology is the Gene Ontology (GO), a Web-based, open source knowledge resource for bioinformatics and the second-most cited biological data resource after UniProt (
<xref ref-type="bibr" rid="b26-bbi-2008-187">Galperin, 2006</xref>
).</p>
<p>The GO project evolved as a joint endeavor between three model organism databases: FlyBase, Mouse Genome Informatics Database (MGI), and the
<italic>Saccharomyces</italic>
(yeast) Genome Database in 1999 (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
;
<xref ref-type="bibr" rid="b11-bbi-2008-187">Blake, 2004</xref>
). The formation of the Gene Ontology Consortium (GOC) coincided with the successful completion of the mappings of several eukaryotic genomes. The key to associating these model databases was the genetic structure of organisms (
<xref ref-type="bibr" rid="b33-bbi-2008-187">Lewis, 2005</xref>
). A potential problem lay in that these databases had been designed and populated with competing concepts for gene. Moreover, there was still limited understanding as to how the located genes were controlled and more importantly what functions many of these served (
<xref ref-type="bibr" rid="b33-bbi-2008-187">Lewis, 2005</xref>
). As there is a high degree of functional conservation in homologous organisms, gene function can be reasonably inferred through probable genetic orthologues (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
). In other words, rather than ‘reinventing the wheel’, biologists and bioinformaticians could transfer functional attributes describing the cellular behaviors of gene products between these databases thereby significantly reducing workload.</p>
<p>The chief impediment to this task was not the unique identifiers for the gene products themselves, as researchers had been tapping into protein and gene databases such as GenBank, Swiss-Prot, TrEMBL and PIR for decades (the latter three joined to form the Universal Protein or UniProt protein repository in 2002). Because sequences are unique, they could be easily accessed on the basis of sequence characteristics (though there was sequence redundancy between protein repositories). Computationally, because sequences can be quantified, this is a trivial integration task that simply requires the normalization of unique codes (
<xref ref-type="bibr" rid="b44-bbi-2008-187">Schuurman et al. 2006</xref>
). Rather, it was the functional descriptions of gene products that proved challenging. Integration had to proceed within the context of the molecular and biological characteristics of each gene product identified (
<xref ref-type="bibr" rid="b33-bbi-2008-187">Lewis, 2005</xref>
). In an attempt to solve the problem, informatics experts from the three original participating model organism databases devised functional classification systems in the hopes that these precursors to the GO would facilitate interoperability. What soon became apparent, however, was that these functional classifications were not common between organisms (
<xref ref-type="bibr" rid="b33-bbi-2008-187">Lewis, 2005</xref>
).</p>
<p>In other words, the annotation was not consistent from one database to the next. Gene annotation is defined as the “task of adding layers of analysis and interpretation to … raw sequences” (2002, p 755). This includes information about their function, position relative to coding/non-coding boundaries, participating process, etc. (
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
). Annotations constitute a set of
<italic>metadata</italic>
, or ‘data about data’. Historically, annotation has been stored as free-text or at best semi-structured descriptions semantically particular to the terminological or classification systems unique to many of the databases (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
;
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). There were two challenges. First, the use of competing nomenclatures precluded the linear association of database semantics. Second, the expression of these annotations in natural language provided little context for data mining because they were not machine-readable. Returning to the example of functional prediction, protein functions are inherently dependent upon context, particularly cellular context (
<xref ref-type="bibr" rid="b20-bbi-2008-187">Carroll and Pavlovic, 2006</xref>
). This is exacerbated in the case of proteins particularly as many sequences often have multiple functions (
<xref ref-type="bibr" rid="b20-bbi-2008-187">Carroll and Pavlovic, 2006</xref>
).</p>
<p>The GO Consortium formed as a response to the pervasive semantic heterogeneity of biomedical data and its lack of formality. Indeed the GO was designed for making historically free-text based annotations tractable (
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). The three participating database programs agreed to work in concert to provide the biological community with a consensus-driven framework to guide the annotation of gene products such that their structure (e.g. how molecular function is described and which part of the description occurs in what syntactic order) and semantics (the terms and concepts) are consistent. The result was the GO—a “structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products in any organism” (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
, p 26). The GO is not a taxonomy or index of all known proteins and gene products, but rather provides a standardized set of names for genes and proteins and the terms for characterizing—or ‘annotating’—their behaviors (
<xref ref-type="bibr" rid="b28-bbi-2008-187">Gene Ontology Consortium 2007</xref>
).</p>
<p>Gene product semantics are organized into three categories which capture the primary ‘aspects’ of genes: i) biological process, which captures the larger process in which the gene product is active; ii) molecular function, the biochemical function a gene product contributes to that process, and iii) cellular component, the location in the cell where that particular function is fulfilled or expressed (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
;
<xref ref-type="bibr" rid="b28-bbi-2008-187">Gene Ontology Consortium 2007</xref>
;
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
;
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). Concepts or terms constitute nodes, and vectors referred to as
<italic>edges</italic>
represent relationships between concepts (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
;
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). These three sub-ontologies are maintained independently because the one-to-many relationships between process, function and cellular location would make a singular graph representation intractable (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
). Annotations for the same term in each ‘view’ are cross-referenced on the basis of a unique identifier or serial number assigned to each term in the GO. Increasingly, these identifiers are being used to refer to concepts in other protein and gene-oriented databases and constitute a linear and direct means of mapping databases to the GO (
<xref ref-type="bibr" rid="b28-bbi-2008-187">Gene Ontology Consortium 2007</xref>
). A 2005 figure estimates the GO as consisting of more than 17,500 terms distributed amongst the three subgraphs (
<xref ref-type="bibr" rid="b56-bbi-2008-187">Wolstencroft et al. 2005</xref>
). All possible annotations for a protein can be represented using these concepts (
<xref ref-type="bibr" rid="b20-bbi-2008-187">Carroll and Pavlovic, 2006</xref>
).</p>
<p>Each of these three separate annotation categories (biological process, molecular function, and cellular component) is represented as its own directed acyclic graph, or DAG (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
;
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
;
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). A DAG is a data structure similar to a tree which represents knowledge hierarchically, mirroring the taxonomic structure of biological knowledge. Any entity can point to any other entity in the mathematical space; this is, however, a directional, non-recursive encoding. In other words, concepts can point to other entities in the model, but those entities do not ‘point back’ as in OWL. Indeed the DAG can be considered to be the native knowledge representation (KR) language of the GO (
<xref ref-type="bibr" rid="b4-bbi-2008-187">Aranguren et al. 2007</xref>
). Unlike the KR languages introduced above, however, DAG semantics are not predicated on a formal logic as they are in the case of OWL. Rather, machine readability is instructed by the directional links between pairs of concepts. Semantic ‘edges’ (relationships) in the DAG are simply “ordered pairs of nodes” (
<xref ref-type="bibr" rid="b4-bbi-2008-187">Aranguren et al. 2007</xref>
, p 61). Pointers are like edges in the sense that their semantics are directed, and are labeled with the relationship that associates related classes. These associations are limited to two relations:
<italic>is-a</italic>
, which denotes that concepts are
<italic>kinds of</italic>
entities, and
<italic>part-of</italic>
, which can signify the participation or contribution of a concept in a sequence or process (
<xref ref-type="bibr" rid="b49-bbi-2008-187">Smith et al. 2003</xref>
).</p>
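<p>The DAG structure described above, with labeled, directed edges and the possibility of more than one parent per term, can be sketched as follows. The term names and edges are a hypothetical miniature chosen for illustration and are not taken from a GO release; the function ancestors is likewise an assumed helper.</p>

```python
# Hypothetical miniature DAG: each child maps to a list of (relation, parent) edges.
# Edges point from the more specific term to the more general one and never back.
EDGES = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle"), ("part_of", "cell")],
    "organelle": [("is_a", "cellular_component")],
    "cell": [("is_a", "cellular_component")],
    "cellular_component": [],
}


def ancestors(term, edges):
    """Collect every term reachable by following is_a / part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("nucleolus", EDGES)))
```

<p>Because "nucleus" has two parents in this toy graph, the structure is a DAG rather than a strict tree, and a query for the ancestors of "nucleolus" follows both paths to the root.</p>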
<p>The DAG is available in many file formats—XML, OWL—but the most common formal notation in which GO ontologies are rendered is the Open Biomedical Ontology (OBO; described in more detail below) flat file structure which is underwritten by a modified subset of Web Ontology Language (OWL) description logics (DL) concepts for content specification (
<xref ref-type="bibr" rid="b28-bbi-2008-187">Gene Ontology Consortium 2007</xref>
).</p>
<p>Like OWL, OBO is an ontology language and the standard ‘file format’ for GO annotations. It is, however, less expressive than OWL. Its relations are unidirectional and linear as per the DAG data model and do not require the recursive relational declarations (where the reciprocal or inverse of a relationship is also encoded) characteristic of OWL statements. Thus a flat file structure that only supports sequential reading is appropriate for the GO because relations are read from broader or general to more specific or precise concepts.</p>
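<p>To make the idea of a sequentially read flat file concrete, the following sketch parses a single term stanza written in the tag-value style of the OBO format. The stanza content and the EX: identifiers are hypothetical, the function parse_obo_terms is an assumed helper rather than part of any OBO tooling, and the parser deliberately ignores most of the format (for example, escaped characters and stanza types other than terms).</p>

```python
# Hypothetical stanza in OBO tag-value style; the EX: identifiers are illustrative only.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000002
name: nucleus
is_a: EX:0000001 ! organelle
relationship: part_of EX:0000003 ! cell
"""


def parse_obo_terms(text):
    """Read the file top to bottom and return one dict per [Term] stanza."""
    terms = []
    current = None
    for raw in text.splitlines():
        # Drop a trailing "! comment"; a naive split that assumes no "!" in values.
        line = raw.split("!")[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # Start a new stanza; only [Term] stanzas are kept in this sketch.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header lines or stanzas we are not interested in
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag in ("is_a", "relationship"):
            current[tag].append(value)  # these tags may repeat within a stanza
        else:
            current[tag] = value
    return terms


for term in parse_obo_terms(SAMPLE):
    print(term["id"], term["name"], term["is_a"], term["relationship"])
```

<p>Each is_a or relationship line names only the parent of the term being defined, which is why a single forward pass over the file is sufficient to rebuild the directed edges.</p>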
<p>At the level of the database, the GO is represented as a structured vocabulary; more specifically, as gene product annotations expressed using concepts and their tripartite (biological, molecular, and cellular) structure defined in the GO (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
). The GO is not considered an informatics ontology in the full sense of the term because it has not been designed to be deployed within software environments which execute semantic inference on the basis of logical semantics (
<xref ref-type="bibr" rid="b49-bbi-2008-187">Smith et al. 2003</xref>
). Moreover it does not fulfill the conditions of formality identified by
<xref ref-type="bibr" rid="b48-bbi-2008-187">Smith et al. (2005)</xref>
. Rather it is considered and referred to by its engineers as a “controlled vocabulary” (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
, p 26). The GO nevertheless has many of the characteristics of a formal ontology: machine-readability, formal notation, a hierarchical knowledge structure, and relational associations between concepts. In other words, the GO may be considered a partial implementation that uses many concepts of formal ontology. Part of the reason, however, that the GO is only a partial implementation is that it was designed to be operational within existing infrastructures, requiring no changes to existing architectures.</p>
<p>Notwithstanding, the GO provides the standard vocabulary for semantic integration and automated tasks for bioinformatics. As such it is more than merely a sophisticated data dictionary. Whereas controlled vocabularies or data dictionaries provide a definition of the terms used by a community of practice and these may indeed be machine-readable and thereby formal, a nomenclature does not capture the hierarchical representation of knowledge nor the corresponding relations between all concepts in the data space, and thereby does not support computational reasoning (
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
). Several terminological systems such as SNOMED (Systematized Nomenclature for Medicine) and MeSH (Medical Subject Headings) have, however, been mapped to the GO (
<xref ref-type="bibr" rid="b46-bbi-2008-187">Searls, 2005</xref>
).</p>
<p>The GO is a
<italic>global ontology</italic>
(
<xref ref-type="bibr" rid="b34-bbi-2008-187">Lord et al. 2003</xref>
). In other words, it is a central knowledge proxy to which other ontologies or knowledge representations may be aligned. Ontology mapping is the process of defining associations between ontologies. This involves the formal declaration of relational links between entities, much like that involved in relating concepts in a hierarchical ontological structure. Ontologies can either be
<italic>aligned</italic>
whereby the formalisms remain separate entities but are related, or
<italic>merged</italic>
wherein a singular ontology is generated from the crossproducts of two input ontologies (
<xref ref-type="bibr" rid="b24-bbi-2008-187">Choi et al. 2006</xref>
). ‘Mapping’ is thus unidirectional and always
<italic>from</italic>
the constituent database
<italic>to</italic>
the GO.
<xref ref-type="fig" rid="f2-bbi-2008-187">Figure 2</xref>
illustrates the role that the GO plays in the development of a global biological ontology and the mechanics involved.</p>
<p>A global ontology paradigm is appropriate for the domain because there is a finite, though as yet not fully discovered or known, body of genetic information shared between all life on Earth (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
). Accordingly there is no need to build
<italic>local</italic>
ontologies each capturing a competing account or version of a biological universe. Such a scenario would be much more intensive, requiring the definition of linkages between each participating ontology. A global ontology serves as a
<italic>proxy context</italic>
with which all participating knowledge formalisms interface: their terms are translated to the unique semantic points of the proxy and then compared on the basis of this translation (
<xref ref-type="bibr" rid="b3-bbi-2008-187">Ahlqvist, 2005</xref>
).</p>
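<p>The proxy role of a global ontology can be illustrated with a small sketch in which two local vocabularies are each mapped, in one direction only, to shared global identifiers and are then compared through those identifiers. The local term names, the EX: identifiers, and the helper functions are all hypothetical and serve only to show the mechanism.</p>

```python
# Hypothetical one-way mappings from two local database vocabularies
# to shared global term identifiers (illustrative EX: IDs, not real GO IDs).
DB_A_TO_GLOBAL = {"apoptosis": "EX:0000010", "nuc": "EX:0000002"}
DB_B_TO_GLOBAL = {"programmed cell death": "EX:0000010", "mitochondrion": "EX:0000020"}


def same_concept(term_a, term_b):
    """Two local terms denote the same concept if they map to one global ID."""
    id_a = DB_A_TO_GLOBAL.get(term_a)
    id_b = DB_B_TO_GLOBAL.get(term_b)
    return id_a is not None and id_a == id_b


def shared_terms():
    """Pair up local terms from A and B that meet at the same global ID."""
    by_id = {gid: term_b for term_b, gid in DB_B_TO_GLOBAL.items()}
    return {a: by_id[gid] for a, gid in DB_A_TO_GLOBAL.items() if gid in by_id}


print(same_concept("apoptosis", "programmed cell death"))
print(shared_terms())
```

<p>With a global proxy, each database needs only one mapping (to the proxy) rather than a separate mapping to every other participating database, which is the saving in effort that the paragraph above describes.</p>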
<p>The alignment of currently non-compatible ontologies to the GO is one avenue for its
<italic>curation</italic>
or the process of developing and contributing content or adding value to digital knowledge representation systems such as databases or ontologies. For the GO to serve as a comprehensive knowledge resource for the biological community, it must reflect the continuously increasing body of biological, specifically genetic-level, knowledge. In other words, it must expand to keep pace with the identification of new genes, sequences, functional determinations, etc. Rather than being the responsibility of the Consortium, GO curation has been user driven from inception (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
). GO expansion efforts are supported by the scientific publication process, with several leading periodicals and sequencing initiatives mandating that newly identified sequences be deposited into GO-compliant databases and any new annotations be added to the GO (
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
;
<xref ref-type="bibr" rid="b38-bbi-2008-187">Peters and Sette, 2007</xref>
). Early curation was characteristically on a need-be basis with concepts added to the GO when authors were annotating genes, etc. (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
). Such
<italic>ad hoc</italic>
practices, however, resulted in logical problems in the DAG and indeed soon became inefficient as the scope and scale of the GO has steadily grown (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
). Increasingly, methods for contributing annotations to the GO are based on the automatic generation of annotation concept definitions on the basis of cross-products between databases (as local ontologies) and the GO itself (
<xref ref-type="bibr" rid="b30-bbi-2008-187">Hill et al. 2002</xref>
).</p>
<p>The GO was designed specifically to account for molecular function, biological process, and cellular components of gene products. It lacks the semantics to describe the physical attributes of genes, to describe a protein family, or to account for experimental processes and diagnostic procedures (
<xref ref-type="bibr" rid="b56-bbi-2008-187">Wolstencroft et al. 2005</xref>
). There are both proprietary and open ontologies with richer semantics for more specific description tasks for biology either being developed or presently available (for examples, see
<xref ref-type="bibr" rid="b22-bbi-2008-187">Chicurel, 2002a</xref>
;
<xref ref-type="bibr" rid="b38-bbi-2008-187">Peters and Sette, 2007</xref>
;
<xref ref-type="bibr" rid="b42-bbi-2008-187">Schulze-Kremer, 2002</xref>
). The majority, however, are designed with mapping to the GO in mind (
<xref ref-type="bibr" rid="b16-bbi-2008-187">Buckingham, 2004b</xref>
;
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
).</p>
</sec>
<sec>
<title>Open Biomedical Ontologies</title>
<p>GO is thus not the sole ontology for biology. Indeed there is a need for ontologies to parallel the GO programme. The GO is only one—but certainly the most prominent—ontology effort which contributes to the Open Biomedical Ontologies (OBO) initiative (
<xref ref-type="bibr" rid="b28-bbi-2008-187">Gene Ontology Consortium 2007</xref>
). The OBO Foundry is an umbrella for over 60 bio-ontologies (
<xref ref-type="bibr" rid="b47-bbi-2008-187">Smith et al. 2007</xref>
,
<xref ref-type="bibr" rid="b36-bbi-2008-187">The Open Biomedical Ontologies 2007</xref>
). It provides guidelines for ontology development, and indeed ontologies such as GO have been restructured in line with OBO specifications. As indicated above in the detailed discussion of GO, OBO is also its own ontology format (although OBO does provide an extensive suite of translation schema for mapping OBO representations to, for example, OWL) (
<xref ref-type="bibr" rid="b36-bbi-2008-187">The Open Biomedical Ontologies 2007</xref>
). The benefit of this is that, given domain consensus, it provides for uniform representation and thereby increased interoperability. For example, disparate cell-type ontologies including the GO are now integrated into a single ontology that is itself being aligned to a singular implementation. OBO participates in the National Center for Biomedical Ontology and is slated to become a centralized resource of its emergent BioPortal in support of bioinformatics knowledge discovery and sharing.</p>
</sec>
<sec>
<title>Ontologies in Support of Bioinformatics</title>
<p>The largest public contributor of annotations to the GO project is the Gene Ontology Annotation Database (GOA) (
<xref ref-type="bibr" rid="b19-bbi-2008-187">Camon et al. 2004</xref>
). While annotation is the central organizing principle and
<italic>raison d’être</italic>
of the GO, the potential of encoding annotations ontologically lies not in maintaining a hierarchically structured record of the concepts used to annotate biological data, but in exploiting the ontology for a series of bioinformatics services that relieve molecular biologists of data-intensive tasks and, moreover,
<italic>produce</italic>
knowledge over and above facilitating its reuse.</p>
<p>One of the primary objectives of bioinformatics is to automate the annotation of cross-matches between databases (
<xref ref-type="bibr" rid="b5-bbi-2008-187">Ashburner et al. 2000</xref>
). The electronic generation of annotations based on homology is particularly desirable as the manual curation of gene-oriented databases is time consuming and non-trivial for humans (
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
). The GO facilitates the automatic annotation of gene products at the database level. GOA, for instance, uses GO terms to generate annotations for the UniProt Knowledgebase (the consortium of the SwissProt, TrEMBL, and PIR-PSD protein databases) (
<xref ref-type="bibr" rid="b19-bbi-2008-187">Camon et al. 2004</xref>
). Existing data held in UniProt are electronically associated with or translated into GO terms on the basis of a defined mapping file used to facilitate the conversion of keywords in the constituent databases to tractable GO representations (
<xref ref-type="bibr" rid="b19-bbi-2008-187">Camon et al. 2004</xref>
). Once the semantics are consistent between data sources, biologists who have identified a new sequence, for example, can navigate the GO via an interface known as an
<italic>ontology browser</italic>
on the basis of these common data elements and use the existing GO annotations not only to discover sequence similarity but also to populate their own database automatically, using the existing annotations for homologues from other curated data sources. Thus the ontology functions as a ‘translation schema’ (
<xref ref-type="bibr" rid="b18-bbi-2008-187">Buetow, 2005</xref>
). This is possible because GO is underwritten by a structured grammatical framework (e.g. RDF) that predetermines the occurrence or sequence of description types in a proposition and allows the expression to be parsed and correctly broken down so that it can be stored according to the structure of the target database.</p>
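<p>The keyword-to-GO translation described above can be sketched as follows. This is a minimal illustration only, not the GOA pipeline itself: the mapping table, the <italic>annotate</italic> function, and the protein name are hypothetical, and the three keyword-to-term pairs are chosen for illustration and should be checked against a current GO release. The evidence code IEA marks an annotation as inferred electronically.</p>
<preformat>
```python
from dataclasses import dataclass

# Illustrative stand-in for a mapping file (an assumption, not the real GOA file):
# source-database keyword -> (GO identifier, GO term name).
KEYWORD_TO_GO = {
    "Nucleus": ("GO:0005634", "nucleus"),
    "Mitochondrion": ("GO:0005739", "mitochondrion"),
    "DNA-binding": ("GO:0003677", "DNA binding"),
}


@dataclass(frozen=True)
class Annotation:
    protein: str
    go_id: str
    go_name: str
    evidence: str  # "IEA" = Inferred from Electronic Annotation


def annotate(protein, keywords):
    """Translate a protein's database keywords into GO annotations.

    Returns the generated annotations plus the keywords that had no mapping,
    which would be left for manual curation.
    """
    annotations = []
    unmapped = []
    for keyword in keywords:
        hit = KEYWORD_TO_GO.get(keyword)
        if hit is None:
            unmapped.append(keyword)
        else:
            annotations.append(Annotation(protein, hit[0], hit[1], "IEA"))
    return annotations, unmapped


# Hypothetical protein record carrying three keywords, one of them unmapped.
annotations, unmapped = annotate(
    "EXAMPLE_PROTEIN", ["Nucleus", "DNA-binding", "Zinc-finger"]
)
for annotation in annotations:
    print(annotation.protein, annotation.go_id, annotation.go_name, annotation.evidence)
print("unmapped keywords:", unmapped)
```
</preformat>
<p>Once keywords from different source databases resolve to the same GO identifiers, annotations can be compared or transferred without reference to the original vocabularies.</p>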
<p>The GO can be used to automate the following services: database annotation, GO extension (automating the transfer of new annotation concepts
<italic>to</italic>
the GO), prediction services, and database population. Prediction services supported by ontologies yield new biological knowledge. Current-generation gene location algorithms use data from a pair of genomes to locate areas of genetic affinity; these areas of ‘overlap’ are often the sites of new genes (
<xref ref-type="bibr" rid="b23-bbi-2008-187">Chicurel, 2002b</xref>
). The success of this is based on the semantic consistency of annotations for the input genomes. In addition, the GO supports nuanced data exploration and query (
<xref ref-type="bibr" rid="b11-bbi-2008-187">Blake, 2004</xref>
). The hierarchical structure of knowledge afforded by ontology allows the isolation of the appropriate concept for query on the basis of its context or position relative to other entities in the data space (
<xref ref-type="bibr" rid="b16-bbi-2008-187">Buckingham, 2004b</xref>
). Users can thus formulate searches using conventional keywords, while the ontology resolves the meanings of those keywords.</p>
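<p>How a hierarchy lets a query resolve a concept by its position can be sketched with a toy example. The term names, gene identifiers, and functions below are hypothetical and greatly simplified; the point is only that a query on a general term also retrieves items annotated with its more specific descendants.</p>
<preformat>
```python
# Toy is_a hierarchy (hypothetical): child term -> list of parent terms.
# Because this is a DAG rather than a tree, a term may have several parents.
IS_A = {
    "organelle": ["cellular component"],
    "membrane-bounded organelle": ["organelle"],
    "nucleus": ["membrane-bounded organelle"],
    "mitochondrion": ["membrane-bounded organelle"],
    "ribosome": ["organelle"],
}

# Toy annotations (hypothetical): gene product -> term it is annotated with.
ANNOTATIONS = {"geneA": "nucleus", "geneB": "mitochondrion", "geneC": "ribosome"}


def ancestors(term):
    """Return every term reachable from `term` by following is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def query(term):
    """Gene products annotated with `term` itself or with any of its descendants."""
    return sorted(
        gene
        for gene, annotated in ANNOTATIONS.items()
        if annotated == term or term in ancestors(annotated)
    )


result = query("membrane-bounded organelle")
print(result)
```
</preformat>
<p>A plain keyword match on “membrane-bounded organelle” would find nothing in this toy data, since no gene product is annotated with that exact string; the hierarchy is what connects the query to the more specific annotations.</p>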
<p>Once the protein or gene of interest is isolated, its location confers more information than a binary indication of its absence or presence in a database. Not only do we know about the occurrence of a protein, for example, but we are told something
<italic>about</italic>
it. The proprietary EnsemblGO Browser is an interface which compiles annotations to generate reports or summaries centered on the biological entities isolated in the GO such that, for instance, “the previously unconnected classes Antigen, Immunogen, and Adjuvant are now recognized as being objects (for example, Proteins), which participate in a certain role (as Immunogens) in a specific process (such as Immunization)” (
<xref ref-type="bibr" rid="b16-bbi-2008-187">Buckingham, 2004b</xref>
;
<xref ref-type="bibr" rid="b38-bbi-2008-187">Peters and Sette, 2007</xref>
, p 489).</p>
<p>Ontologies can further be used as a basis for exploring datasets. We have devised a methodology called
<italic>ontology-based metadata</italic>
which uses ontologies as a component in a metadata-based framework for the comparison of a series of eight ‘near’ but non-equivalent terms that have been identified as an obstacle to integrating perinatal (pregnancy and antepartum) health data registries across Canada. Our objective is to provide health researchers and data stewards with a basis for drawing meaningful parallels between data elements to enable the legitimate integration of perinatal data registries. Ontology-based metadata for each term is first collected via a series of electronic forms which standardize the description of each concept. Each constituent database is responsible for detailing how these terms are
<italic>used</italic>
in their particular jurisdiction—or context. This includes a specification of the classification standard used (e.g. ICD-10), the identification of thresholds for measurement specifications, and space for free-text descriptions of any policy constraints which may influence how the term is used in a given jurisdiction. In addition to these ‘annotations’, we encode each perinatal database as a formal ontology in OWL. These ontologies capture the semantic structure of database terms. These are then merged into a single ontology, with the relationships between each and every concept defined in the product tree. Both the ontology-based metadata and the ontologies are inputs to a semantic data discovery portal where researchers specify which terms in two respective databases are to be compared via a graphical user interface (GUI). The application returns to the user both the encoded relationship between the concepts extracted from the OWL code—for example, where
<italic>pregnancy-induced hypertension</italic>
is a
<italic>KIND-OF hypertension complicating pregnancy</italic>
—and the ontology-based metadata for each term in the selected databases. Thus the researcher is provided with both a marker for the granularity of the semantic relationship between two concepts, as well as valuable metadata which are used to inform perinatal database decisions.</p>
<p>The gestational hypertension/hypertension example above would indicate that hypertension experienced during pregnancy is a more general concept which includes gestational hypertension but also encompasses pre-existing hypertension. In some databases, hypertension and pregnancy-induced or gestational hypertension are not differentiated from chronic or pre-existing incidences of disease. Alternatively, in other databases, these concepts are distinguished from each other on the basis of the periodicity of disease onset such that chronic hypertension and pregnancy-induced or gestational hypertension are disjoint (database
<italic>A</italic>
). In yet other registries, any form of hypertension presenting during pregnancy is considered gestational such that a pre-existing condition which first manifests itself during pregnancy is still encoded as pregnancy-related (database
<italic>B</italic>
). There is thus a semantic incommensurability between what ‘gestational hypertension’ represents in databases
<italic>A</italic>
and
<italic>B</italic>
, precluding a direct mapping between these concepts indicating semantic equivalence. Rather, ‘gestational hypertension’ in database
<italic>A</italic>
would be a
<italic>kind of</italic>
gestational hypertension as the concept is reified in database
<italic>B</italic>
. If a researcher were to query ‘gestational hypertension’ across both databases, she would logically accept them as referring to the same concept on the basis of lexical coincidence. However, the lack of an encoded equivalence between these two concepts would preclude their conflation. Thus our approach not only provides information regarding how concepts should be associated, but also uses formal ontologies to restrict
<italic>which</italic>
concepts may be legitimately compared. This nesting of relationships between semantic terms is described in
<xref ref-type="fig" rid="f3-bbi-2008-187">Figure 3</xref>
.</p>
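<p>The comparison step can be sketched as a lookup over encoded relationships. This is a simplified illustration under stated assumptions, not the portal's implementation: the relation names, the data structures, and the <italic>compare</italic> function are hypothetical, and the metadata strings merely paraphrase the usage of databases A and B described above.</p>
<preformat>
```python
from enum import Enum


class Relation(Enum):
    EQUIVALENT = "equivalent to"
    KIND_OF = "a kind of"
    BROADER_THAN = "broader than"
    NOT_ENCODED = "not related by any encoded relationship to"


# Concepts are qualified by the database that defines them.
A_GH = ("A", "gestational hypertension")
B_GH = ("B", "gestational hypertension")

# Hypothetical content of the merged ontology.
EQUIVALENT = set()          # no equivalence is encoded between A and B
KIND_OF = {(A_GH, B_GH)}    # (narrower concept, broader concept)

# Hypothetical ontology-based metadata: how each database uses the term.
METADATA = {
    A_GH: "onset during pregnancy; disjoint from chronic hypertension",
    B_GH: "any hypertension presenting during pregnancy, including pre-existing cases",
}


def compare(left, right):
    """Return the encoded relation of `left` to `right` plus both metadata notes."""
    if (left, right) in EQUIVALENT or (right, left) in EQUIVALENT:
        relation = Relation.EQUIVALENT
    elif (left, right) in KIND_OF:
        relation = Relation.KIND_OF
    elif (right, left) in KIND_OF:
        relation = Relation.BROADER_THAN
    else:
        relation = Relation.NOT_ENCODED
    return relation, METADATA.get(left), METADATA.get(right)


relation, left_note, right_note = compare(A_GH, B_GH)
print(f"{A_GH} is {relation.value} {B_GH}")
print("A usage:", left_note)
print("B usage:", right_note)
```
</preformat>
<p>Because the two identically named terms are never listed as equivalent, the lookup cannot report them as the same concept, which mirrors the restriction on conflation described above.</p>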
<p>Another instantiation of the ontology-based metadata concept similar to our implementation is WikiProteins, a structured semantic space for capturing the context—biological, physiological, chemical, etc.—of proteins and then sharing that collaborative knowledge with other biologists in real-time (
<xref ref-type="bibr" rid="b29-bbi-2008-187">Giles, 2007</xref>
;
<xref ref-type="bibr" rid="b55-bbi-2008-187">Wiki For Professionals 2007</xref>
). Historically, the problem with metadata has been that it is labor intensive to create and is never updated (
<xref ref-type="bibr" rid="b44-bbi-2008-187">Schuurman and Leszczynski, 2006</xref>
). WikiProteins provides a mechanism for sharing the labor and ongoing maintenance by participants. This collaborative Web-based workspace facilitates the open curation of protein-specific information by providing biologists and bioinformaticians with a means of contributing to the cumulative body of biological knowledge. At the moment, it serves UniProt and GO descriptions for the annotation of proteins via a series of standardized forms or ‘slots’ for their description. This consists of definitions, attribute-value relations (e.g. a protein can be given the attribute “tissue” with the value “[e]xpressed in muscle fibers”), and provisions for disambiguating sequences or instances of proteins by identifying synonyms, disjoint concepts, alternate spellings, etc. (
<xref ref-type="bibr" rid="b55-bbi-2008-187">Wiki For Professionals 2007</xref>
). Curators can link their descriptions or proteins to other citations, references, and publications indexed in PubMed. Moreover, the wiki concept ensures that these annotations are self-validating. Other users can go in and add or revise the annotations. For example, using the “tissue” example above, a subsequent curator can reify this protein as “[e]xpressed in muscle fibers and
<italic>the brain</italic>
” (
<xref ref-type="bibr" rid="b55-bbi-2008-187">Wiki For Professionals 2007</xref>
). Similar to our ontology-based metadata approach, it combines both free-text fields for open description and more restrictive means of disambiguating proteins and protein concepts. For instance, it extends the ability to identify whether these synonyms are instances of equivalent meaning, or if they are different. If the latter is the case, curators can further annotate—or describe—specifically where these differences lie. WikiProteins is but one example of where the GO is being deployed to provide a standardized vocabulary for annotation across distributed data resources.</p>
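<p>The slot-based description and its revision by later curators can be sketched as a small record type. The class, field names, and curator labels below are hypothetical and do not reflect the actual WikiProteins data model; the “tissue” values repeat the example given above.</p>
<preformat>
```python
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """Hypothetical slot-based description of a protein."""
    name: str
    definition: str = ""
    attributes: dict = field(default_factory=dict)  # attribute -> current value
    synonyms: set = field(default_factory=set)
    history: list = field(default_factory=list)     # (curator, attribute, old, new)

    def set_attribute(self, curator, attribute, value):
        """Record who changed which slot, then store the new value."""
        previous = self.attributes.get(attribute)
        self.history.append((curator, attribute, previous, value))
        self.attributes[attribute] = value


record = ProteinRecord(name="example protein")
record.synonyms.add("example protein (alternate spelling)")

# A first curator fills the "tissue" slot; a later curator revises it.
record.set_attribute("curator_1", "tissue", "expressed in muscle fibers")
record.set_attribute("curator_2", "tissue", "expressed in muscle fibers and the brain")

print(record.attributes["tissue"])
for entry in record.history:
    print(entry)
```
</preformat>
<p>Keeping the revision history alongside the current value is one simple way to let other users inspect and correct earlier annotations.</p>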
<p>As our non-automated method for data discovery and WikiProteins for protein knowledge exchange illustrate, ontologies are not standalone solutions for interoperability but rather comprise a component of or input to large-scale interoperability infrastructures. Indeed ontologies are knowledge representations and
<italic>not</italic>
software applications, having no innate functionality. As such they must be deployed within digital architectures where constituent programs can exploit the hierarchical structure of formal ontologies to facilitate data sharing at the level of semantics. Many such cyber-infrastructures exist for biology and biomedicine. A notable example is the Cancer Biomedical Informatics Grid (caBIG), a Web-based National Institutes of Health (NIH) data consortium for cancer research (
<xref ref-type="bibr" rid="b35-bbi-2008-187">National Institutes of Health 2007</xref>
). caBIG is built on an open grid architecture similar to a federated database environment, in which users are presented with a central interface that seamlessly integrates participating databases, but with the addition of Web services that provide tools and applications. The emphasis of caBIG is on the provision of services (such as data analysis tools, applications, scripts, and algorithms) relevant to cancer research. The grid is organized into a series of “workspaces” or virtual communities where participants can access, revise, and upload new technologies for that specific sub-domain of application or interest. Participants notify each other of the constituent services they make available by means of UML (Unified Modeling Language) metadata, wherein the services are described using standardized DL-annotated concepts from a vocabulary service which defines terms and concepts in biomedical vocabularies. Here, ontologies are utilized as a standardized set of concepts and terms for the uniform description of applications and services, such that researchers can locate and access the appropriate technologies as needed. This provides interoperability across distributed cancer research centers at the level of services.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have described the challenges inherent in semantic integration of biological databases. Many of these are common to all semantic integration realms and are based on the problem of language not being transparent across institutional and user environments. In biology and other disciplines, ontologies or strict paradigmatic taxonomies have been used to mitigate the problems associated with semantic integration. Ontologies are a means of conveying context associated with semantic terms so that their meaning is transparent between multiple data users. This paper has described in depth the use of ontologies for data integration in biology. Biology has led the scientific world in developing a number of unique approaches to semantic integration. These unique ontology-based integration efforts include ontologies such as the Gene Ontology (GO), and frameworks that exploit their machine-readable semantics to support bioinformatics tasks, such as the cancer Biomedical Informatics Grid (caBIG) and the incipient WikiProteins knowledge community. In addition, we introduced the concept and early implementation of ontology-based metadata in order to demonstrate the role that context plays in clarifying hierarchical and schematic relationships between like but non-equivalent semantic terms in different databases. Each of these is a unique approach to surpassing the problems associated with a lack of congruity in language and meaning across scientific databases.</p>
</sec>
<sec>
<title>Brief Glossary</title>
<p>
<bold>Ontology</bold>
has a different meaning in philosophy than in computing. In the former, it means the essence of being. In computing and information sciences, an ontology is a formal universe in which each entity is precisely defined and its relationship with every other entity in the specific categorical or computing realm is precisely determined. Ontologies in this context define the range of what is possible in a computing context. They can be thought of as simply a classification system, a map legend, or a data dictionary.</p>
<p>
<bold>Epistemology</bold>
is the study of “how we know what we know.” In other words, epistemology is the lens through which we view reality. Epistemology refers, in the broadest sense, to the methods that we use to study the world and the perspective that a researcher uses to interpret entities and phenomena.</p>
<p>
<bold>Semantics</bold>
refers to the ways in which language is interpreted differently in different contexts, environments, or institutional cultures.</p>
<p>
<bold>Gene prediction</bold>
—also referred to as gene finding—uses algorithms to identify biologically functional regions—or exons—of sequences which explicitly code for proteins. These are referred to as coding regions.</p>
<p>
<bold>Gene mapping</bold>
is the creation of a genetic map in which DNA fragments are linked to chromosomes.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="b1-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Agarwal</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Ontological considerations in GIScience</article-title>
<source>International Journal of Geographical Information Science</source>
<volume>19</volume>
<fpage>501</fpage>
<lpage>36</lpage>
</mixed-citation>
</ref>
<ref id="b2-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahlqvist</surname>
<given-names>O</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>A parametrized representation of uncertain conceptual spaces</article-title>
<source>Transactions in GIS</source>
<volume>8</volume>
<fpage>493</fpage>
<lpage>514</lpage>
</mixed-citation>
</ref>
<ref id="b3-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahlqvist</surname>
<given-names>O</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Using uncertain conceptual spaces to translate between land cover categories</article-title>
<source>International Journal of Geographical Information Science</source>
<volume>19</volume>
<fpage>831</fpage>
<lpage>57</lpage>
</mixed-citation>
</ref>
<ref id="b4-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aranguren</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Bechhofer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lord</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sattler</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL</article-title>
<source>BMC Bioinformatics</source>
<volume>8</volume>
<fpage>57</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="pmid">17311682</pub-id>
</mixed-citation>
</ref>
<ref id="b5-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<etal></etal>
</person-group>
<year>2000</year>
<article-title>Gene Ontology: Tool for the unification of biology</article-title>
<source>Nature Genetics</source>
<volume>25</volume>
<fpage>25</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
</mixed-citation>
</ref>
<ref id="b6-bbi-2008-187">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Baader</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Horrocks</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Sattler</surname>
<given-names>U</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Staab</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Studer</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Description logics</article-title>
<source>Handbook on ontologies</source>
<publisher-loc>Berlin</publisher-loc>
<publisher-name>Springer-Verlag</publisher-name>
<fpage>3</fpage>
<lpage>28</lpage>
</mixed-citation>
</ref>
<ref id="b7-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berman</surname>
<given-names>JJ</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Pathology data integration with extensible markup language</article-title>
<source>Human Pathology</source>
<volume>36</volume>
<fpage>139</fpage>
<lpage>45</lpage>
<pub-id pub-id-type="pmid">15754290</pub-id>
</mixed-citation>
</ref>
<ref id="b8-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berman</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Bhatia</surname>
<given-names>K</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Biomedical data integration: Using xml to link clinical and research data sets</article-title>
<source>Expert Review of Molecular Diagnostics</source>
<fpage>329</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="pmid">15934811</pub-id>
</mixed-citation>
</ref>
<ref id="b9-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bisby</surname>
<given-names>FA</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>The quiet revolution: Biodiversity informatics and the internet</article-title>
<source>Science</source>
<volume>289</volume>
<fpage>2309</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="pmid">11009408</pub-id>
</mixed-citation>
</ref>
<ref id="b10-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blackmore</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Biodiversity update—progress in taxonomy</article-title>
<source>Science</source>
<volume>298</volume>
<fpage>365</fpage>
<pub-id pub-id-type="pmid">12376687</pub-id>
</mixed-citation>
</ref>
<ref id="b11-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blake</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Bio-ontologies—fast and furious</article-title>
<source>Nature Biotechnology</source>
<volume>22</volume>
<fpage>773</fpage>
<lpage>4</lpage>
</mixed-citation>
</ref>
<ref id="b12-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Bult</surname>
<given-names>CJ</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Beyond the data deluge: Data integration and bio-ontologies</article-title>
<source>Journal of Biomedical Informatics</source>
<volume>39</volume>
<fpage>314</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="pmid">16564748</pub-id>
</mixed-citation>
</ref>
<ref id="b13-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boguski</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>McIntosh</surname>
<given-names>MW</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Biomedical informatics for proteomics</article-title>
<source>Nature</source>
<volume>422</volume>
<fpage>233</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">12634797</pub-id>
</mixed-citation>
</ref>
<ref id="b14-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowker</surname>
<given-names>GC</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Mapping biodiversity</article-title>
<source>International Journal of Geographical Information Science</source>
<volume>14</volume>
<fpage>739</fpage>
<lpage>54</lpage>
</mixed-citation>
</ref>
<ref id="b15-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2004a</year>
<article-title>Data’s future shock</article-title>
<source>Nature</source>
<volume>428</volume>
<fpage>774</fpage>
<lpage>7</lpage>
</mixed-citation>
</ref>
<ref id="b16-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2004b</year>
<article-title>Getting the meaning</article-title>
<source>Nature</source>
<volume>428</volume>
<fpage>776</fpage>
<pub-id pub-id-type="pmid">15085141</pub-id>
</mixed-citation>
</ref>
<ref id="b17-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>To build a better model</article-title>
<source>Nature Methods</source>
<volume>4</volume>
<fpage>367</fpage>
<lpage>74</lpage>
</mixed-citation>
</ref>
<ref id="b18-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buetow</surname>
<given-names>KH</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Cyberinfrastructure: empowering a “third way” in biomedical research</article-title>
<source>Science</source>
<volume>308</volume>
<fpage>821</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="pmid">15879210</pub-id>
</mixed-citation>
</ref>
<ref id="b19-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camon</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Magrane</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Barrell</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<year>2004</year>
<article-title>The gene ontology annotation (GOA) database: Sharing knowledge in uniprot with gene ontology</article-title>
<source>Nucleic Acids Research</source>
<volume>32</volume>
<fpage>D262</fpage>
<lpage>D6</lpage>
<pub-id pub-id-type="pmid">14681408</pub-id>
</mixed-citation>
</ref>
<ref id="b20-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carroll</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Pavlovic</surname>
<given-names>V</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Protein classification using probabilistic chain graphs and the gene ontology structure</article-title>
<source>Bioinformatics</source>
<volume>15</volume>
<fpage>1871</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="pmid">16705013</pub-id>
</mixed-citation>
</ref>
<ref id="b21-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castro</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Rocca-Serra</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>The use of concept maps during knowledge elicitation in ontology development processes—the nutrigenomics use case</article-title>
<source>BMC Bioinformatics</source>
<fpage>267</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="pmid">16725019</pub-id>
</mixed-citation>
</ref>
<ref id="b22-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chicurel</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2002a</year>
<article-title>Bioinformatics: Bringing it all together</article-title>
<source>Nature</source>
<volume>419</volume>
<fpage>751</fpage>
<lpage>7</lpage>
</mixed-citation>
</ref>
<ref id="b23-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chicurel</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2002b</year>
<article-title>Putting a name on it</article-title>
<source>Nature</source>
<volume>419</volume>
<fpage>755</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">12384708</pub-id>
</mixed-citation>
</ref>
<ref id="b24-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>IY</given-names>
</name>
<name>
<surname>Hyoil</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>A survey on ontology mapping</article-title>
<source>SIGMOD Record</source>
<volume>35</volume>
<fpage>34</fpage>
<lpage>41</lpage>
</mixed-citation>
</ref>
<ref id="b25-bbi-2008-187">
<mixed-citation publication-type="webpage">
<collab>FlyBase</collab>
<year>2007</year>
<article-title>Flybase: A database of
<italic>drosophila</italic>
genes and genomes, version fb2007_01 [online]</article-title>
<date-in-citation>Accessed 04 September 2007</date-in-citation>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://flybase.bio.indiana.edu/">http://flybase.bio.indiana.edu/</ext-link>
:
<ext-link ext-link-type="uri" xlink:href="http://flybase.bio.indiana.edu/">http://flybase.bio.indiana.edu/</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="b26-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Galperin</surname>
<given-names>MY</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>The molecular biology database collection: 2006 update</article-title>
<source>Nucleic Acids Research</source>
<volume>34</volume>
<fpage>D3</fpage>
<lpage>D5</lpage>
<pub-id pub-id-type="pmid">16381871</pub-id>
</mixed-citation>
</ref>
<ref id="b27-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gardner</surname>
<given-names>SP</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Ontologies and semantic data integration</article-title>
<source>Drug Discovery Today</source>
<volume>10</volume>
<fpage>1001</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">16023059</pub-id>
</mixed-citation>
</ref>
<ref id="b28-bbi-2008-187">
<mixed-citation publication-type="webpage">
<collab>Gene Ontology Consortium</collab>
<year>2007</year>
<source>The gene ontology [online]</source>
<date-in-citation>Accessed 15 July 2007</date-in-citation>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://www.geneontology.org">http://www.geneontology.org</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="b29-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giles</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Key biology databases go wiki</article-title>
<source>Nature</source>
<volume>445</volume>
<fpage>691</fpage>
<pub-id pub-id-type="pmid">17301755</pub-id>
</mixed-citation>
</ref>
<ref id="b30-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hill</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>JE</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Extension and integration of the Gene Ontology (GO): Combining GO vocabularies with external vocabularies</article-title>
<source>Genome Research</source>
<volume>12</volume>
<fpage>1982</fpage>
<lpage>91</lpage>
<pub-id pub-id-type="pmid">12466303</pub-id>
</mixed-citation>
</ref>
<ref id="b31-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kohler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Philippi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lange</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>SEMEDA: Ontology based semantic integration of biological databases</article-title>
<source>Bioinformatics</source>
<volume>19</volume>
<fpage>2420</fpage>
<lpage>7</lpage>
</mixed-citation>
</ref>
<ref id="b32-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kohler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schulze-Kremer</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>The semantic metadatabase (SEM-EDA): Ontology based integration of federated molecular biological data sources</article-title>
<source>In Silico Biology</source>
<volume>2</volume>
<fpage>219</fpage>
<lpage>31</lpage>
<pub-id pub-id-type="pmid">12542408</pub-id>
</mixed-citation>
</ref>
<ref id="b33-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>SE</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Gene ontology: Looking backwards and forwards</article-title>
<source>Genome Biology</source>
<volume>6</volume>
<fpage>103.1</fpage>
<lpage>.4</lpage>
<pub-id pub-id-type="pmid">15642104</pub-id>
</mixed-citation>
</ref>
<ref id="b34-bbi-2008-187">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Lord</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Brass</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<year>2003</year>
<article-title>Semantic similarity measures as tools for exploring the gene ontology</article-title>
<conf-name>Pacific Symposium on Biocomputing</conf-name>
<conf-loc>Lihue, Hawaii</conf-loc>
</mixed-citation>
</ref>
<ref id="b35-bbi-2008-187">
<mixed-citation publication-type="webpage">
<collab>National Institutes of Health</collab>
<year>2007</year>
<source>Cancer biomedical informatics grid [online]</source>
<date-in-citation>Accessed 10 August 2007</date-in-citation>
<comment>
<ext-link ext-link-type="uri" xlink:href="https://cabig.nci.nih.gov/">https://cabig.nci.nih.gov/</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="b36-bbi-2008-187">
<mixed-citation publication-type="webpage">
<collab>The Open Biomedical Ontologies</collab>
<year>2007</year>
<source>The Open Biomedical Ontologies [online]</source>
<date-in-citation>Accessed 24 December 2007</date-in-citation>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://obofoundry.org/about.shtml">http://obofoundry.org/about.shtml</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="b37-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pennisi</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Taxonomic revival</article-title>
<source>Science</source>
<volume>289</volume>
<fpage>2306</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="pmid">11041799</pub-id>
</mixed-citation>
</ref>
<ref id="b38-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peters</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Sette</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Integrating epitope data into the emerging web of biomedical knowledge resources</article-title>
<source>Nature Reviews Immunology</source>
<volume>7</volume>
<fpage>485</fpage>
<lpage>90</lpage>
</mixed-citation>
</ref>
<ref id="b39-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rector</surname>
<given-names>AL</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Clinical terminology: Why is it so hard?</article-title>
<source>Methods of Information in Medicine</source>
<volume>38</volume>
<fpage>239</fpage>
<lpage>52</lpage>
</mixed-citation>
</ref>
<ref id="b40-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saeys</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Rouze</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>In search of the small ones: Improved prediction of short exons in vertebrates, plants, fungi and protists</article-title>
<source>Bioinformatics</source>
<volume>23</volume>
<fpage>414</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="pmid">17204465</pub-id>
</mixed-citation>
</ref>
<ref id="b41-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sauer</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Heinemann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zamboni</surname>
<given-names>N</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Genetics: Getting closer to the whole picture</article-title>
<source>Science</source>
<volume>316</volume>
<fpage>550</fpage>
<lpage>1</lpage>
<pub-id pub-id-type="pmid">17463274</pub-id>
</mixed-citation>
</ref>
<ref id="b42-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schulze-Kremer</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Ontologies for molecular biology and bioinformatics</article-title>
<source>In Silico Biology</source>
<volume>2</volume>
<fpage>179</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="pmid">12542404</pub-id>
</mixed-citation>
</ref>
<ref id="b43-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schuurman</surname>
<given-names>N</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Why formalization matters: Critical GIS and ontology research</article-title>
<source>Annals of the Association of American Geographers</source>
<volume>96</volume>
<fpage>726</fpage>
<lpage>39</lpage>
</mixed-citation>
</ref>
<ref id="b44-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schuurman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Leszczynski</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Ontology-based metadata</article-title>
<source>Transactions in GIS</source>
<volume>10</volume>
<fpage>709</fpage>
<lpage>26</lpage>
</mixed-citation>
</ref>
<ref id="b45-bbi-2008-187">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Schuurman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Leszczynski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fiedler</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Riedl</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kainz</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Elmes</surname>
<given-names>G</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Building an integrated cadastral fabric for higher resolution socioeconomic spatial data analysis</article-title>
<source>Progress in spatial data handling: 12th international symposium on spatial data handling</source>
<publisher-loc>Berlin Heidelberg New York</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>897</fpage>
<lpage>920</lpage>
</mixed-citation>
</ref>
<ref id="b46-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Searls</surname>
<given-names>DB</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Data integration: Challenges for drug discovery</article-title>
<source>Nature Reviews Drug Discovery</source>
<volume>4</volume>
<fpage>45</fpage>
<lpage>58</lpage>
</mixed-citation>
</ref>
<ref id="b47-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rosse</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bard</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bug</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Ceusters</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Goldberg</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Eilbeck</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ireland</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mungall</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Leontis</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rocca-Serra</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Ruttenberg</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sansone</surname>
<given-names>S-A</given-names>
</name>
<name>
<surname>Scheuermann</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Whetzel</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>S</given-names>
</name>
</person-group>
<collab>the OBO Consortium</collab>
<year>2007</year>
<article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
<source>Nature Biotechnology</source>
<volume>25</volume>
<fpage>1251</fpage>
<lpage>5</lpage>
</mixed-citation>
</ref>
<ref id="b48-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ceusters</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Klagges</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kohler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lomax</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mungall</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Neuhaus</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Rector</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Rosse</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Relations in biomedical ontologies</article-title>
<source>Genome Biology</source>
<volume>6</volume>
<fpage>R46</fpage>
</mixed-citation>
</ref>
<ref id="b49-bbi-2008-187">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schulze-Kremer</surname>
<given-names>S</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Musen</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>The ontology of the gene ontology</article-title>
<conf-name>AMIA 2003 Annual Symposium Proceedings</conf-name>
<conf-loc>Washington, DC</conf-loc>
</mixed-citation>
</ref>
<ref id="b50-bbi-2008-187">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Sowa</surname>
<given-names>JF</given-names>
</name>
</person-group>
<year>2000</year>
<source>Knowledge representation: Logical, philosophical, and computational foundations</source>
<publisher-loc>Pacific Grove, CA</publisher-loc>
<publisher-name>Brooks/Cole</publisher-name>
</mixed-citation>
</ref>
<ref id="b51-bbi-2008-187">
<mixed-citation publication-type="journal">
<collab>Stanford Medical Informatics</collab>
<year>2005</year>
<source>The Protégé ontology editor and knowledge acquisition system</source>
</mixed-citation>
</ref>
<ref id="b52-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sugden</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pennisi</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Diversity digitized</article-title>
<source>Science</source>
<volume>289</volume>
<fpage>2305</fpage>
<pub-id pub-id-type="pmid">11041798</pub-id>
</mixed-citation>
</ref>
<ref id="b53-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thomas</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Ganji</surname>
<given-names>G</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Integration of genomic and metabonomic data in systems biology—are we ‘there’ yet?</article-title>
<source>Current Opinion in Drug Discovery and Development</source>
<volume>9</volume>
<fpage>92</fpage>
<lpage>100</lpage>
</mixed-citation>
</ref>
<ref id="b54-bbi-2008-187">
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tarczy-Hornoch</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Shaker</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Biomediator data integration: Beyond genomics to neuroscience data</article-title>
<conf-name>AMIA Annual Symposium</conf-name>
<conf-loc>AMIA</conf-loc>
</mixed-citation>
</ref>
<ref id="b55-bbi-2008-187">
<mixed-citation publication-type="webpage">
<collab>Wiki For Professionals</collab>
<year>2007</year>
<source>Wikiproteins [online]</source>
<date-in-citation>Accessed 24 July 2007</date-in-citation>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://wikiprofessional.info">http://wikiprofessional.info</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="b56-bbi-2008-187">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wolstencroft</surname>
<given-names>K</given-names>
</name>
<name>
<surname>McEntire</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Constructing ontology-driven protein family databases</article-title>
<source>Bioinformatics</source>
<volume>21</volume>
<fpage>1685</fpage>
<lpage>92</lpage>
<pub-id pub-id-type="pmid">15564301</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="f1-bbi-2008-187" position="float">
<label>Figure 1</label>
<caption>
<p>The formalization process. Moving from a concept of a particular gene to its encoded reification and ontological representation. Note how the entity (fruit fly) becomes increasingly represented in digital database format as it is formalized, or abstracted from its real-world form. The entity loses dimensionality, while researchers gain the advantage of computational function.
<xref ref-type="fig" rid="f3-bbi-2008-187">Figure 3</xref>
illustrates in more detail the role that entity descriptions, or annotations, play in creating a larger standardized digital knowledge environment for bioinformatics. Note that any gene product may have more than one annotation in the same branch (see Molecular Function in this example), and can be annotated in three different branches of GO (Cellular Component, Biological Process, and Molecular Function) (
<xref ref-type="bibr" rid="b25-bbi-2008-187">FlyBase 2007</xref>
, The Gene Ontology Consortium 2007).</p>
</caption>
<graphic xlink:href="bbi-2008-187f1"></graphic>
</fig>
<fig id="f2-bbi-2008-187" position="float">
<label>Figure 2</label>
<caption>
<p>The Gene Ontology as a global ontology for bioinformatics. Smaller scale bioinformatics ontologies almost invariably map to the GO (
<bold>a</bold>
). Several large databases, such as FlyBase (
<bold>b</bold>
), contribute annotation to the GO using its semantics such that there is a direct mapping between genes/gene products at the database level and their participation in the ontology. (FlyBase annotation is explained in greater detail in
<xref ref-type="fig" rid="f1-bbi-2008-187">Fig. 1</xref>
). Where annotation is unique to the database, a translation program can transform annotation into a tractable GO representation (
<bold>c</bold>
) (
<xref ref-type="bibr" rid="b19-bbi-2008-187">Camon et al. 2004</xref>
). The GO provides a standardized vocabulary for the description of genes and gene products, not only across databases but also in emerging bioinformatics infrastructures, such as WikiProteins (
<bold>d</bold>
). The consistency of semantics reduces ambiguity in the query of bioinformatics resources, and allows genes and gene products to be retrieved on the basis of common biology rather than lexical coincidence (
<bold>e</bold>
).</p>
</caption>
<graphic xlink:href="bbi-2008-187f2"></graphic>
</fig>
<fig id="f3-bbi-2008-187" position="float">
<label>Figure 3</label>
<caption>
<p>Ontology mapping. An ontology for hypertension resulting from the merging of hypertension concepts in the British Columbia Reproductive Care Program Perinatal Database Registry (BCRCP PRD) and the Canadian Perinatal Database Minimum Dataset. The resulting output ontology shows the hierarchical nesting of hypertension semantics originating in the respective databases in relation to each other.</p>
</caption>
<graphic xlink:href="bbi-2008-187f3"></graphic>
</fig>
</floats-group>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000696 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000696 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2735951
   |texte=   Ontologies for Bioinformatics
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:19812775" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024