Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information is therefore not validated.

Internal identifier: 000210 (Pmc/Corpus); previous: 000209; next: 000211



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Structuring and extracting knowledge for the support of hypothesis generation in molecular biology</title>
<author>
<name sortKey="Roos, Marco" sort="Roos, Marco" uniqKey="Roos M" first="Marco" last="Roos">Marco Roos</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Marshall, M Scott" sort="Marshall, M Scott" uniqKey="Marshall M" first="M Scott" last="Marshall">M Scott Marshall</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gibson, Andrew P" sort="Gibson, Andrew P" uniqKey="Gibson A" first="Andrew P" last="Gibson">Andrew P. Gibson</name>
<affiliation>
<nlm:aff id="I2">Swammerdam Institute for Life Science, University of Amsterdam, Amsterdam, 1018 WB, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schuemie, Martijn" sort="Schuemie, Martijn" uniqKey="Schuemie M" first="Martijn" last="Schuemie">Martijn Schuemie</name>
<affiliation>
<nlm:aff id="I3">BioSemantics group, Erasmus University of Rotterdam, Rotterdam, 3000 DR, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Meij, Edgar" sort="Meij, Edgar" uniqKey="Meij E" first="Edgar" last="Meij">Edgar Meij</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Katrenko, Sophia" sort="Katrenko, Sophia" uniqKey="Katrenko S" first="Sophia" last="Katrenko">Sophia Katrenko</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Hage, Willem Robert" sort="Van Hage, Willem Robert" uniqKey="Van Hage W" first="Willem Robert" last="Van Hage">Willem Robert Van Hage</name>
<affiliation>
<nlm:aff id="I4">Business Informatics, Faculty of Sciences, Vrije Universiteit, Amsterdam, 1081 HV, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Krommydas, Konstantinos" sort="Krommydas, Konstantinos" uniqKey="Krommydas K" first="Konstantinos" last="Krommydas">Konstantinos Krommydas</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Adriaans, Pieter W" sort="Adriaans, Pieter W" uniqKey="Adriaans P" first="Pieter W" last="Adriaans">Pieter W. Adriaans</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19796406</idno>
<idno type="pmc">2755830</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2755830</idno>
<idno type="RBID">PMC:2755830</idno>
<idno type="doi">10.1186/1471-2105-10-S10-S9</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000210</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Structuring and extracting knowledge for the support of hypothesis generation in molecular biology</title>
<author>
<name sortKey="Roos, Marco" sort="Roos, Marco" uniqKey="Roos M" first="Marco" last="Roos">Marco Roos</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Marshall, M Scott" sort="Marshall, M Scott" uniqKey="Marshall M" first="M Scott" last="Marshall">M Scott Marshall</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gibson, Andrew P" sort="Gibson, Andrew P" uniqKey="Gibson A" first="Andrew P" last="Gibson">Andrew P. Gibson</name>
<affiliation>
<nlm:aff id="I2">Swammerdam Institute for Life Science, University of Amsterdam, Amsterdam, 1018 WB, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schuemie, Martijn" sort="Schuemie, Martijn" uniqKey="Schuemie M" first="Martijn" last="Schuemie">Martijn Schuemie</name>
<affiliation>
<nlm:aff id="I3">BioSemantics group, Erasmus University of Rotterdam, Rotterdam, 3000 DR, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Meij, Edgar" sort="Meij, Edgar" uniqKey="Meij E" first="Edgar" last="Meij">Edgar Meij</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Katrenko, Sophia" sort="Katrenko, Sophia" uniqKey="Katrenko S" first="Sophia" last="Katrenko">Sophia Katrenko</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Hage, Willem Robert" sort="Van Hage, Willem Robert" uniqKey="Van Hage W" first="Willem Robert" last="Van Hage">Willem Robert Van Hage</name>
<affiliation>
<nlm:aff id="I4">Business Informatics, Faculty of Sciences, Vrije Universiteit, Amsterdam, 1081 HV, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Krommydas, Konstantinos" sort="Krommydas, Konstantinos" uniqKey="Krommydas K" first="Konstantinos" last="Krommydas">Konstantinos Krommydas</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Adriaans, Pieter W" sort="Adriaans, Pieter W" uniqKey="Adriaans P" first="Pieter W" last="Adriaans">Pieter W. Adriaans</name>
<affiliation>
<nlm:aff id="I1">Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes.</p>
</sec>
<sec>
<title>Results</title>
<p>We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title>BMC Bioinformatics</journal-title>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19796406</article-id>
<article-id pub-id-type="pmc">2755830</article-id>
<article-id pub-id-type="publisher-id">1471-2105-10-S10-S9</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-10-S10-S9</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Structuring and extracting knowledge for the support of hypothesis generation in molecular biology</article-title>
</title-group>
<contrib-group>
<contrib id="A1" corresp="yes" contrib-type="author">
<name>
<surname>Roos</surname>
<given-names>Marco</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>roos@science.uva.nl</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Marshall</surname>
<given-names>M Scott</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>marshall@science.uva.nl</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Gibson</surname>
<given-names>Andrew P</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>a.p.gibson@uva.nl</email>
</contrib>
<contrib id="A4" contrib-type="author">
<name>
<surname>Schuemie</surname>
<given-names>Martijn</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>m.schuemie@erasmusmc.nl</email>
</contrib>
<contrib id="A5" contrib-type="author">
<name>
<surname>Meij</surname>
<given-names>Edgar</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>not@valid.com</email>
</contrib>
<contrib id="A6" contrib-type="author">
<name>
<surname>Katrenko</surname>
<given-names>Sophia</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>not@valid.com</email>
</contrib>
<contrib id="A7" contrib-type="author">
<name>
<surname>van Hage</surname>
<given-names>Willem Robert</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>not@valid.com</email>
</contrib>
<contrib id="A8" contrib-type="author">
<name>
<surname>Krommydas</surname>
<given-names>Konstantinos</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>not@valid.com</email>
</contrib>
<contrib id="A9" contrib-type="author">
<name>
<surname>Adriaans</surname>
<given-names>Pieter W</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>P.W.Adriaans@uva.nl</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ, The Netherlands</aff>
<aff id="I2">
<label>2</label>
Swammerdam Institute for Life Science, University of Amsterdam, Amsterdam, 1018 WB, The Netherlands</aff>
<aff id="I3">
<label>3</label>
BioSemantics group, Erasmus University of Rotterdam, Rotterdam, 3000 DR, The Netherlands</aff>
<aff id="I4">
<label>4</label>
Business Informatics, Faculty of Sciences, Vrije Universiteit, Amsterdam, 1081 HV, The Netherlands</aff>
<pub-date pub-type="collection">
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>1</day>
<month>10</month>
<year>2009</year>
</pub-date>
<volume>10</volume>
<issue>Suppl 10</issue>
<supplement>
<named-content content-type="supplement-title">Semantic Web Applications and Tools for Life Sciences, 2008</named-content>
<named-content content-type="supplement-editor">Albert Burger, Paolo Romano, Adrian Paschke and Andrea Splendiani</named-content>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/content/pdf/1471-2105-10-S10-info.pdf">http://www.biomedcentral.com/content/pdf/1471-2105-10-S10-info.pdf</ext-link>
</supplement>
<fpage>S9</fpage>
<lpage>S9</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/10/S10/S9"></ext-link>
<permissions>
<copyright-statement>Copyright © 2009 Roos et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2009</copyright-year>
<copyright-holder>Roos et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an open access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment> Roos Marco roos@science.uva.nl Structuring and extracting knowledge for the support of hypothesis generation in molecular biology 2009BMC Bioinformatics 10(Suppl 10): S9-. (2009)1471-2105(2009)10:Suppl 10urn:ISSN:1471-2105</pmc-comment>
</license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes.</p>
</sec>
<sec>
<title>Results</title>
<p>We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation.</p>
</sec>
</abstract>
<conference>
<conf-date>
<day>28</day>
<month>11</month>
<year>2008</year>
</conf-date>
<conf-name>Semantic Web Applications and Tools for Life Sciences, 2008</conf-name>
<conf-loc>Edinburgh, UK</conf-loc>
</conference>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>In order to study a biomolecular mechanism such as epigenetic gene control (Figure
<xref ref-type="fig" rid="F1">1</xref>
) and formulate a new hypothesis, we usually integrate various types of information to distil a comprehensible model. We can use this model to discuss with our peers before we test the model in the laboratory or by comparison to available data. A typical hypothesis is based on one's own knowledge, interpretations of experimental data, the opinions of peers, and the prior knowledge that is contained in literature. Many Web resources are available for molecular biologists to access existing knowledge, of which Entrez PubMed, hosted by the US National Center for Biotechnology Information (NCBI), is probably the most used. The difficulty of information retrieval from literature reveals the scale of today's information overload: over 17 million biomedical documents are now available from PubMed. Also considering the knowledge that did not make it to publication or that is stored in various types of databases and file systems, many scientists find it increasingly challenging to ensure that all potentially relevant facts are considered whilst forming a hypothesis. Support for extracting and managing knowledge is therefore a general requirement. Developments in the area of the Semantic Web and related areas such as information retrieval are making it possible to create applications that will support the task of hypothesis generation. First, RDF and OWL provide us with a way to represent knowledge in a machine-readable format that is amenable to machine inference [
<xref ref-type="bibr" rid="B1">1</xref>
,
<xref ref-type="bibr" rid="B2">2</xref>
]. Ontologies have become an important source of knowledge in molecular biology. Many ontologies have been created and many types of application have become possible [
<xref ref-type="bibr" rid="B3">3</xref>
], with the life sciences providing a key motivation for addressing the information management problem that arises from high-throughput data collection [
<xref ref-type="bibr" rid="B4">4</xref>
,
<xref ref-type="bibr" rid="B5">5</xref>
]. A downside to the popularity of bio-ontologies is that their number and size have become overwhelming when attempting to discover the best representation for one's personal hypothesis. Moreover, building a biological ontology is usually associated with a community effort where consensus is sought for clear descriptions of biological phenomena [
<xref ref-type="bibr" rid="B6">6</xref>
]. The question arises how an experimental biologist/bioinformatician can apply Semantic Web languages when the primary aim is not to build a comprehensive ontology for a community, but to represent a personal hypothesis for a particular biomolecular mechanism. Therefore, we explored an approach to semantic modeling that emphasizes the creation of a personal model within the scope of one hypothesis, but without precluding integration with other ontologies. Second, information retrieval and information extraction techniques can be used to elucidate putative knowledge to consider for a hypothesis by selecting relevant data and recognizing biological entities (e.g. protein names) and relations in text [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B8">8</xref>
]. For instance, tools and algorithms have been developed that match predefined sets of biological terms [
<xref ref-type="bibr" rid="B7">7</xref>
,
<xref ref-type="bibr" rid="B8">8</xref>
], or that use machine learning algorithms to recognize entities and extract relations based on their context in a document [
<xref ref-type="bibr" rid="B9">9</xref>
]. These techniques can also be used to extend an ontology [
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
]. Several tools exist for text mining (see, for instance, [
<xref ref-type="bibr" rid="B8">8</xref>
]), but for a methodology to be attractive to practitioners of experimental molecular biology, we would like a method that is more directly analogous to wet laboratory experimentation. Workflow management systems offer a platform for in silico experimentation [
<xref ref-type="bibr" rid="B12">12</xref>
-
<xref ref-type="bibr" rid="B14">14</xref>
] where, for example, data integration [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B15">15</xref>
], and systematic large-scale analysis [
<xref ref-type="bibr" rid="B16">16</xref>
] have been implemented. Workflows can also be shared on the web, as accomplished in myExperiment [
<xref ref-type="bibr" rid="B17">17</xref>
]. In a workflow, the steps of a computational experiment are carried out by individual components for which Web Services provide a common communication protocol [
<xref ref-type="bibr" rid="B18">18</xref>
]. We adopted the workflow paradigm for the design and execution of a reusable knowledge extraction experiment. The main services in the workflow are from the 'Adaptive Information Disclosure Application' toolkit (AIDA) that we are developing for knowledge management applications ([
<xref ref-type="bibr" rid="B19">19</xref>
] and this document). The output enriches a knowledge base with putative biological relations and corresponding evidence. The approach is not limited to text mining but can be applied to knowledge extracted during any computational experiment. The advantage of routinely storing extracted knowledge is that it enables us to perform posterior analysis across many experiments.</p>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>
<bold>Cartoon model for the mechanism of chromatin condensation and decondensation</bold>
. Models for condensation and decondensation of chromatin, a determinant of transcriptional activity, involve enzymes for histone acetylation (HAT) and deacetylation (HDAC), DNA methylation, and methylation of histone H3K9 [
<xref ref-type="bibr" rid="B47">47</xref>
]. Cartoon representations are a typical means of scientific discourse for molecular biologists.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-1"></graphic>
</fig>
</sec>
<sec>
<title>Results</title>
<p>We present the methodology in the following order: 1) how prior knowledge is represented through proto-ontologies; 2) how a workflow extends the proto-ontologies by adding instances to a semantic repository preloaded with them; 3) how to query the knowledge base; 4) the toolkit that we use for knowledge extraction and knowledge management. Data and references are accessible from pack 58 on myExperiment.org [
<xref ref-type="bibr" rid="B20">20</xref>
].</p>
<sec>
<title>Model representation in OWL</title>
<sec>
<title>Different types of knowledge</title>
<p>Step one of our methodology is to define machine-readable 'proto-ontologies' to represent our biological hypothesis within the scope of an experiment. The experiment in this case is a procedure to extract protein relations from literature. Our approach is based on the assumption that knowledge models can grow with each experiment that we or others perform. Therefore, we created a minimal OWL ontology of the relevant biological domain entities and their biological relations for our knowledge extraction experiment. The purpose of the experiment is to populate (enrich) the proto-ontologies with instances derived from literature. We also modeled the evidence that led to these instances: for instance, the process by which a protein name was found and the document in which it was found. We note a clash between our intention of enriching a biological model and the factual observations of a text mining procedure, such as 'term', 'interaction assertion', or 'term collocation'. For example, it is obvious that collocation of the terms 'HDAC1' and 'p53' in one abstract does not necessarily imply collocation of the referred proteins in a cell. In order to avoid conflation of knowledge from the different stages of our knowledge extraction process, we purposefully kept distinct OWL models. This led to the creation of the following models that will be treated in detail below:
<p>❑ Biological knowledge for our hypothesis (Protein, Association)</p>
<p>❑ Text (Terms, Document references)</p>
<p>❑ Knowledge extraction process (Steps of the procedure)</p>
<p>❑ Extraction procedure implementation (Web Service and Workflow runs)</p>
<p>❑ Mapping model to integrate the above through references.</p>
<p>❑ Results (Instances of extracted terms and relations)</p>
</sec>
<sec>
<title>Biological model</title>
<p>For the biological model, we started with a minimal set of classes designed for hypotheses about proteins and protein-protein associations (Figure
<xref ref-type="fig" rid="F2">2</xref>
). This model contains classes such as 'Protein', 'Interaction' and 'Biological Model'. We regard instances in the biological model as interpretations of certain observations, in our case, of text mining results. We also do not consider instances of these classes as biological facts; they are restricted to a hypothetical model in line with common practice in experimental biology. The evidence for the interpretation is important, but it is not within the scope of this model. In the case of text mining, evidence is modeled by the text, text mining, and implementation models.</p>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>
<bold>Graphical representation of the biological domain model in OWL and example instances</bold>
. This proto-ontology contains classes for instances that may be relevant in hypotheses about chromatin (de)condensation. HDAC1 and PCAF are example instances representing proteins implicated in models of this process and known to interact. In this and following figures, red diamonds represent instances, dashed arrows connected to diamonds represent instance-of relations, and blue dashed arrows represent properties between classes or instances. Inverse relations are not shown for clarity. Protein Association represents the reified relation in which two (or more) proteins participate. Instances of 'BiologicalModel' represent an abstraction of a biological hypothesis that can be partially represented by user queries, proteins provided by the user, and proteins discovered by text mining.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-2"></graphic>
</fig>
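The population step illustrated in Figure 2 can be reproduced with any RDF API. The following is a minimal, hypothetical sketch using the Sesame 2 Java API (the framework behind the AIDA Storage module described later in this paper); the namespace and the 'hasParticipant' property are invented placeholders rather than the proto-ontology's actual URIs, which are available via myExperiment pack 58.

```java
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.RDFS;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class BioModelSketch {
    // Invented namespace; the real proto-ontology URIs live in myExperiment pack 58.
    static final String BIO = "http://example.org/aida/biomodel#";

    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        ValueFactory vf = repo.getValueFactory();
        RepositoryConnection con = repo.getConnection();
        try {
            URI protein = vf.createURI(BIO, "Protein");
            URI association = vf.createURI(BIO, "ProteinAssociation");
            URI hasParticipant = vf.createURI(BIO, "hasParticipant"); // assumed name
            URI hdac1 = vf.createURI(BIO, "HDAC1");
            URI pcaf = vf.createURI(BIO, "PCAF");
            URI assoc = vf.createURI(BIO, "assoc_HDAC1_PCAF");

            // The example instances of Figure 2: two proteins and their
            // reified association.
            con.add(hdac1, RDF.TYPE, protein);
            con.add(hdac1, RDFS.LABEL, vf.createLiteral("HDAC1"));
            con.add(pcaf, RDF.TYPE, protein);
            con.add(pcaf, RDFS.LABEL, vf.createLiteral("PCAF"));
            con.add(assoc, RDF.TYPE, association);
            con.add(assoc, hasParticipant, hdac1);
            con.add(assoc, hasParticipant, pcaf);
        } finally {
            con.close();
        }
    }
}
```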
</sec>
<sec>
<title>Text model</title>
<p>A model of the structure of documents and statements therein is less ambiguous than the biological model, because we can directly inspect concrete instances such as (references to) documents or pieces of text (Figure
<xref ref-type="fig" rid="F3">3</xref>
). We can be sure of the scope of the model and we can be clear about the distinction between classes and instances because we computationally process the documents. This model contains classes for documents, protein or gene names, and mentions of associations between proteins or genes.</p>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>
<bold>Graphical representation of proto-ontology for entities in text and example instances</bold>
. This proto-ontology contains classes of instances for documents, terms, and statements found in the text of the documents. The latter relation (occurrence within a document) is represented by 'component of' properties. The instances represent concrete observations in text. Properties such as 'relates' and 'relatesBy' represent their interrelations. Example instances are shown for protein names 'HDAC1' and 'p68' and an assertion suggesting a relation between these two proteins.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-3"></graphic>
</fig>
</sec>
<sec>
<title>Text mining model</title>
<p>Next, we created a model for the knowledge extraction process. This model serves to retrieve the evidence for the population of our biological model (Figure
<xref ref-type="fig" rid="F4">4</xref>
). It contains classes for information retrieval and information extraction such as 'collocation process' and properties such as 'discovered by'. We also created classes to contain text mining specific information such as the likelihood of terms being found in the literature. This allows us to inspect the uncertainty of certain findings. Because any procedure could be implemented in various ways, we created a separate model for the implementation artifacts.</p>
<fig position="float" id="F4">
<label>Figure 4</label>
<caption>
<p>
<bold>Graphical representation of the proto-ontology for the text mining process</bold>
. This proto-ontology contains the classes for instances of the processes by which a knowledge extraction experiment is performed. The darker coloured classes represent restriction classes for instances that have at least one 'discoveredBy' property defined.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-4"></graphic>
</fig>
</sec>
<sec>
<title>Workflow model</title>
<p>For more complete knowledge provenance, we also created a model representing the implementation of the text mining process as a workflow of (AIDA) Web Services. Example instances are (references to) the AIDA Web Services, and runs of these services. Following the properties of these instances we can retrace a particular run of the workflow.</p>
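Such run instances amount to only a few triples each. Continuing in the same hypothetical style (the WF namespace and all class and property names below are assumptions for illustration, not the model's actual terms):

```java
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.repository.RepositoryConnection;

public final class ProvenanceSketch {
    // Invented namespace standing in for the workflow proto-ontology.
    static final String WF = "http://example.org/aida/workflowmodel#";

    /** Record one run of an AIDA service with its execution timestamp. */
    static void recordRun(RepositoryConnection con, ValueFactory vf) throws Exception {
        URI run = vf.createURI(WF, "applyCRF_run_2008-11-18");
        con.add(run, RDF.TYPE, vf.createURI(WF, "WebServiceRun"));
        con.add(run, vf.createURI(WF, "runOf"), vf.createURI(WF, "AIDA_applyCRF"));
        con.add(run, vf.createURI(WF, "executedOn"),
                vf.createLiteral("2008-11-18T12:00:00Z",
                        vf.createURI("http://www.w3.org/2001/XMLSchema#dateTime")));
    }
}
```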
</sec>
<sec>
<title>Mapping model</title>
<p>At this point, we have created a clear framework for the description of our biological domain, and of the documents and text mining results as instances in our text and text mining ontologies. The next step is to relate the instances in the various models to the biological domain model. Our strategy is to initially keep the domain model simple at the class and object property level, and to map sets of instances from our results to the domain model. For this, we created an additional mapping model that defines reference properties between the models (Figure
<xref ref-type="fig" rid="F5">5</xref>
). This allows us to see that an interaction between the proteins labeled 'p68' and 'HDAC1' in our hypothetical model is referred to by a mention of an association between the terms 'p68' and 'HDAC1', with a likelihood score that indicates how remarkable it is to find this combination in literature.</p>
<fig position="float" id="F5">
<label>Figure 5</label>
<caption>
<p>
<bold>Graphical representation of the proto-ontology containing the mapping properties between the biological, text, and text mining models</bold>
. The 'reference' properties connect the concrete observations captured in the text model with the model representations in the biological model. For instance, the discovered protein name 'HDAC1' in the text mining model refers to the protein labelled 'HDAC1' that is a component of an instance representing a chromatin condensation hypothesis.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-5"></graphic>
</fig>
<p>In summary, we have created proto-ontologies that separate the different views on biomolecular knowledge derived from literature by a text mining experiment. We can create instances in each view and their interrelations (Figure
<xref ref-type="fig" rid="F6">6</xref>
). This allows us to trace the experimental evidence for knowledge contained in the biological model. In a case of text mining such as ours, evidence is modeled by the document, text mining, and workflow models. A different type of computational experiment would require other models and new mappings to represent evidence.</p>
<fig position="float" id="F6">
<label>Figure 6</label>
<caption>
<p>
<bold>Knowledge extraction workflow</bold>
. The knowledge extraction workflow has three parts. The left part executes the steps of a basic text mining procedure: (i) extract protein names from the user query and add synonyms using the BioSemantics synonym service, (ii) retrieve documents from MedLine with the AIDA document search service, (iii) extract proteins with the AIDA named entity recognition service, (iv) calculate a ranking score for each discovery. The middle workflow converts the results from the text mining workflow to RDF using the biological model and the text model as templates. The workflow on the right side creates execution-level instances for the workflow components and couples these to the instances created in the middle workflow. The blue rectangles represent inputs and outputs. The pink rectangles represent sub-workflows.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-6"></graphic>
</fig>
</sec>
</sec>
<sec>
<title>Knowledge extraction experiment</title>
<p>The proto-ontologies form the basis of our knowledge base. They provide the initial templates for the knowledge that we wish to be able to interrogate in search of new hypotheses. The next step is to populate the knowledge base with instances. At the modeling stage we already anticipated that our first source of knowledge would be literature, and that we would obtain instances by text mining. An element of our approach is to regard knowledge extraction procedures as 'computational experiments' analogous to wet laboratory experiments. We therefore used the workflow paradigm to design the protocol of our text mining experiment, here with the workflow design and enactment tool Taverna [
<xref ref-type="bibr" rid="B13">13</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
]. A basic text mining workflow consists of the following steps: (i) Retrieve relevant documents from MedLine, in particular their abstracts, (ii) Extract protein names from the retrieved abstracts, and (iii) Present the results for inspection. We implemented the text mining process as a workflow (Figure
<xref ref-type="fig" rid="F6">6</xref>
). We added an additional sub-workflow to process the input query in order to extract known protein names from the input query and expand the query with synonyms for known protein names. For this, we employed a Web Service that provides UniProt identifiers and synonyms for human, rat and mouse gene names. These were derived from a combination of several public databases [
<xref ref-type="bibr" rid="B22">22</xref>
]. The query is first split into its individual terms with a service from the AIDA Toolkit that wraps the Lucene tokenizer, and then all the terms (tokens) from the original query are checked for having a UniProt identifier, by which they are identified as referring to a known protein. The sub-workflow makes the synonyms, UniProt identifiers, and the expanded query available for the rest of the workflow. The expanded query is the input for the next sub-workflow: document retrieval. We applied the document search service from the AIDA Toolkit, parameterized to use the MedLine index that is stored on our AIDA server and updated daily. The output of this retrieval service is an XML document that contains elements of the retrieved documents, such as the PubMed identifier, title, abstract, and the authors. We then extract titles and abstracts for the next sub-workflow: protein name recognition. Sub-workflow 3 employs the AIDA Web Service 'applyCRF' to recognize protein (or gene) names in text. This service wraps a machine learning method based on the 'conditional random fields' approach [
<xref ref-type="bibr" rid="B23">23</xref>
]. In this case it uses a recognition model trained on protein/gene names. We added the aforementioned UniProt service again to mark the extracted results as genuine human, rat, or mouse protein/gene names. In a number of cases the workflow produced more than one identifier for a single protein name. This is due to the ambiguity in gene and protein names. For instance, Tuason
<italic>et al. </italic>
reported 6.6% ambiguous occurrences of mouse gene names in text, and percentages ranging from 2.4% to 32.9% depending on the organism [
<xref ref-type="bibr" rid="B24">24</xref>
]. The final step of our text mining procedure was to calculate a likelihood score for the extracted proteins to be found in documents retrieved through the expanded input query. We used a statistical method where the likelihood of finding a document with input query (q) and discovered protein name (d) is calculated by:
<inline-formula><inline-graphic xlink:href="1471-2105-10-S10-S9-i1.gif"></inline-graphic></inline-formula>, in which <italic>Q</italic>, <italic>D</italic>, and <italic>QD</italic> are the frequencies of documents containing q, d, and q <italic>and</italic> d, respectively; <italic>QD</italic><sub><italic>exp</italic></sub> is the expected frequency of documents containing q and d assuming that their co-occurrence is a random event; and <italic>N</italic> is the total number of documents in MedLine.</p>
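The formula itself survives only as an image reference in this record. A score consistent with the definitions given above is the ratio of observed to expected co-occurrence, with QD_exp = Q·D/N; the sketch below implements that reading, which is an assumption rather than the paper's verified formula.

```java
public final class CooccurrenceScore {
    /**
     * q:  frequency of documents containing the expanded query (Q)
     * d:  frequency of documents containing the discovered protein name (D)
     * qd: frequency of documents containing both (QD)
     * n:  total number of documents in MedLine (N)
     */
    static double score(long q, long d, long qd, long n) {
        // Expected co-occurrence frequency if q and d were independent.
        double qdExp = (double) q * d / n;
        return qd / qdExp; // > 1 means q and d co-occur more often than chance
    }
}
```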
<p>In parallel to the part of the workflow that performs the basic text mining procedure, we designed a set of 'semantic' sub-workflows to convert the text mining results to instances of the proto-ontologies and add these instances to the AIDA knowledge base, including their interrelations (steps s
<italic>N </italic>
in Figure
<xref ref-type="fig" rid="F6">6</xref>
). The first step of this procedure is to initialize the knowledge base, after which the proto-ontologies are loaded into it and references to it are made available for the rest of the workflow. The next step is to add instances for the following entities to the knowledge base: 1) the initial biological model/hypothesis, 2) the original input query, 3) the protein names it contains, and 4) the expanded query. We assumed that the input query and the proteins mentioned therein partially represent the biological model; each run of the workflow creates a new instance of a biological model unless the input query is exactly the same as in a previous experiment. Figure
<xref ref-type="fig" rid="F7">7</xref>
illustrates the creation of an instance of a biological model and its addition to the knowledge base, including the details for creating the RDF triples in Java. All the semantic sub-workflows follow a similar pattern (data not shown). The following sub-workflow adds instances for retrieved documents to the knowledge base; it only uses the PubMed identifier. The sub-workflow that adds discovered proteins is critical to our methodology. It creates protein term instances from protein names in the Text ontology, to which it also adds collocation relations with the original query and a 'discovered_in' relation with the document in which it was discovered. In addition, it creates protein instances in the BioModel ontology and a biological association relation to the proteins found in the input query. Between term and protein instances in the different ontologies it creates reference relations. As a result, our knowledge base is populated with the discoveries of the text mining procedure and their biological interpretations, still linked with the knowledge they are interpretations of. The final sub-workflow adds the calculated likelihood scores as a property of the protein terms in the knowledge base. Finally, to be able to retrieve more complete evidence from the knowledge base, we extended our models and workflow to accommodate typical provenance data (not shown). We created an ontology with classes for Workflow runs and Web Service runs. Using the same semantic approach as above, we were able to store instances of these runs, including the date and time of execution.</p>
<fig position="float" id="F7">
<label>Figure 7</label>
<caption>
<p>
<bold>Example RDF conversion workflow</bold>
. This workflow creates an OWL instance for a biological hypothesis in RDF 'N3' format, and adds the RDF triples to the AIDA knowledge base with the 'addRDF' operation of the AIDA repository Web Service. The actual conversion is performed in the Java Beanshell 'Instantiate_Semantic_Type' of which the code is shown at the bottom. The sub-workflow has the hypothesis instance as output for use by other sub-workflows in the main workflow.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-7"></graphic>
</fig>
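The Beanshell code of Figure 7 is not preserved in this record; the following is a speculative Java sketch of what an 'Instantiate_Semantic_Type' step might do: serialize a new biological-model instance as N3 so that it can be passed to the 'addRDF' operation of the repository Web Service. The namespace and property names are invented for illustration.

```java
public final class InstantiateSemanticType {
    // Invented namespace; the real proto-ontology URIs are in myExperiment pack 58.
    static final String BIO = "http://example.org/aida/biomodel#";

    /** Serialize a new BiologicalModel instance for the user query as N3. */
    static String biologicalModelAsN3(String userQuery) {
        String inst = BIO + "model_" + Integer.toHexString(userQuery.hashCode());
        StringBuilder n3 = new StringBuilder();
        n3.append("@prefix bio: <").append(BIO).append("> .\n");
        n3.append("<").append(inst).append("> a bio:BiologicalModel ;\n");
        n3.append("    bio:representedByQuery \"")
          .append(userQuery.replace("\"", "\\\"")).append("\" .\n");
        return n3.toString(); // handed to the repository service's addRDF operation
    }
}
```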
</sec>
<sec>
<title>Querying the knowledge base</title>
<p>The result of running the workflow is that our knowledge base is enriched with instances of biological concepts and relations between those instances that can also tell us why the instances were created. We can examine the results in search of unexpected findings or we can examine the evidence for certain findings, for instance by examining the documents in which some protein name was found. An interesting possibility is to explore relations between the results of computational experiments that added knowledge to the knowledge base. To prove this concept we ran the workflow twice, first with "HDAC1 AND chromatin" as input, and then with "(Nutrition OR food) AND (chromatin OR epigenetics) AND (protein OR proteins)" as input. We were then able to retrieve three proteins that are apparently shared between the two biological models (see Figure
<xref ref-type="fig" rid="F8">8</xref>
for the RDF query): NF-kappaB (UniProt ID P19838), p21 (UniProt ID P38936), and Bax (UniProt ID P97436). To investigate the evidence by which these proteins were discovered, we designed a query that traces the chain of evidence (Figure
<xref ref-type="fig" rid="F9">9</xref>
). It retrieves the process by which the name of the protein was found, the service by which the process was implemented and its creator, the document from MedLine in which the protein name was discovered, and the time when this discovery service was run. For example, NF-KappaB was found on the 18<sup>th</sup> of November 2008 in a paper with PubMed identifier 17540846, by a run of the 'AIDA CRF Named Entity Recognition service' based on 'conditional random fields trained on protein names', created by Sophia Katrenko.</p>
<fig position="float" id="F8">
<label>Figure 8</label>
<caption>
<p>
<bold>Pseudo RDF query for extracting proteins related to two hypotheses</bold>
. RDF queries are pattern matching queries. This query returns proteins that were found by mining for relations with two different hypotheses represented by two different user queries. The result is a table of protein descriptions and the two user queries.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-8"></graphic>
</fig>
<fig position="float" id="F9">
<label>Figure 9</label>
<caption>
<p>
<bold>Graphical representation of a 'chain of evidence query'</bold>
. This RDF query matches patterns in the RDF graph created by the knowledge extraction workflow. The result is a table of protein identifiers, protein names, the process by which the proteins were found, the service that implemented this process, the date and time it was run, its creator, and the document that the service used as input and of which the protein name was a component.</p>
</caption>
<graphic xlink:href="1471-2105-10-S10-S9-9"></graphic>
</fig>
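A query in the spirit of Figure 8 can be phrased in SPARQL, which the Sesame 2 API used by AIDA supports (see the Storage section below). In this sketch the 'txt:' property names are hypothetical stand-ins for the terms defined in the text and mapping proto-ontologies.

```java
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;

public final class SharedProteinsQuery {
    /** Print proteins whose term instances collocate with two different queries. */
    static void print(RepositoryConnection con) throws Exception {
        String q =
            "PREFIX txt: <http://example.org/aida/textmodel#> " + // assumed namespace
            "SELECT DISTINCT ?protein ?query1 ?query2 WHERE { " +
            "  ?term1 txt:references ?protein ; txt:collocatesWith ?query1 . " +
            "  ?term2 txt:references ?protein ; txt:collocatesWith ?query2 . " +
            "  FILTER (?query1 != ?query2) }";
        TupleQueryResult result =
            con.prepareTupleQuery(QueryLanguage.SPARQL, q).evaluate();
        try {
            while (result.hasNext()) {
                BindingSet bs = result.next();
                System.out.println(bs.getValue("protein") + " links "
                        + bs.getValue("query1") + " and " + bs.getValue("query2"));
            }
        } finally {
            result.close();
        }
    }
}
```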
</sec>
<sec>
<title>The AIDA Toolkit for knowledge extraction and knowledge management</title>
<p>The methodology that we propose enables a 'do-it-yourself' approach to extracting knowledge that can support hypothesis generation. To support this approach, we are developing an open source toolkit called Adaptive Information Disclosure Application (AIDA). AIDA is a generic set of components that can perform a variety of tasks related to knowledge extraction and knowledge management, such as performing specialized search on resource collections, learning new pattern recognition models, and storing knowledge in a repository. W3C standards are used to make data accessible and manageable with Semantic Web technologies such as OWL, RDF(S), and SKOS. AIDA is also based on Lucene and Sesame. Most components are available as web services and are open source under an Apache license. AIDA is composed of three main modules: Search, Learning, and Storage.</p>
<sec>
<title>Search – the information retrieval module</title>
<p>AIDA provides components which enable retrieval from a set of documents given a query, similar to popular search engines such as Google, Yahoo!, or PubMed. To make a set of documents (a corpus) searchable, an 'index' needs to be created first [
<xref ref-type="bibr" rid="B25">25</xref>
]. For this, AIDA's configurable Indexer can be used. The Indexer and Search components are built upon Apache Lucene, version 2.1.0 [
<xref ref-type="bibr" rid="B26">26</xref>
], and, hence, indexes or other systems based on Lucene can easily be integrated with AIDA. The Indexer component takes care of the preprocessing (the conversion, tokenization, and possibly normalization) of the text of each document as well as the subsequent index generation. Different fields can be made retrievable, such as title, document name, authors, or the entire contents. The currently supported document formats are Microsoft Word, Portable Document Format (PDF), MedLine, XML, and plain text. The so-called "DocumentHandlers", which handle the actual conversion of each source file, are loaded at runtime, so a handler for any other proprietary document format can be created and used instantly. Because Lucene is used as a basis, a wide range of options and language resources is available for stemming, tokenization, normalization, or stop word removal, all of which may be set on a per-field, per-document-type, or per-index basis using the configuration. An index can currently be constructed using the command line, a SOAP web service (limited to one document per call), or a Taverna plugin.</p>
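As a rough illustration of what the Indexer automates, the sketch below indexes one MedLine-style record with the Lucene 2.x API on which AIDA is built; the index path and field names are illustrative, not AIDA's actual configuration.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexSketch {
    public static void main(String[] args) throws Exception {
        // 'true' creates a new index at the given path (Lucene 2.x constructor).
        IndexWriter writer =
                new IndexWriter("/tmp/medline-index", new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("pmid", "17540846",
                Field.Store.YES, Field.Index.UN_TOKENIZED)); // exact-match field
        doc.add(new Field("title", "Example title",
                Field.Store.YES, Field.Index.TOKENIZED));    // searchable field
        doc.add(new Field("abstract", "Example abstract text ...",
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();
    }
}
```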
</sec>
<sec>
<title>Learning – the machine learning module</title>
<p>AIDA includes several components which enable information extraction from text data in the Learning module. These components are referred to as learning tools. The large community working on the information extraction task has already produced numerous data sets and tools to work with. To be able to use existing solutions, we incorporated some of the models trained on large corpora into the named entity recognition web service NERecognizerService. These models are provided by LingPipe [
<xref ref-type="bibr" rid="B27">27</xref>
] and range from very general named entity recognition (detecting locations, person and organization names) to specific models in the biomedical field created to recognize protein names and other bio-entities. We specified several options for input/output, which allows us to work with either text data or the output of the search engine Lucene. We also offer the LearnModel web service, whose aim is to produce a model from annotated text data. A model is based on contextual information and uses learning methods provided by Weka [
<xref ref-type="bibr" rid="B28">28</xref>
] libraries. Once such a model is created, it can be used by the TestModel web service to annotate texts in the same domain. In this paper we use an AIDA service that applies an algorithm based on sequential models, namely conditional random fields (CRFs). CRFs have an advantage over Hidden Markov Models because of their ability to relax the independence assumption by defining a conditional probability distribution over label sequences given an observation sequence. We used CRFs to detect named entities in several domains, such as acids of various lengths in the food informatics field or protein names in the biomedical field [
<xref ref-type="bibr" rid="B9">9</xref>
].</p>
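As an impression of the kind of model NERecognizerService wraps, the sketch below applies a pretrained LingPipe chunker to a sentence; the model file name is illustrative.

```java
import java.io.File;
import com.aliasi.chunk.Chunk;
import com.aliasi.chunk.Chunker;
import com.aliasi.chunk.Chunking;
import com.aliasi.util.AbstractExternalizable;

public class NerSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative file name; LingPipe distributes several pretrained models.
        Chunker chunker = (Chunker) AbstractExternalizable
                .readObject(new File("ne-en-bio-genetag.HmmChunker"));
        String text = "HDAC1 interacts with the RNA helicase p68.";
        Chunking chunking = chunker.chunk(text);
        for (Chunk chunk : chunking.chunkSet()) {
            System.out.println(text.substring(chunk.start(), chunk.end())
                    + " [" + chunk.type() + "]");
        }
    }
}
```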
<p>Named entity recognition constitutes only one subtask in information extraction. Relation extraction can be viewed as the logical next step after the named entity recognition is carried out [
<xref ref-type="bibr" rid="B29">29</xref>
]. This task can be decomposed into the detection of named entities, followed by the verification of a given relation among them. For example, given extracted protein names, it should be possible to infer whether there is any interaction between two proteins. This task is accomplished by the RelationLearner web service. It uses an annotated corpus of relations to induce a model, which can subsequently be applied to test data with already detected named entities. The RelationLearner focuses on the extraction of binary relations given the sentential context. Its output is a list of named entity pairs for which the given relation holds.</p>
<p>Another relevant area for information extraction is the detection of collocations (or n-grams in the broader sense). This functionality is provided by the CollocationService, which, given a folder with text documents, outputs the n-grams of the desired frequency and length.</p>
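At its core this amounts to counting n-grams and filtering by frequency; the toy sketch below illustrates the idea and is not the CollocationService's actual implementation.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public final class NgramSketch {
    /** Count n-grams of length n, keeping those that occur at least minFreq times. */
    static Map<String, Integer> ngrams(String text, int n, int minFreq) {
        String[] tokens = text.toLowerCase().split("\\s+");
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int i = 0; i + n <= tokens.length; i++) {
            StringBuilder gram = new StringBuilder(tokens[i]);
            for (int j = 1; j < n; j++) gram.append(' ').append(tokens[i + j]);
            Integer c = counts.get(gram.toString());
            counts.put(gram.toString(), c == null ? 1 : c + 1);
        }
        // Drop n-grams below the desired frequency.
        for (Iterator<Integer> it = counts.values().iterator(); it.hasNext();) {
            if (it.next() < minFreq) it.remove();
        }
        return counts;
    }
}
```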
</sec>
<sec>
<title>Storage – the metadata storage module</title>
<p>AIDA includes components for the storage and processing of ontologies, vocabularies, and other structured metadata in the Storage module. The main component, also for the work described in this paper, is RepositoryWS, a service wrapper for Sesame – an open source framework for storage, inferencing and querying of RDF data on which most of this module's implementation is based [
<xref ref-type="bibr" rid="B30">30</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
]. ThesaurusRepositoryWS is an extension of RepositoryWS that provides convenient access methods for SKOS thesauri. The Sesame RDF repository offers an HTTP interface and a Java API. In order to be able to integrate Sesame into workflows, we created a SOAP service that gives access to the Sesame Java API. We accommodate extensions to other RDF repositories, such as HP Jena, Virtuoso, and AllegroGraph, or future versions of Sesame, by implementing the Factory design pattern.</p>
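The Factory pattern mentioned here can be sketched as follows; the interface and method names are invented for illustration and do not reflect AIDA's actual code.

```java
/** Common interface behind which different RDF stores can be plugged in. */
interface RdfRepository {
    void addRdf(String rdfDocument, String format);
}

final class RepositoryFactory {
    /** Choose a backend without clients depending on a concrete store. */
    static RdfRepository create(String backend) {
        if ("sesame".equals(backend)) {
            return (rdf, format) -> {
                // Delegate to the Sesame API here.
            };
        }
        throw new IllegalArgumentException("unsupported backend: " + backend);
    }
}
```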
</sec>
<sec>
<title>Complementary services from BioSemantics applications</title>
<p>One of the advantages of a workflow approach is the ability to include services created elsewhere in the scientific community ('collaboration by Web Services'). For instance, our BioAID workflows use operations for query expansion and for validating protein names against UniProt identifiers. AIDA is therefore complemented by services derived from text mining applications such as Anni 2.0 from the BioSemantics group [
<xref ref-type="bibr" rid="B32">32</xref>
]. The 'BioSemantics' group is particularly strong in the disambiguation of the names of biological entities such as genes/proteins, intelligent biological query expansion (manuscript in preparation), and the provision of several well-known identifiers for biological entities through carefully compiled sets of names and identifiers around a biological concept.</p>
</sec>
<sec>
<title>User interfaces for AIDA</title>
<p>In addition to RDF manipulation within workflows as described in this document, several examples of user interactions have been made available in AIDA clients such as HTML web forms, AJAX web applications, and a Firefox toolbar. The clients access RepositoryWS for querying RDF through the provided Java Servlets. The web services in Storage have recently been updated from the Sesame 1.2 Java API to the Sesame 2.0 Java API. Some of the new features that Sesame 2.0 provides, such as SPARQL support and named graphs, are now being added to our web service APIs and incorporated into our applications.</p>
</sec>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>Our methodology for supporting the generation of a hypothesis about a biomolecular mechanism is based on a combination of tools and expertise from the fields of Semantic Web, e-Science, information retrieval, and information extraction. This novel combination has a number of benefits. First, the use of RDF and OWL removes the technical obstacle to making models interoperable with other knowledge resources on the Semantic Web, although semantic interoperability will often require an alignment process for more far-reaching compatibility. The modeling approach that we propose is complementary to the efforts of communities such as the Open Biomedical Ontologies (OBO) community. This community's stated purpose is to create an 'accurate representation of biological reality' by developing comprehensive domain ontologies and reconciling existing ontologies according to a number of governing principles [
<xref ref-type="bibr" rid="B4">4</xref>
]. Our ambitions are more modest. We start with a minimal model to represent a hypothesis, i.e. a particular
<italic>model </italic>
of reality. We define our own classes and properties within the scope of a knowledge extraction experiment, but because of the modularity supported by OWL this does not exclude integration with other ontologies. In fact, integration with existing knowledge resources enables a complementary approach for finding facts potentially relevant to a hypothesis. Clearly, in order to scale up our methodology to represent knowledge beyond the experiments of a small group of researchers, alignment with standards would have to be considered. Upper ontologies can facilitate integration (for an example see [
<xref ref-type="bibr" rid="B33">33</xref>
]), and we can benefit from the OBO guidelines and the tools that have been developed to convert OBO ontologies to OWL [
<xref ref-type="bibr" rid="B33">33</xref>
-
<xref ref-type="bibr" rid="B35">35</xref>
]. Another interesting possibility is the integration with thesauri based on the SKOS framework [
<xref ref-type="bibr" rid="B36">36</xref>
]. Relations between SKOS concepts (terms) are defined by simple 'narrower' and 'broader' relations that turn out to be effective for human-computer interfaces, and may be the best option for labeling the elements in our semantic models. Instead of providing a text string as a human-readable label, we could associate an element with an entry in a SKOS thesaurus, which is a valuable knowledge resource in itself. The SKOS format is useful as an approach for 'light-weight' knowledge integration that avoids the problems of ontological over-commitment associated with more powerful logics like OWL DL [
<xref ref-type="bibr" rid="B37">37</xref>
].</p>
<p>A second benefit of our methodology comes from the implementation of the knowledge extraction procedure as a workflow. The procedure for populating an ontology is similar to the one previously described by Witte
<italic>et al. </italic>
[
<xref ref-type="bibr" rid="B38">38</xref>
], but our implementation allows the accumulation of knowledge by repeatedly running the same workflow or adaptations of it. This enables us to perform posterior analyses over the results from several experiments by querying the knowledge base, for instance in a new workflow that uses the AIDA semantic repository service. Moreover, the approach is not limited to text mining. If one considers text documents as a particular form of data, we can generalize the principle to any computational experiment in which the output can be related to a qualitative biological model. As such, this work extends previous work on integration of genome data via semantic annotation [
<xref ref-type="bibr" rid="B39">39</xref>
]. In this case the annotation is carried out by a workflow. Considering that there are thousands of Web Services and hundreds of workflows available for bioinformaticians [
<xref ref-type="bibr" rid="B17">17</xref>
], numerous extensions to our workflow can be explored. In addition, the combination with a semantic model allows us to collect evidence information as a type of knowledge provenance during workflow execution. In this way, we were able to address the issue of keeping a proper log of what has happened to our data during computational experimentation, analogous to the lab journal typically required in wet labs [
<xref ref-type="bibr" rid="B40">40</xref>
]. Ideally, the knowledge provenance captured in our approach would be more directly supported by existing workflow systems. However, this is not yet possible. There seems to be a knowledge gap between workflow investigators and users from a particular application domain with regard to provenance. We propose that workflow systems take care of execution-level provenance and provide an RDF interface on which users can build their own provenance model. In this context, it will be interesting to see whether we will be able to replace our workflow model and link directly to the lightweight provenance model that is being implemented for Taverna 2 [
<xref ref-type="bibr" rid="B41">41</xref>
]. A third benefit is that the use of Semantic Web technologies, Web Services, and workflows stored on myExperiment.org allows all resources relevant to an experiment to be shared on the web, making our results more reproducible. We would like to increase the 'liquidity' of knowledge so that knowledge extracted from computational experiments can eventually fit into frameworks for scientific discourse (hypotheses, research statements and questions, etc.) such as Semantic Web Applications in Neuromedicine (SWAN) [
<xref ref-type="bibr" rid="B42">42</xref>
]. If it is to be global, interoperability across modes of discourse would require large-scale consensus on how to express knowledge provenance, not only for knowledge produced by computational experiments but also for assertions made manually by humans. Some groups are attempting to address various aspects of this problem, such as the Scientific Discourse task force [
<xref ref-type="bibr" rid="B43">43</xref>
] in the W3C Semantic Web Health Care and Life Sciences Interest Group [
<xref ref-type="bibr" rid="B44">44</xref>
], the Concept Web Alliance [
<xref ref-type="bibr" rid="B45">45</xref>
] and the Shared Names initiative [
<xref ref-type="bibr" rid="B46">46</xref>
].</p>
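<p>To illustrate the kind of knowledge provenance collected during workflow execution, the sketch below stores one hypothetical extracted statement together with pointers to its evidence document and the extracting service, placing the triples in a named graph that identifies the workflow run. The vocabulary (extractedFrom, extractedBy, confidence) and all URIs are invented for illustration; agreeing on such a vocabulary is precisely the consensus problem discussed above.</p>
<preformat>
// Sketch: recording evidence provenance for an extracted fact in a named graph.
// The provenance vocabulary and all URIs are hypothetical placeholders.
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class ProvenanceSketch {
    public static void main(String[] args) throws Exception {
        SailRepository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
            ValueFactory vf = con.getValueFactory();
            String ex = "http://example.org/provenance#";
            URI run = vf.createURI(ex + "run1");         // named graph for one workflow execution
            URI fact = vf.createURI(ex + "statement42"); // a fact produced by a text-mining step
            con.add(fact, vf.createURI(ex + "extractedFrom"),
                    vf.createURI("http://example.org/documents/pmid-placeholder"), run);
            con.add(fact, vf.createURI(ex + "extractedBy"),
                    vf.createLiteral("protein extraction service"), run);
            con.add(fact, vf.createURI(ex + "confidence"), vf.createLiteral(0.87), run);
        } finally {
            con.close();
        }
    }
}
</preformat>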
</sec>
<sec>
<title>Conclusion</title>
<p>In this paper we demonstrate a methodology for a 'do-it-yourself' approach to the extraction and management of knowledge in support of generating hypotheses about biomolecular mechanisms. Our approach describes how one can create a personal model for a specific hypothesis and how a personal 'computational experiment' can be designed and executed to extract knowledge from literature and populate a knowledge base. A significant advantage of the methodology is the possibility it creates to perform analyses across the results of several of these knowledge extraction experiments. Moreover, the principle of semantic disclosure of results from a computational experiment is not limited to text mining. In principle, it can be applied to any kind of experiment whose results, or interpretations of them, can be converted to semantic models, almost as a 'side effect' of the experiment at hand. Experimental data is automatically semantically annotated, which makes it manageable within the context of its purpose: biological study. We consider this an intuitive and flexible way of enabling the reuse of data. With the use of Web Services from the AIDA Toolkit and others, we also demonstrated how the expertise of computational scientists with diverse backgrounds can be exploited, with knowledge sharing taking place at the level of services and qualitative models. We consider the demonstration of e-Science and Semantic Web tools for a personalized approach in the context of scientific communities to be one of the main contributions of our methodology. In summary, the methodology provides a basis for automated support for hypothesis formation in the context of experimental science. Future extensions will be driven by biological studies of specific biomolecular mechanisms, such as the role of histone modifications in transcription. We also plan to evaluate general strategies for extracting novel ideas from a growing repository of structured knowledge.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>Marco Roos, M. Scott Marshall, and Pieter Adriaans conceived the BioAID concept and scenario. Marco Roos, Andrew Gibson, and M. Scott Marshall conceived the semantic modeling approach. Marco Roos created the ontological models and implemented the workflow. M. Scott Marshall coordinated the development of AIDA. Martijn Schuemie, Edgar Meij, Sophia Katrenko, and Willem van Hage together with Konstantinos Krommydas developed the synonym/UniProt service, the document retrieval service, the protein extraction service, and the semantic repository service, respectively. All authors contributed to the overall development of our methodology.</p>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>We thank the myGrid team and OMII-UK for their support in applying their e-Science tools, and Machiel Jansen for his contribution to the early development of AIDA. This work was carried out in the context of the Virtual Laboratory for e-Science program (VL-e) and the BioRange program. These programs are supported by BSIK grants from the Dutch Ministry of Education, Culture and Science (OC&W). Special thanks go to Bob Hertzberger who made the VL-e project a reality.</p>
<p>This article has been published as part of
<italic>BMC Bioinformatics </italic>
Volume 10 Supplement 10, 2009: Semantic Web Applications and Tools for Life Sciences, 2008. The full contents of the supplement are available online at
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/10?issue=S10"></ext-link>
.</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Antoniou</surname>
<given-names>G</given-names>
</name>
<name>
<surname>van Harmelen</surname>
<given-names>F</given-names>
</name>
</person-group>
<source>A Semantic Web Primer</source>
<year>2004</year>
<publisher-name>MIT Press</publisher-name>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neumann</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>A life science Semantic Web: are we there yet?</article-title>
<source>Sci STKE</source>
<year>2005</year>
<volume>2005</volume>
<fpage>pe22</fpage>
<pub-id pub-id-type="pmid">15886389</pub-id>
<pub-id pub-id-type="doi">10.1126/stke.2832005pe22</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rubin</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Noy</surname>
<given-names>NF</given-names>
</name>
</person-group>
<article-title>Biomedical ontologies: a functional perspective</article-title>
<source>Brief Bioinform</source>
<year>2008</year>
<volume>9</volume>
<fpage>75</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="pmid">18077472</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/bbm059</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rosse</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bard</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bug</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Ceusters</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Goldberg</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Eilbeck</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ireland</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mungall</surname>
<given-names>CJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>
<source>Nature biotechnology</source>
<year>2007</year>
<volume>25</volume>
<fpage>1251</fpage>
<lpage>1255</lpage>
<pub-id pub-id-type="pmid">17989687</pub-id>
<pub-id pub-id-type="doi">10.1038/nbt1346</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>LD</given-names>
</name>
</person-group>
<article-title>Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges</article-title>
<source>Nature Reviews Genetics</source>
<year>2008</year>
<volume>9</volume>
<fpage>678</fpage>
<lpage>688</lpage>
<pub-id pub-id-type="pmid">18714290</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mungall</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>Ontologies for biologists: a community model for the annotation of genomic data</article-title>
<source>Cold Spring Harbor symposia on quantitative biology</source>
<year>2003</year>
<volume>68</volume>
<fpage>227</fpage>
<lpage>235</lpage>
<pub-id pub-id-type="pmid">15338622</pub-id>
<pub-id pub-id-type="doi">10.1101/sqb.2003.68.227</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spasic</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Ananiadou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>McNaught</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Text mining and ontologies in biomedicine: making sense of raw text</article-title>
<source>Brief Bioinform</source>
<year>2005</year>
<volume>6</volume>
<fpage>239</fpage>
<lpage>251</lpage>
<pub-id pub-id-type="pmid">16212772</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/6.3.239</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weeber</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kors</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Mons</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Online tools to support literature-based discovery in the life sciences</article-title>
<source>Brief Bioinform</source>
<year>2005</year>
<volume>6</volume>
<fpage>277</fpage>
<lpage>286</lpage>
<pub-id pub-id-type="pmid">16212775</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/6.3.277</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Katrenko</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Adriaans</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Using Semi-Supervised Techniques to Detect Gene Mentions</article-title>
<source>Second BioCreative Challenge Workshop: 2007</source>
<year>2007</year>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gomez-Perez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Manzano-Macho</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>An overview of methods and tools for ontology learning from texts</article-title>
<source>Knowledge Engineering Review</source>
<year>2004</year>
<volume>19</volume>
<fpage>187</fpage>
<lpage>212</lpage>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Missikoff</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Velardi</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fabriani</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Text mining techniques to automatically enrich a domain ontology</article-title>
<source>Applied Intelligence</source>
<year>2003</year>
<volume>18</volume>
<fpage>323</fpage>
<lpage>340</lpage>
<pub-id pub-id-type="doi">10.1023/A:1023254205945</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goble</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hull</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wolstencroft</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Data curation + process curation = data integration + science</article-title>
<source>Brief Bioinform</source>
<year>2008</year>
<volume>9</volume>
<fpage>506</fpage>
<lpage>517</lpage>
<pub-id pub-id-type="pmid">19060304</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/bbn034</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hull</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wolstencroft</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Goble</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pocock</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Oinn</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Taverna: a tool for building and running workflows of services</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>W729</fpage>
<lpage>W732</lpage>
<pub-id pub-id-type="pmid">16845108</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkl320</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Inda</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>van Batenburg</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Roos</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Belloum</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Vasunin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wibisono</surname>
<given-names>A</given-names>
</name>
<name>
<surname>van Kampen</surname>
<given-names>AH</given-names>
</name>
<name>
<surname>Breit</surname>
<given-names>TM</given-names>
</name>
</person-group>
<article-title>SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences</article-title>
<source>BMC research notes</source>
<year>2008</year>
<volume>1</volume>
<fpage>63</fpage>
<pub-id pub-id-type="pmid">18710516</pub-id>
<pub-id pub-id-type="doi">10.1186/1756-0500-1-63</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Romano</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Automation of in-silico data analysis processes through workflow management systems</article-title>
<source>Brief Bioinform</source>
<year>2008</year>
<volume>9</volume>
<fpage>57</fpage>
<lpage>68</lpage>
<pub-id pub-id-type="pmid">18056132</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/bbm056</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fisher</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hedeler</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wolstencroft</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Hulme</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Noyes</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kemp</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Brass</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>5625</fpage>
<lpage>5633</lpage>
<pub-id pub-id-type="pmid">17709344</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkm623</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Goble</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>De Roure</surname>
<given-names>DC</given-names>
</name>
</person-group>
<article-title>myExperiment: social networking for workflow-using e-scientists</article-title>
<source>2nd workshop on Workflows in support of large-scale science</source>
<year>2007</year>
<publisher-name>Monterey, California, USA: ACM Press</publisher-name>
<fpage>1</fpage>
<lpage>2</lpage>
</citation>
</ref>
<ref id="B18">
<citation citation-type="other">
<article-title>Web Services Description Language (WSDL) 1.1</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/TR/wsdl"></ext-link>
</citation>
</ref>
<ref id="B19">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Meij</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>IJzereef</surname>
<given-names>LHL</given-names>
</name>
<name>
<surname>Azzopardi</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Kamps</surname>
<given-names>J</given-names>
</name>
<name>
<surname>de Rijke</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Voorhees</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Buckland</surname>
<given-names>LP</given-names>
</name>
</person-group>
<article-title>Combining Thesauri-based Methods for Biomedical Retrieval</article-title>
<source>The Fourteenth Text REtrieval Conference (TREC 2005) National Institute of Standards and Technology</source>
<year>2006</year>
<publisher-name>NIST Special Publication</publisher-name>
</citation>
</ref>
<ref id="B20">
<citation citation-type="other">
<article-title>Supporting material for this paper on myExperiment (pack 58)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.myexperiment.org/packs/58"></ext-link>
</citation>
</ref>
<ref id="B21">
<citation citation-type="other">
<article-title>The Taverna workbench</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.mygrid.org.uk/tools/taverna/"></ext-link>
</citation>
</ref>
<ref id="B22">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Kors</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Schuemie</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Schijvenaars</surname>
<given-names>BJA</given-names>
</name>
<name>
<surname>Weeber</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mons</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Combination of Genetic Databases for Improving Identification of Genes and Proteins in Text</article-title>
<source>BioLINK: 2005; Detroit, Michigan, USA</source>
<year>2005</year>
</citation>
</ref>
<ref id="B23">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Katrenko</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Adriaans</surname>
<given-names>PW</given-names>
</name>
</person-group>
<article-title>Using Semi-Supervised Techniques to Detect Gene Mentions</article-title>
<source>Second BioCreative Challenge Workshop: 2007; Madrid, Spain</source>
<year>2007</year>
</citation>
</ref>
<ref id="B24">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Tuason</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Biological Nomenclatures: A source of Lexical Knowledge and Ambiguity</article-title>
<source>Pacific Symposium on Biocomputing: 2004; Fairmont Orchid, Hawaii</source>
<year>2004</year>
</citation>
</ref>
<ref id="B25">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Manning</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Raghavan</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schütze</surname>
<given-names>H</given-names>
</name>
</person-group>
<source>Introduction to Information Retrieval</source>
<year>2008</year>
<publisher-name>Cambridge University Press</publisher-name>
</citation>
</ref>
<ref id="B26">
<citation citation-type="other">
<article-title>The Apache Lucene project</article-title>
<ext-link ext-link-type="uri" xlink:href="http://lucene.apache.org"></ext-link>
</citation>
</ref>
<ref id="B27">
<citation citation-type="other">
<article-title>LingPipe</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.alias-i.com/lingpipe/"></ext-link>
</citation>
</ref>
<ref id="B28">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Witten</surname>
<given-names>IH</given-names>
</name>
<name>
<surname>Frank</surname>
<given-names>E</given-names>
</name>
</person-group>
<source>Data Mining: Practical machine learning tools and techniques</source>
<year>2005</year>
<edition>2</edition>
<publisher-name>San Francisco: Morgan Kaufmann</publisher-name>
</citation>
</ref>
<ref id="B29">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Katrenko</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Adriaans</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Learning Relations from Biomedical Corpora Using Dependency Trees</article-title>
<source>KDECB (Knowledge Discovery and Emergent Complexity in BioInformatics): 2006</source>
<year>2006</year>
</citation>
</ref>
<ref id="B30">
<citation citation-type="other">
<article-title>Sesame Open Source community web site</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.openrdf.org/"></ext-link>
</citation>
</ref>
<ref id="B31">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Broekstra</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kampman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>van Harmelen</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema</article-title>
<source>The Semantic Web – ISWC 2002: First International Semantic Web Conference</source>
<year>2002</year>
<volume>2342/2002</volume>
<publisher-name>Sardinia, Italy: Springer Berlin/Heidelberg</publisher-name>
<fpage>54</fpage>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jelier</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Schuemie</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Veldhoven</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Dorssers</surname>
<given-names>LC</given-names>
</name>
<name>
<surname>Jenster</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kors</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>Anni 2.0: a multipurpose text-mining tool for the life sciences</article-title>
<source>Genome Biology</source>
<year>2008</year>
<volume>9</volume>
<fpage>R96</fpage>
<pub-id pub-id-type="pmid">18549479</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2008-9-6-r96</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoehndorf</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Loebe</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Kelso</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Herre</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Representing default knowledge in biomedical ontologies: application to the integration of anatomy and phenotype ontologies</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<fpage>377</fpage>
<pub-id pub-id-type="pmid">17925014</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-377</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moreira</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Musen</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>OBO to OWL: a Protégé OWL tab to read/save OBO ontologies</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>1868</fpage>
<lpage>1870</lpage>
<pub-id pub-id-type="pmid">17496317</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btm258</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mungall</surname>
<given-names>CJ</given-names>
</name>
</person-group>
<article-title>Obol: integrating language and meaning in bio-ontologies</article-title>
<source>Comparative and functional genomics</source>
<year>2004</year>
<volume>5</volume>
<fpage>509</fpage>
<lpage>520</lpage>
<pub-id pub-id-type="pmid">18629143</pub-id>
<pub-id pub-id-type="doi">10.1002/cfg.435</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="other">
<article-title>SKOS Simple Knowledge Organization System Reference</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/TR/skos-reference/"></ext-link>
</citation>
</ref>
<ref id="B37">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Jupp</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bechhofer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yesilada</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kostkova</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Knowledge Representation for Web Navigation</article-title>
<source>Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2008): 2008; Edinburgh</source>
<year>2008</year>
</citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Witte</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kappler</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>CJO</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Baker CJO, Cheung K-H</surname>
</name>
</person-group>
<article-title>Ontology Design for Biomedical Text Mining</article-title>
<source>Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences</source>
<year>2007</year>
<publisher-name>New York: Springer Science+Business Media</publisher-name>
<fpage>281</fpage>
<lpage>313</lpage>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Post</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Roos</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>van Driel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Breit</surname>
<given-names>TM</given-names>
</name>
</person-group>
<article-title>A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>3080</fpage>
<lpage>3087</lpage>
<pub-id pub-id-type="pmid">17881406</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btm461</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Goble</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Using provenance to manage knowledge of in silico experiments</article-title>
<source>Brief Bioinform</source>
<year>2007</year>
<volume>8</volume>
<fpage>183</fpage>
<lpage>194</lpage>
<pub-id pub-id-type="pmid">17502335</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/bbm015</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Missier</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Belhajjame</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Goble</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Data lineage model for Taverna workflows with lightweight annotation requirements</article-title>
<source>IPAW'08: 2008; Salt Lake City, Utah</source>
<year>2008</year>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clark</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kinoshita</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Alzforum and SWAN: the present and future of scientific web communities</article-title>
<source>Brief Bioinform</source>
<year>2007</year>
<volume>8</volume>
<fpage>163</fpage>
<lpage>171</lpage>
<pub-id pub-id-type="pmid">17510163</pub-id>
<pub-id pub-id-type="doi">10.1093/bib/bbm012</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="other">
<article-title>Scientific Discourse task group of W3C Health Care and Life Science Interest Group</article-title>
<ext-link ext-link-type="uri" xlink:href="http://esw.w3.org/topic/HCLSIG/SWANSIOC"></ext-link>
</citation>
</ref>
<ref id="B44">
<citation citation-type="other">
<article-title>W3C Semantic Web Health Care and Life Science Interest Group</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/2001/sw/hcls/"></ext-link>
</citation>
</ref>
<ref id="B45">
<citation citation-type="other">
<article-title>Concept Web Alliance</article-title>
<ext-link ext-link-type="uri" xlink:href="http://conceptweblog.wordpress.com/about/"></ext-link>
</citation>
</ref>
<ref id="B46">
<citation citation-type="other">
<article-title>Shared Names Initiative</article-title>
<ext-link ext-link-type="uri" xlink:href="http://sharedname.org"></ext-link>
</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Verschure</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>Chromosome organization and gene control: it is difficult to see the picture when you are inside the frame</article-title>
<source>Journal of cellular biochemistry</source>
<year>2006</year>
<volume>99</volume>
<fpage>23</fpage>
<lpage>34</lpage>
<pub-id pub-id-type="pmid">16795053</pub-id>
<pub-id pub-id-type="doi">10.1002/jcb.20957</pub-id>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

# Path to the Pmc/Corpus exploration step on this server
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
# Select record 000210 from the biblio.hfd base, indent its XML, and page through it
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000210  | SxmlIndent | more

Or

# Same selection, addressed via the exploration area root instead of the step path
HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000210  | SxmlIndent | more

To link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}
