Cyberinfrastructure exploration server

WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications

Internal identifier: 000281 (Pmc/Corpus); previous: 000280; next: 000282

Authors: Qian Zhu; Michael S. Lajiness; Ying Ding; David J. Wild

RBID : PMC:2933596

Abstract

Background

In recent years, there has been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, and in particular for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information) that attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.

Results

We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and client are publicly available.

Conclusions

Initial testing indicates this tool is useful in identifying potential biological applications of compounds that are not obvious, and in identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.


DOI: 10.1186/1758-2946-2-6
PubMed: 20727184
PubMed Central: 2933596

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications</title>
<author>
<name sortKey="Zhu, Qian" sort="Zhu, Qian" uniqKey="Zhu Q" first="Qian" last="Zhu">Qian Zhu</name>
<affiliation>
<nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lajiness, Michael S" sort="Lajiness, Michael S" uniqKey="Lajiness M" first="Michael S" last="Lajiness">Michael S. Lajiness</name>
<affiliation>
<nlm:aff id="I2">Eli Lilly and Company, Indianapolis, IN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ding, Ying" sort="Ding, Ying" uniqKey="Ding Y" first="Ying" last="Ding">Ying Ding</name>
<affiliation>
<nlm:aff id="I3">School of Library &amp; Information Science, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wild, David J" sort="Wild, David J" uniqKey="Wild D" first="David J" last="Wild">David J. Wild</name>
<affiliation>
<nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20727184</idno>
<idno type="pmc">2933596</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933596</idno>
<idno type="RBID">PMC:2933596</idno>
<idno type="doi">10.1186/1758-2946-2-6</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000281</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications</title>
<author>
<name sortKey="Zhu, Qian" sort="Zhu, Qian" uniqKey="Zhu Q" first="Qian" last="Zhu">Qian Zhu</name>
<affiliation>
<nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lajiness, Michael S" sort="Lajiness, Michael S" uniqKey="Lajiness M" first="Michael S" last="Lajiness">Michael S. Lajiness</name>
<affiliation>
<nlm:aff id="I2">Eli Lilly and Company, Indianapolis, IN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ding, Ying" sort="Ding, Ying" uniqKey="Ding Y" first="Ying" last="Ding">Ying Ding</name>
<affiliation>
<nlm:aff id="I3">School of Library &amp; Information Science, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wild, David J" sort="Wild, David J" uniqKey="Wild D" first="David J" last="Wild">David J. Wild</name>
<affiliation>
<nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of Cheminformatics</title>
<idno type="eISSN">1758-2946</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>In recent years, there has been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, and in particular for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information) that attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.</p>
</sec>
<sec>
<title>Results</title>
<p>We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and client are publicly available.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Initial testing indicates this tool is useful in identifying potential biological applications of compounds that are not obvious, and in identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Mullin, R" uniqKey="Mullin R">R Mullin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Belleau, F" uniqKey="Belleau F">F Belleau</name>
</author>
<author>
<name sortKey="Nolin, Ma" uniqKey="Nolin M">MA Nolin</name>
</author>
<author>
<name sortKey="Tourigny, N" uniqKey="Tourigny N">N Tourigny</name>
</author>
<author>
<name sortKey="Rigault, P" uniqKey="Rigault P">P Rigault</name>
</author>
<author>
<name sortKey="Morissette, J" uniqKey="Morissette J">J Morissette</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dong, X" uniqKey="Dong X">X Dong</name>
</author>
<author>
<name sortKey="Gilbert, Ke" uniqKey="Gilbert K">KE Gilbert</name>
</author>
<author>
<name sortKey="Guha, R" uniqKey="Guha R">R Guha</name>
</author>
<author>
<name sortKey="Heiland, R" uniqKey="Heiland R">R Heiland</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
<author>
<name sortKey="Pierce, Me" uniqKey="Pierce M">ME Pierce</name>
</author>
<author>
<name sortKey="Fox, Gc" uniqKey="Fox G">GC Fox</name>
</author>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hur, J" uniqKey="Hur J">J Hur</name>
</author>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Willighagen, E" uniqKey="Willighagen E">E Willighagen</name>
</author>
<author>
<name sortKey="O Boyle, Nm" uniqKey="O Boyle N">NM O'Boyle</name>
</author>
<author>
<name sortKey="Gopalakrishnan, H" uniqKey="Gopalakrishnan H">H Gopalakrishnan</name>
</author>
<author>
<name sortKey="Jiao, D" uniqKey="Jiao D">D Jiao</name>
</author>
<author>
<name sortKey="Guha, R" uniqKey="Guha R">R Guha</name>
</author>
<author>
<name sortKey="Steinbeck, C" uniqKey="Steinbeck C">C Steinbeck</name>
</author>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Klinginsmith, J" uniqKey="Klinginsmith J">J Klinginsmith</name>
</author>
<author>
<name sortKey="Dong, X" uniqKey="Dong X">X Dong</name>
</author>
<author>
<name sortKey="Lee, Ac" uniqKey="Lee A">AC Lee</name>
</author>
<author>
<name sortKey="Guha, R" uniqKey="Guha R">R Guha</name>
</author>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
<author>
<name sortKey="Crippen, Gm" uniqKey="Crippen G">GM Crippen</name>
</author>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dong, X" uniqKey="Dong X">X Dong</name>
</author>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ballester, Pj" uniqKey="Ballester P">PJ Ballester</name>
</author>
<author>
<name sortKey="Richards, Wg" uniqKey="Richards W">WG Richards</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Durant, Jl" uniqKey="Durant J">JL Durant</name>
</author>
<author>
<name sortKey="Leland, Ba" uniqKey="Leland B">BA Leland</name>
</author>
<author>
<name sortKey="Henry, Dr" uniqKey="Henry D">DR Henry</name>
</author>
<author>
<name sortKey="Nourse, Jg" uniqKey="Nourse J">JG Nourse</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Swamidass, Sj" uniqKey="Swamidass S">SJ Swamidass</name>
</author>
<author>
<name sortKey="Baldi, P" uniqKey="Baldi P">P Baldi</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ma" uniqKey="Johnson M">MA Johnson</name>
</author>
<author>
<name sortKey="Maggiora, Gm" uniqKey="Maggiora G">GM Maggiora</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cramer, Gm" uniqKey="Cramer G">GM Cramer</name>
</author>
<author>
<name sortKey="Ford, Ra" uniqKey="Ford R">RA Ford</name>
</author>
<author>
<name sortKey="Hall, Rl" uniqKey="Hall R">RL Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, B" uniqKey="Chen B">B Chen</name>
</author>
<author>
<name sortKey="Dong, X" uniqKey="Dong X">X Dong</name>
</author>
<author>
<name sortKey="Jiao, D" uniqKey="Jiao D">D Jiao</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Zhu, Q" uniqKey="Zhu Q">Q Zhu</name>
</author>
<author>
<name sortKey="Ding, Y" uniqKey="Ding Y">Y Ding</name>
</author>
<author>
<name sortKey="Wild, Dj" uniqKey="Wild D">DJ Wild</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="product-review">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">J Cheminform</journal-id>
<journal-title-group>
<journal-title>Journal of Cheminformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1758-2946</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">20727184</article-id>
<article-id pub-id-type="pmc">2933596</article-id>
<article-id pub-id-type="publisher-id">1758-2946-2-6</article-id>
<article-id pub-id-type="doi">10.1186/1758-2946-2-6</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Zhu</surname>
<given-names>Qian</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>qianzhu@indiana.edu</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Lajiness</surname>
<given-names>Michael S</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>lajiness_michael_s@lilly.com</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Ding</surname>
<given-names>Ying</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>dingying@indiana.edu</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A4">
<name>
<surname>Wild</surname>
<given-names>David J</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>djwild@indiana.edu</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
School of Informatics and Computing, Indiana University, Bloomington, IN, USA</aff>
<aff id="I2">
<label>2</label>
Eli Lilly and Company, Indianapolis, IN, USA</aff>
<aff id="I3">
<label>3</label>
School of Library &amp; Information Science, Indiana University, Bloomington, IN, USA</aff>
<pub-date pub-type="collection">
<year>2010</year>
</pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>8</month>
<year>2010</year>
</pub-date>
<volume>2</volume>
<fpage>6</fpage>
<lpage>6</lpage>
<history>
<date date-type="received">
<day>22</day>
<month>6</month>
<year>2010</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>8</month>
<year>2010</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2010 Zhu et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2010</copyright-year>
<copyright-holder>Zhu et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.jcheminf.com/content/2/1/6"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>In recent years, there has been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, and in particular for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information) that attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.</p>
</sec>
<sec>
<title>Results</title>
<p>We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and client are publicly available.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Initial testing indicates this tool is useful in identifying potential biological applications of compounds that are not obvious, and in identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>In common with most scientific disciplines, drug discovery has seen a huge increase over the last few years in the amount of pertinent publicly-available and proprietary information, owing to a variety of factors including improvements in experimental technologies (High Throughput Screening, Microarray Assays, etc.), improvements in computer technologies (particularly the Web), funded "grand challenge" projects (such as the Human Genome Project), an imperative to find more treatments for more diseases in an aging population, and various cultural shifts. This has been dubbed data overload [
<xref ref-type="bibr" rid="B1">1</xref>
]. Significant effort has therefore been put into the development of computational methods for exploiting this information for drug discovery, particularly through the fields of Bioinformatics and Cheminformatics. Of particular note are the provision of large-scale chemical and biological databases, such as PubChem [
<xref ref-type="bibr" rid="B2">2</xref>
], ChemSpider [
<xref ref-type="bibr" rid="B3">3</xref>
], the PDB [
<xref ref-type="bibr" rid="B4">4</xref>
], and KEGG [
<xref ref-type="bibr" rid="B5">5</xref>
], which house information about massive numbers of compounds, proteins, sequences, assays and pathways; the development of predictive models for biological activity and other biological endpoints; data mining of chemical and biological data points; the availability of journal articles in electronic form, and associated indexing (such as in PubMed) and text mining of their content. Further, we are seeing an unprecedented amount of linking of information resources, for instance with Bio2RDF [
<xref ref-type="bibr" rid="B6">6</xref>
], Linking Open Drug Data [
<xref ref-type="bibr" rid="B7">7</xref>
] and manual linking of database entries.</p>
<p>One of the next great challenges is how we can use all of this information together in an intelligent way, in an
<italic>integrative </italic>
fashion [
<xref ref-type="bibr" rid="B8">8</xref>
]. We can think of all these information resources as pieces of a jigsaw, which in their own right give us useful insights, but to get the full picture requires the pieces to be put together in the right fashion. We thus not only need to aggregate the information, but we also need to be able to data mine it in an integrative fashion. There are a number of technologies that are becoming available that assist with this: in particular, web services and Cyberinfrastructure [
<xref ref-type="bibr" rid="B9">9</xref>
] allow straightforward, standardized interfaces to a variety of data sources, and Semantic Web languages such as XML, OWL and RDF permit the aggregation of data and the representation of meaning and relationships in the data, respectively.</p>
<p>At Indiana University, we are tackling this problem from several angles. We recently developed a Cyberinfrastructure for cheminformatics, called
<italic>ChemBioGrid</italic>
, which has made a multitude of databases and computational tools freely available for the first time to the academic community in a web service framework [
<xref ref-type="bibr" rid="B10">10</xref>
]. Of particular import, we have been able to successfully index chemical structures in the abstracts of large numbers of scholarly publications through a collaboration with the Murray-Rust group at Cambridge. The infrastructure has spurred the development of several important client applications, including PubChemSR [
<xref ref-type="bibr" rid="B11">11</xref>
], and the application of Web 2.0 style "mashups" using userscripts for a variety of life-science applications [
<xref ref-type="bibr" rid="B12">12</xref>
]. We are continuing to support and further develop this infrastructure.</p>
<p>With this infrastructure in place, we have investigated a variety of strategies for integrating the chemical and biological data from different sources in the infrastructure, in particular (i) the application of data mining techniques to chemical structure, biological activity and gene expression data in an integrated fashion [
<xref ref-type="bibr" rid="B13">13</xref>
], (ii) the development of a generalizable four-layer model (storage, interface, aggregation and smart client) for integrative data mining and knowledge discovery [
<xref ref-type="bibr" rid="B14">14</xref>
], and (iii) aggregation of web services into automatically generated and ranked workflows [
<xref ref-type="bibr" rid="B15">15</xref>
]. We are now investigating methods for applying these techniques on a larger scale, particularly to be able to extract knowledge from large volumes of chemical and biological data that would not be found by searching single sources, and to be able to use multiple independent sources to corroborate or contradict hypotheses. To do this, we are employing two key technologies:
<italic>aggregate web services </italic>
which call multiple "atomic" web services and aggregate the results, and Semantic Web languages for the representation of integrated data.</p>
<p>In this paper we describe one of the first products of this work, a tool called
<italic>WENDI </italic>
(Web Engine for Non-obvious Drug Information) that is designed to tackle a specific question: given a chemical compound of interest, how can we probe the potential biological properties of the compound using predictive models, databases, and the scholarly literature? In particular, how can we find
<italic>non-obvious </italic>
relationships between the compound and assays, genes, and diseases, that cross over different types of data source? We present WENDI as a tool for aggregating information related to a compound to allow these kinds of relationships to be identified.</p>
<p>Of course, the power of this kind of integration comes from identifying relationships between these entities that are truly non-obvious yet real. Our aim in this work is to allow rapid differentiation between known relationships (i.e. those which a scientist with a reasonable understanding of the literature in a field could be expected to already know) and unknown relationships (those which could not be found in literature closely associated with a field, or are not part of the 'art' of the field). There is clearly some fuzziness in this, which makes evaluating a tool like WENDI for non-obviousness difficult. However, we do present it as a useful tool based on qualitative feedback from existing users, and we are currently devising a more quantitative evaluation (as described in the concluding section).</p>
</sec>
<sec>
<title>Implementation</title>
<sec>
<title>1. Overall architecture</title>
<p>We have since extended the
<italic>ChemBioGrid </italic>
infrastructure to be the primary data source for WENDI. Additionally, for WENDI we have introduced the idea of
<italic>aggregate web services </italic>
that call multiple individual, or
<italic>atomic</italic>
, web services and aggregate the results from these services in XML. For example, the main web service used by WENDI takes as input a SMILES string representing a compound of interest, and outputs an XML file of information about the compound aggregated by calling multiple web services. This XML file can then be parsed by an intelligent client to extract information pertinent to compound properties. The overall architecture uses a four-layer approach, which we described previously [
<xref ref-type="bibr" rid="B14">14</xref>
], comprising storage, interface, aggregation and smart interface layers (see Figure
<xref ref-type="fig" rid="F1">1</xref>
). The storage and interface layers are implemented using the Web Service Infrastructure, and our initial work developing aggregate web services and smart clients comprises the work described here.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Overall architecture of storage, interface, aggregation and interaction layers employed in WENDI</bold>
. Each layer can be accessed directly, or by higher layers.</p>
</caption>
<graphic xlink:href="1758-2946-2-6-1"></graphic>
</fig>
<p>Web services either follow the Simple Object Access Protocol (SOAP) standard [
<xref ref-type="bibr" rid="B16">16</xref>
] or the REpresentational State Transfer (REST) approach [
<xref ref-type="bibr" rid="B17">17</xref>
]; services of the latter kind are often better integrated with the Hypertext Transfer Protocol (HTTP) than SOAP-based services. Whilst we have both kinds of web service in operation, we primarily use REST services. For example, we have created a 3D similarity searching web service based on our local PubChem 3D database, which stores 3D structures [
<xref ref-type="bibr" rid="B18">18</xref>
] and 12 distance moments [
<xref ref-type="bibr" rid="B19">19</xref>
] for all the compounds in the PubChem database. This service is called by the WENDI web service.</p>
<p>Our SOAP-based services are deployed in a Tomcat 5.5 application container, which allows us to maintain these services easily and provides a high level of integration with our development environments; the services themselves are developed in Java 1.6.0. Our Web service layer is handled by the AXIS libraries 1.6 [
<xref ref-type="bibr" rid="B20">20</xref>
], which accept a SOAP message, decode it to extract the relevant function arguments, call the appropriate Web service classes, and finally encode the return value into a SOAP document for return to the client. Our Web service is published as WSDL [
<xref ref-type="bibr" rid="B21">21</xref>
], which is an XML-based standard for describing Web services and their parameters. Increasingly, we are converting our services to REST for even easier maintenance and access. A list of some of our atomic web services can be found on the web [
<xref ref-type="bibr" rid="B22">22</xref>
].</p>
</sec>
<sec>
<title>2. Database Services</title>
<p>Our infrastructure contains a large number of compound-related databases, including mirrors of existing databases (such as PubChem), databases derived from these (such as 3D structures of PubChem compounds), and completely new databases (particularly those derived from the literature). Our databases are housed on a Linux server running the PostgreSQL database system, with gNova CHORD [
<xref ref-type="bibr" rid="B23">23</xref>
] installed to allow chemical structure searching and 2D similarity searching through the generation of fingerprints. Mirrored databases are updated monthly. By housing the databases in a homogeneous environment, it is easy to perform searches that cross multiple databases using single SQL queries, and to routinely expose the databases with web service interfaces. The following databases are used in the WENDI system:</p>
<sec>
<title>PubChem Compound</title>
<p>A mirror of the PubChem Compound database, containing compound IDs (CIDs), InChI, SMILES, compound properties, and 166-key MACCS-style fingerprints [
<xref ref-type="bibr" rid="B24">24</xref>
] generated by the gNova CHORD system.</p>
</sec>
<sec>
<title>PubChem Bioassay</title>
<p>A mirror of the PubChem Bioassay database containing AIDs (assay IDs), CIDs of compounds tested, and bioassay outcomes and scores.</p>
</sec>
<sec>
<title>PubChem BioDesc</title>
<p>Descriptions of all PubChem bioassays.</p>
</sec>
<sec>
<title>Pub3D</title>
<p>A similarity-searchable database of minimized 3D structures for PubChem compounds.</p>
</sec>
<sec>
<title>DrugBank</title>
<p>A mirror of the DrugBank dataset [
<xref ref-type="bibr" rid="B25">25</xref>
] containing CIDs (mapping to PubChem), DBIDs (DrugBank IDs), drug names, SMILES, usage descriptions, and 166-key fingerprints. The database contains nearly 4800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs.</p>
</sec>
<sec>
<title>MRTD</title>
<p>An implementation of the Maximum Recommended Therapeutic Dose (MRTD) set [
<xref ref-type="bibr" rid="B26">26</xref>
] including name, SMILES, and 166-key fingerprints. The database contains 1,220 current prescription drugs available in SMILES format from the FDA Web site.</p>
</sec>
<sec>
<title>Medline Chemically-aware Publications Database</title>
<p>PubMed IDs of papers indexed in Medline [
<xref ref-type="bibr" rid="B27">27</xref>
], with SMILES of chemical structures (from the title and abstract) extracted using the Oscar3 program [
<xref ref-type="bibr" rid="B28">28</xref>
].</p>
</sec>
<sec>
<title>Phenopred</title>
<p>A matrix of predictions of gene-disease relationships based on known relationships mined from the literature and machine learning predictions [
<xref ref-type="bibr" rid="B29">29</xref>
].</p>
</sec>
<sec>
<title>Comparative Toxicogenomics Database (CTD)</title>
<p>Cross-species chemical-gene/target interactions and chemical-disease relationships derived from experimental sets and the literature [
<xref ref-type="bibr" rid="B30">30</xref>
].</p>
</sec>
<sec>
<title>HuGEpedia</title>
<p>An encyclopedia of human genetic variation in health and disease [
<xref ref-type="bibr" rid="B31">31</xref>
].</p>
</sec>
<sec>
<title>ChEMBL</title>
<p>A database of bioactive drug-like small molecules, containing 2-D structures, calculated properties (e.g. logP, Molecular Weight, Lipinski Parameters, etc.) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data) [
<xref ref-type="bibr" rid="B32">32</xref>
].</p>
<p>2D Tanimoto similarity searching of these datasets is made available by the gNova CHORD
<italic>tanimoto </italic>
function applied to the 2D
<italic>public 166 keys</italic>
, an implementation of the popular MACCS keys. Without indexing, it runs very effectively for a single query or on a small dataset, but it slows significantly for large datasets. We have 56,911,891 compounds in our PubChem Compound table at the time of writing. To speed up the searching, we implemented a method described by Swamidass &amp; Baldi to reduce the subset of molecules that need to be searched in similarity calculations [
<xref ref-type="bibr" rid="B33">33</xref>
]. The method uses simple bounds on similarity that apply whenever a similarity threshold is used: given two fingerprints A and B with a and b bits set respectively, the Tanimoto similarity between them can be at most min(a,b)/(a+b-min(a,b)), which simplifies to min(a,b)/max(a,b). Any fingerprint for which this bound falls below the threshold t can be skipped without computing its similarity.</p>
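<p>The bit-count bound described above can be sketched in Python as follows. This is a minimal illustration, not the gNova CHORD or WENDI implementation: the function names, the representation of a fingerprint as a set of bit positions, and the in-memory dictionary standing in for the compound table are all assumptions. Because a + b - min(a,b) equals max(a,b), the bound is computed as min(a,b)/max(a,b).</p>
<preformat><![CDATA[
```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: bits in common divided by bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_tanimoto(count_a: int, count_b: int) -> float:
    """Upper bound on the Tanimoto similarity, using only the bit counts."""
    larger = max(count_a, count_b)
    return min(count_a, count_b) / larger if larger else 0.0


def search(
    query: Fingerprint, database: Dict[str, Fingerprint], threshold: float
) -> List[Tuple[str, float]]:
    """Return (id, similarity) pairs at or above the threshold, best first."""
    hits = []
    for compound_id, fingerprint in database.items():
        # Cheap test first: skip compounds that cannot reach the threshold.
        if max_tanimoto(len(query), len(fingerprint)) < threshold:
            continue
        similarity = tanimoto(query, fingerprint)
        if similarity >= threshold:
            hits.append((compound_id, similarity))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```
]]></preformat>
<p>In a database setting the same test can be applied by storing the bit count of each fingerprint and restricting the search to rows whose count lies between t·a and a/t, where a is the bit count of the query; only those rows need a full similarity calculation.</p>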
<p>In addition to 2D similarity searching, 3D similarity searching is provided using 12-dimensional molecular shape descriptors [
<xref ref-type="bibr" rid="B20">20</xref>
] calculated for our Pub3D database of 3D minimized structures of PubChem compounds. Similarity to a query is calculated using Euclidean distance. We store the 12D vectors for all compounds in PostgreSQL, using the CUBE type [
<xref ref-type="bibr" rid="B34">34</xref>
] extension.</p>
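<p>The ranking step of the 3D search can be illustrated with a short Python sketch. It assumes that the 12 shape descriptors of each compound are available as a plain list of floats; the function names and the dictionary of descriptors are hypothetical and stand in for the Pub3D table. In PostgreSQL, the cube module documents a cube_distance function that returns the same distance for point cubes, although the paper does not state which function WENDI calls.</p>
<preformat><![CDATA[
```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest_shapes(
    query: Sequence[float],
    descriptors: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the k compounds closest to the query, smallest distance first."""
    ranked = sorted(
        ((compound_id, euclidean(query, vector))
         for compound_id, vector in descriptors.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]
```
]]></preformat>
<p>A smaller distance means a more similar shape, so unlike the Tanimoto search the results are sorted in ascending order.</p>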
</sec>
</sec>
<sec>
<title>3. Prediction services</title>
<p>We have made available a variety of predictions through our web service framework, particularly:</p>
<p>
<bold>Tumor cell line predictions</bold>
. We created 40 Random Forest models for prediction of human tumor cell line inhibition, trained using data from the NCI Developmental Therapeutics Program Human Tumor Cell Lines [
<xref ref-type="bibr" rid="B13">13</xref>
]. These predictions output a probability of activity for a compound (0-1).</p>
<p>
<bold>Toxicity prediction</bold>
. We implemented a modified web service version of ToxTree [
<xref ref-type="bibr" rid="B35">35</xref>
] for prediction of toxic effects.</p>
<p>
<bold>Gene-disease relationships</bold>
. We have implemented a table of predictions of gene-disease relationships extracted from the PhenoPred tool developed at Indiana University [
<xref ref-type="bibr" rid="B29">29</xref>
]. We also employed the CTD and HuGEpedia data to explore gene-disease relationships.</p>
</sec>
<sec>
<title>4. Aggregate web service and client</title>
<p>We have created a main WENDI aggregate web service, and a web-based client that employs the web service. The web service takes a query SMILES string as input (through a SOAP or REST interface), and calls a variety of web services and database searches using the query. Results are returned as an aggregate XML file with sections delineated according to the atomic web service that was called. The web service also adds XML tags: in particular, Gene Ontology terms found in PubChem BioAssay descriptions, drug descriptions (from DrugBank), and paper titles and abstracts are extracted and tagged with Gene Ontology IDs (GOIDs). These tags permit associations to be made between genes and assays, drugs and papers.</p>
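<p>The aggregation pattern described above can be sketched as follows. This is an illustrative Python example, not the WENDI code (which is written in Java): the element names wendi_result, section and item, the function name, and the idea of passing each atomic service in as a callable are assumptions made for the sketch. Each atomic service is given the query SMILES and its records are written into its own section of a single XML document.</p>
<preformat><![CDATA[
```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Record = Dict[str, str]                   # one result row: field name -> value
Service = Callable[[str], List[Record]]   # takes a SMILES string, returns rows


def aggregate(smiles: str, services: Dict[str, Service]) -> ET.Element:
    """Call every service with the query and collect results in one XML tree."""
    root = ET.Element("wendi_result", {"query": smiles})
    for name, call in services.items():
        # One section per atomic service, labelled with the service name.
        section = ET.SubElement(root, "section", {"service": name})
        try:
            records = call(smiles)
        except Exception as exc:
            # A failing service is recorded, and the others are still called.
            section.set("error", str(exc))
            continue
        for record in records:
            item = ET.SubElement(section, "item")
            for field, value in record.items():
                # Field names must be valid XML element names.
                ET.SubElement(item, field).text = value
    return root
```
]]></preformat>
<p>The resulting tree can be serialized with ET.tostring(root, encoding="unicode") and returned to the client. Recording a failure inside the section, rather than aborting, is a design choice of this sketch that keeps the remaining sections usable when one source is unavailable.</p>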
<p>The client permits the user to input a SMILES string, or to draw a structure using the JME editor [
<xref ref-type="bibr" rid="B36">36</xref>
], and then uses JSP (JavaServer Pages) to submit the query to the web service and to parse and display the XML results, and JavaScript to handle the XML returned from the server. Communication between the client request and the server response uses AJAX (Asynchronous JavaScript and XML), which lets a web application retrieve data from the server asynchronously without interfering with the display and behavior of the existing page.</p>
<p>The primary way that the databases are employed in WENDI is through similarity searching, that is, finding compounds in the databases that are similar to the query and have some known property. For example, we retrieve compounds that are similar to a query molecule (Tanimoto similarity above 0.85) and that are active in a given bioassay, are known drugs, or are referenced in a journal article. Based on the similar property principle [
<xref ref-type="bibr" rid="B37">37</xref>
] we can assume that these molecules are likely to have properties similar to those of the query compound, and are thus of interest in understanding the potential properties of the query.</p>
<p>The WENDI interface is organized into six major sections:</p>
<p>
<italic>Predictive models results </italic>
presents the predicted probability of activity of the compound in 40 Human Tumor Cell line assays, organized by panel type (renal, non-small cell lung, breast, colon, etc.) and color coded according to probability of activity (red for 0.7 or above, yellow for 0.6 up to but not including 0.7, and grey for below 0.6). Confusion metrics are also presented to allow the validity of these models to be assessed. Also presented are the results of a ToxTree analysis, particularly the classification according to Cramer rules [
<xref ref-type="bibr" rid="B38">38</xref>
] and a breakdown of presence or absence of known toxic fragments.</p>
<p>
<italic>Activities of similar compounds </italic>
presents a list of similar compounds (Tanimoto similarity values given) in PubChem that have been tested in bioassays and shown to be active. A link to the bioassay is given along with the bioassay name, and an additional column uses Gene Ontology terms extracted from the bioassay description, together with the PhenoPred predictions of gene-disease relationships, to list possibly related diseases. The DrugBank and MRTD sets are also similarity searched, with the results presented in a similar fashion; in the case of DrugBank, drug usage descriptions are given along with predictions of diseases extracted in the same way as for the PubChem section.</p>
<p>
<italic>Similar compounds from chemogenomics data </italic>
presents a list of similar compounds (Tanimoto similarity values given) from the CTD and ChEMBL data, which include relationships between compounds and genes or diseases.</p>
<p>
<italic>Similar compounds from Systems data </italic>
presents a list of similar compounds (Tanimoto similarity values given) from KEGG data, which include relationships between compounds and pathways or enzymes.</p>
<p>
<italic>Similar compounds in the literature </italic>
lists journal articles in Medline where the title or abstract contains compounds with a Tanimoto similarity above 0.85 to the query. Links to the journal articles are given.</p>
<p>
<italic>Inactivities of similar compounds </italic>
presents the same information as the
<italic>Activities of similar compounds </italic>
section, but for the similar PubChem compounds that have been tested in bioassays and shown to be inactive.</p>
<p>Finally, links are given to the raw XML file and to a PDF file for download.</p>
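<p>The color coding used for the tumor cell line predictions in the first section of the interface amounts to a simple threshold rule. A minimal Python sketch, using the cut-offs stated in the text (0.7 and 0.6), is given below; the function name and the range check are assumptions added for illustration, and the actual client applies the rule in JSP and JavaScript.</p>
<preformat><![CDATA[
```python
def activity_color(probability: float) -> str:
    """Map a predicted probability of activity (0 to 1) to a display color."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability >= 0.7:
        return "red"      # high predicted probability of activity
    if probability >= 0.6:
        return "yellow"   # medium: at least 0.6 but below 0.7
    return "grey"         # below 0.6
```
]]></preformat>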
</sec>
</sec>
<sec>
<title>Results</title>
<p>On submission of a query, WENDI generally returns results within a minute. We have tested WENDI with a variety of query compounds with known biological activities; one of them is described below. Readers can test the tool themselves by visiting the WENDI site. As an example, a screenshot of the first results returned for Doxorubicin is shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Screenshot of the results returned from WENDI for Doxorubicin</bold>
.</p>
</caption>
<graphic xlink:href="1758-2946-2-6-2"></graphic>
</fig>
<p>Doxorubicin is an anthracycline antibiotic that is used primarily as a nonspecific tumor inhibitor (including cancers of the bladder, breast, stomach, lung, ovaries, thyroid, along with soft tissue sarcoma and multiple myeloma). The mechanism of action is not fully understood, although it is thought to be a DNA intercalator.</p>
<p>WENDI identifies several corroborating pieces of evidence for the biological actions of Doxorubicin. In particular, it (i) predicts that the compound has a high probability of activity in all but one of the tumor cell line screens (red) and a medium probability in HCT-15 (colon cancer); (ii) predicts, through our toxicity prediction service, that the compound has toxic effects, corroborated by descriptions from DrugBank; (iii) identifies specific tumor-related bioassays in which compounds similar to Doxorubicin (and identical to it) were found to be active (in particular, many similar compounds were found to be active in NCI Tumor Cell Line screens, corroborating the predictions of activity); (iv) identifies a wide variety of assays in which compounds similar to Doxorubicin are inactive; (v) identifies several drugs similar to Doxorubicin (Epirubicin, Daunorubicin, Idarubicin) along with descriptions corroborating the nonspecific anti-tumor activity; and (vi) identifies numerous publications linking Doxorubicin and related compounds to a variety of tumor activities (Figure
<xref ref-type="fig" rid="F3">3</xref>
).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Screenshot of the insights from the literature returned from WENDI for Doxorubicin</bold>
.</p>
</caption>
<graphic xlink:href="1758-2946-2-6-3"></graphic>
</fig>
<p>Several chemical compounds recently submitted to PubChem, but not yet included in our database, were also used as queries for WENDI. The results and some interpretations are given in Table
<xref ref-type="table" rid="T1">1</xref>
, and results for further compounds tested by WENDI are shown in Table
<xref ref-type="table" rid="T2">2</xref>
.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Query compounds and related biological activities retrieved from WENDI</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Query CID</th>
<th align="center">44246308</th>
<th align="center">44246315</th>
<th align="center">44247545</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Reported activities</td>
<td align="center">weak activity against Sortase-A (SrtA), an antimicrobial target</td>
<td align="center">tested and shown negative for activity against DNA polymerase alpha and beta</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Tumor Cell Line Predictive models</td>
<td align="center">50-60% probability of activity in breast, renal, prostate, HS, ovarian, leukemia, melanoma, non-small cell lung; otherwise <50% probability</td>
<td align="center">50-60% probability of activity in renal, leukemia, non-small cell lung, colon, melanoma; otherwise <50% probability</td>
<td align="center"><50% probability for all tumor cell lines</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Bioassay activities & gene relationships of similar compounds</td>
<td align="center">highly similar molecules found to be antagonists of GPCR GPR7 (associated with feeding behavior, obesity and inflammatory pain); CYP2C9 (metabolizes NSAIDs and sulfonylureas); inhibition of non-small cell lung cancer (NCI HOP-18) and suppression of colon tumors; inhibition of HIV-1 RNase H</td>
<td align="center">similar molecules are shown active in CYP3A4 confirmation assay (important in drug metabolism); CYP2C9 (metabolizes NSAIDs and sulfonylureas); BAP1 inhibition (tumor suppressor involved in breast cancer BRCA1); probes of Alpha-Synuclein 5'UTR (related to Parkinson's disease); FPR (GPCR involved in chemotaxis); antibacterial activity (Mycobacterium tuberculosis and VIM-2 metallo-beta-lactamase)</td>
<td align="center">similar compounds show activity in CYP2C19 (metabolism of antiepileptics and proton-pump inhibitors); agonist of M1 muscarinic receptor (associated with Alzheimer's and antipsychotics); Estrogen receptor alpha coactivator binding inhibitors (breast cancer association)</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Bioassay inactives of similar compounds</td>
<td align="center">many highly similar compounds (including one with a nominal 1.0 similarity) show inactive in RNase H screen (AID-372)</td>
<td align="center">similar molecules inactive for HIV inhibition; inhibition of breast tumors (BRCT:pBACH1 of BRCA1); hERG inhibition; HIV-1 RNase H inhibition; 14-3-3 protein interaction modulators; antibacterial (Mycobacterium tuberculosis); FKBP12 immunosuppressant</td>
<td align="center">similar compounds inactive for Cdc25B catalytic domain protein tyrosine phosphatase; beta-glucocerebrosidase inhibitors (linked with Gaucher disease); 14-3-3 protein interaction modulation; hERG blockers of proarrhythmic agents</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">CTD gene relationships of similar compounds</td>
<td align="center">similar compounds show link with use of anti-inflammatory drugs (NSAIDs) in carcinomas; CYP2C9</td>
<td align="center">similar compounds linked with Gilbert disease; adenoma; use of anti-inflammatory drugs (NSAIDs) in carcinomas; coronary arterial protection; colorectal neoplasms (tumors)</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Activities of similar marketed drugs</td>
<td align="center">None</td>
<td align="center">None</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Insights from similar compounds in journal articles (MEDLINE)</td>
<td align="center">None</td>
<td align="center">Intricatin, a similar isoflavonoid, is shown to be antimutagenic; Claussequinone has anti-inflammatory activity</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Interpretation</td>
<td align="center">Some evidence for anti-inflammatory activity (particularly related to tumors) and CYP2C9 inhibition; mixed evidence on generalized anti-tumor activity and inhibition of HIV-1 RNase H</td>
<td align="center">Generalized, nonspecific activity, although may be worth investigating for anti-tumor activity particularly colon cancer.</td>
<td align="center">None</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>Further query compounds and related biological activities retrieved from WENDI</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Query CID</th>
<th align="center">44246407</th>
<th align="center">44246344</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Reported activities</td>
<td align="center">Inhibitor/activator of human alpha glucosidase</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Tumor Cell Line Predictive models</td>
<td align="center">50-60% probability of activity in melanoma, leukemia, otherwise <50% probability</td>
<td align="center">Strong prediction (>70% probability) of activity in prostate, colon, non-small cell lung, breast, melanoma, leukemia, ovarian cancers. 50-60% probability in all other cell lines.</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Bioassay activities & gene relationships of similar compounds</td>
<td align="center">similar compound shows active as an inhibitor of MEK-5 Kinase 2 mutant</td>
<td align="center">Similar compounds show active in NCI ovarian cancer cell line (IGROV1), breast cancer cell line (MB-435); non small cell lung cancer (H23); MLPCN Alpha-synuclein 5'UTR binding activation (Parkinson's disease); Leishmania promastigote inhibition; NCI yeast anticancer screen; RAM inhibition (STAT3);</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Bioassay inactives of similar compounds</td>
<td align="center">similar compounds show inactive in SIP3 antagonists assay, hERG blockers of proarrhythmic agents, and 14-3-3 protein interaction modulation</td>
<td align="center">similar compounds inactive in RNase H inhibition, NCI non small cell lung cancer (H23) and Leukemia (L1210); NCI yeast anticancer screen; 14-3-3 protein interaction modulators; SIP3 antagonists</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">CTD gene relationships of similar compounds</td>
<td align="center">A similar compound associated with adenomatous polyposis</td>
<td align="center">Similar compounds associated with Alzheimer's disease</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Activities of similar marketed drugs</td>
<td align="center">None</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Insights from similar compounds in journal articles (MEDLINE)</td>
<td align="center">None</td>
<td align="center">None</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Interpretation</td>
<td align="center">None</td>
<td align="center">None</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Conclusion</title>
<p>In this paper, we present an integrative data mining tool for drug discovery using aggregate web services. WENDI aims to build a full picture of the potential biological activities of a chemical compound by aggregating data from web services that represent multiple diverse sources (including predictive models, databases and journal articles). WENDI allows the identification of corroborating or conflicting information: for instance, a compound might be predicted active in a breast cancer cell line, and similar compounds might be active in a PubChem BioAssay related to breast cancer, or be co-located in a paper abstract with a breast cancer related gene. We are now developing a next generation of tools based on WENDI and our recent Chem2Bio2RDF system [
<xref ref-type="bibr" rid="B39">39</xref>
] for exploring inferred relationships between compounds and diseases, genes and pathways using Semantic Web technologies including ontologies and RDF. We are also devising ways of quantitatively evaluating the extent to which WENDI truly identifies 'non-obvious' relationships, including using a corpus of literature in the field as the baseline for the 'obvious' relationships, as well as soliciting specific case studies from users for qualitative analysis.</p>
</sec>
<sec>
<title>Availability and requirements</title>
<p>• Project name: WENDI (Web Engine for Non-obvious Drug Information)</p>
<p>• Project home page:
<ext-link ext-link-type="uri" xlink:href="https://cheminfov.informatics.indiana.edu:8443/WENDI_PUBLIC/WENDI.jsp">https://cheminfov.informatics.indiana.edu:8443/WENDI_PUBLIC/WENDI.jsp</ext-link>
</p>
<p>• Operating system(s): Platform independent</p>
<p>• Programming language: Java</p>
<p>• Other requirements: Java browser-embedded plugin</p>
<p>• License: None. Any restrictions to use by non-academics: None</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>QZ carried out the whole implementation of WENDI, supervised by MSL and DJW. DJW prepared the examples in the Results section. MSL and DJW revised this paper from the draft written by QZ. All authors have read and approved the final version of the manuscript.</p>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>This work was supported by Eli Lilly & Company. We would like to thank Dr. Rajarshi Guha for assistance at the initial stage of this work.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Mullin</surname>
<given-names>R</given-names>
</name>
<article-title>Dealing with Data Overload</article-title>
<source>Chemical & Engineering News</source>
<year>2004</year>
<volume>82</volume>
<issue>12</issue>
<fpage>19</fpage>
<lpage>24</lpage>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="other">
<article-title>PubChem</article-title>
<ext-link ext-link-type="uri" xlink:href="http://pubchem.ncbi.nlm.nih.gov/search/search.cgi">http://pubchem.ncbi.nlm.nih.gov/search/search.cgi</ext-link>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="other">
<article-title>Chemspider</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.chemspider.com/">http://www.chemspider.com/</ext-link>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="other">
<article-title>PDB</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.rcsb.org/pdb/home/home.do">http://www.rcsb.org/pdb/home/home.do</ext-link>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="other">
<article-title>KEGG</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.genome.jp/kegg/">http://www.genome.jp/kegg/</ext-link>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Belleau</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Nolin</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Tourigny</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rigault</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Morissette</surname>
<given-names>J</given-names>
</name>
<article-title>Bio2RDF: towards a mashup to build bioinformatics knowledge systems</article-title>
<source>J Biomed Inform</source>
<year>2008</year>
<volume>41</volume>
<issue>5</issue>
<fpage>706</fpage>
<lpage>716</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2008.03.004</pub-id>
<pub-id pub-id-type="pmid">18472304</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="other">
<article-title>Linking Open Drug Data</article-title>
<ext-link ext-link-type="uri" xlink:href="http://esw.w3.org/topic/HCLSIG/LODD">http://esw.w3.org/topic/HCLSIG/LODD</ext-link>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>Grand Challenges for Cheminformatics</article-title>
<source>J Cheminf</source>
<year>2009</year>
<volume>1</volume>
<fpage>1</fpage>
<pub-id pub-id-type="doi">10.1186/1758-2946-1-1</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="other">
<article-title>Cyberinfrastructure</article-title>
<ext-link ext-link-type="uri" xlink:href="http://en.wikipedia.org/wiki/Cyberinfrastructure">http://en.wikipedia.org/wiki/Cyberinfrastructure</ext-link>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Dong</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Guha</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Heiland</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pierce</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>Web service infrastructure for chemoinformatics</article-title>
<source>J Chem Inf Model</source>
<year>2007</year>
<volume>47</volume>
<issue>4</issue>
<fpage>1303</fpage>
<lpage>1307</lpage>
<pub-id pub-id-type="doi">10.1021/ci6004349</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Hur</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>PubChemSR: A search and retrieval tool for PubChem</article-title>
<source>Chemistry Central Journal</source>
<year>2008</year>
<volume>2</volume>
<fpage>11</fpage>
<pub-id pub-id-type="doi">10.1186/1752-153X-2-11</pub-id>
<pub-id pub-id-type="pmid">18482452</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Willighagen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>O'Boyle</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Gopalakrishnan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Jiao</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Guha</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Steinbeck</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>Userscripts for the Life Sciences</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<fpage>487</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-487</pub-id>
<pub-id pub-id-type="pmid">18154664</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Klinginsmith</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Guha</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Crippen</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>Chemical Data Mining of the NCI Human Tumor Cell Line Database</article-title>
<source>J Chem Inf Model</source>
<year>2007</year>
<volume>47</volume>
<issue>6</issue>
<fpage>2063</fpage>
<lpage>2076</lpage>
<pub-id pub-id-type="doi">10.1021/ci700141x</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="book">
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<person-group person-group-type="editor">Ekins S</person-group>
<article-title>Strategies for Using Information Effectively in Early-stage Drug Discovery</article-title>
<source>Computer Applications in Pharmaceutical Research and Development</source>
<year>2006</year>
<publisher-name>Wiley-Interscience, Hoboken</publisher-name>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="other">
<name>
<surname>Dong</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>An Automatic Drug Discovery Workflow Generation Tool using Semantic Web Technologies</article-title>
<source>Proceedings of the 4th IEEE conference on eScience</source>
<year>2008</year>
<fpage>652</fpage>
<lpage>657</lpage>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="other">
<article-title>Simple Object Access Protocol (SOAP)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://en.wikipedia.org/wiki/SOAP">http://en.wikipedia.org/wiki/SOAP</ext-link>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="other">
<article-title>Representational State Transfer (REST)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://en.wikipedia.org/wiki/Representational_State_Transfer">http://en.wikipedia.org/wiki/Representational_State_Transfer</ext-link>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="other">
<article-title>smi23d: Generation of a 3D structure from a SMILES string, using the smi23d program</article-title>
<ext-link ext-link-type="uri" xlink:href="http://chembiogrid.org/projects/proj_ws_all.html">http://chembiogrid.org/projects/proj_ws_all.html</ext-link>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Ballester</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Richards</surname>
<given-names>WG</given-names>
</name>
<article-title>Ultrafast Shape Recognition to Search Compound Databases for Similar Molecular Shapes</article-title>
<source>J Comput Chem</source>
<year>2007</year>
<volume>28</volume>
<fpage>1711</fpage>
<lpage>1723</lpage>
<pub-id pub-id-type="doi">10.1002/jcc.20681</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="other">
<article-title>AXIS libraries</article-title>
<ext-link ext-link-type="uri" xlink:href="http://ws.apache.org/axis">http://ws.apache.org/axis</ext-link>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="other">
<article-title>Web Services Description Language (WSDL)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/TR/wsdl">http://www.w3.org/TR/wsdl</ext-link>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="other">
<article-title>chembiogrid web services list</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.chembiogrid.org/projects/proj_ws_all.html">http://www.chembiogrid.org/projects/proj_ws_all.html</ext-link>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="other">
<article-title>gNova Scientific Software</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.gnova.com">http://www.gnova.com</ext-link>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Durant</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Leland</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Henry</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Nourse</surname>
<given-names>JG</given-names>
</name>
<article-title>Reoptimization of MDL Keys for Use in Drug Discovery</article-title>
<source>J Chem Inf Comput Sci</source>
<year>2002</year>
<volume>42</volume>
<issue>6</issue>
<fpage>1273</fpage>
<lpage>1280</lpage>
<pub-id pub-id-type="pmid">12444722</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="other">
<article-title>Drugbank</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.drugbank.ca">http://www.drugbank.ca</ext-link>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="other">
<article-title>MRTD</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.fda.gov/aboutfda/centersoffices/cder/ucm092199.htm">http://www.fda.gov/aboutfda/centersoffices/cder/ucm092199.htm</ext-link>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="other">
<article-title>Medline</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.nlm.nih.gov/bsd/licensee/2009_stats/baseline_med_filecount.html">http://www.nlm.nih.gov/bsd/licensee/2009_stats/baseline_med_filecount.html</ext-link>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="other">
<article-title>OSCAR3 (Open Source Chemistry Analysis Routines)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/oscar3-chem/">http://sourceforge.net/projects/oscar3-chem/</ext-link>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="other">
<article-title>PhenoPred</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.phenopred.org">http://www.phenopred.org</ext-link>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="other">
<article-title>Comparative Toxicogenomics Database (CTD)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://ctd.mdibl.org">http://ctd.mdibl.org</ext-link>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="other">
<article-title>HuGEpedia</article-title>
<ext-link ext-link-type="uri" xlink:href="http://hugenavigator.net/">http://hugenavigator.net/</ext-link>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="other">
<article-title>ChEMBL</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/chembldb/">http://www.ebi.ac.uk/chembldb/</ext-link>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Swamidass</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Baldi</surname>
<given-names>P</given-names>
</name>
<article-title>Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time</article-title>
<source>J Chem Inf Model</source>
<year>2007</year>
<volume>47</volume>
<fpage>302</fpage>
<lpage>317</lpage>
<pub-id pub-id-type="doi">10.1021/ci600358f</pub-id>
<pub-id pub-id-type="pmid">17326616</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="other">
<article-title>PostgreSQL CUBE data type</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.postgresql.org/docs/8.3/static/cube.html">http://www.postgresql.org/docs/8.3/static/cube.html</ext-link>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="other">
<article-title>Toxic Hazard Estimation</article-title>
<ext-link ext-link-type="uri" xlink:href="http://ecb.jrc.ec.europa.eu/qsar/qsar-tools/index.php?c=TOXTREE">http://ecb.jrc.ec.europa.eu/qsar/qsar-tools/index.php?c=TOXTREE</ext-link>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="other">
<article-title>JME Molecular Editor</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.molinspiration.com/jme/">http://www.molinspiration.com/jme/</ext-link>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="book">
<name>
<surname>Johnson</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Maggiora</surname>
<given-names>GM</given-names>
</name>
<source>Concepts and Applications of Molecular Similarity</source>
<year>1990</year>
<publisher-name>John Wiley &amp; Sons: New York</publisher-name>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Cramer</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Ford</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>RL</given-names>
</name>
<article-title>Estimation of Toxic Hazard - A Decision Tree Approach</article-title>
<source>Food Cosmet Toxicol</source>
<year>1978</year>
<volume>16</volume>
<fpage>255</fpage>
<lpage>276</lpage>
<pub-id pub-id-type="doi">10.1016/S0015-6264(76)80522-6</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Chen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Jiao</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wild</surname>
<given-names>DJ</given-names>
</name>
<article-title>Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<fpage>255</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-255</pub-id>
<pub-id pub-id-type="pmid">20478034</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000281 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000281 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2933596
   |texte=   WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:20727184" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024