Exploration server on telematics

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases

Internal identifier: 000437 (Pmc/Corpus); previous: 000436; next: 000438

Authors: Joel Perdiz Arrais; José Luís Oliveira

RBID: PMC:4385608

Abstract

High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. Several computational methods applied to protein physical interaction networks have been used successfully to identify the best disease candidates for each expression profile. An open problem for these methods is how to combine and exploit the wealth of publicly available biomedical data. We propose an enhanced method to improve the selection of the best disease targets in a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The results show that RecRWR is superior in identifying disease candidates, especially under high levels of biological noise, and that it benefits from all the available data.


DOI: 10.1155/2015/747156
PubMed: 25874227
PubMed Central: 4385608

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases</title>
<author>
<name sortKey="Perdiz Arrais, Joel" sort="Perdiz Arrais, Joel" uniqKey="Perdiz Arrais J" first="Joel" last="Perdiz Arrais">Joel Perdiz Arrais</name>
<affiliation>
<nlm:aff id="I1">Department of Informatics Engineering (DEI), Centre for Informatics and Systems of the University of Coimbra (CISUC), University of Coimbra, 3030-290 Coimbra, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose Luis" sort="Oliveira, Jose Luis" uniqKey="Oliveira J" first="José Luís" last="Oliveira">José Luís Oliveira</name>
<affiliation>
<nlm:aff id="I2">Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25874227</idno>
<idno type="pmc">4385608</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4385608</idno>
<idno type="RBID">PMC:4385608</idno>
<idno type="doi">10.1155/2015/747156</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000437</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000437</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases</title>
<author>
<name sortKey="Perdiz Arrais, Joel" sort="Perdiz Arrais, Joel" uniqKey="Perdiz Arrais J" first="Joel" last="Perdiz Arrais">Joel Perdiz Arrais</name>
<affiliation>
<nlm:aff id="I1">Department of Informatics Engineering (DEI), Centre for Informatics and Systems of the University of Coimbra (CISUC), University of Coimbra, 3030-290 Coimbra, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose Luis" sort="Oliveira, Jose Luis" uniqKey="Oliveira J" first="José Luís" last="Oliveira">José Luís Oliveira</name>
<affiliation>
<nlm:aff id="I2">Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BioMed Research International</title>
<idno type="ISSN">2314-6133</idno>
<idno type="eISSN">2314-6141</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. Several computational methods applied to protein physical interaction networks have been used successfully to identify the best disease candidates for each expression profile. An open problem for these methods is how to combine and exploit the wealth of publicly available biomedical data. We propose an enhanced method to improve the selection of the best disease targets in a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The results show that RecRWR is superior in identifying disease candidates, especially under high levels of biological noise, and that it benefits from all the available data.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Giallourakis, C" uniqKey="Giallourakis C">C. Giallourakis</name>
</author>
<author>
<name sortKey="Henson, C" uniqKey="Henson C">C. Henson</name>
</author>
<author>
<name sortKey="Reich, M" uniqKey="Reich M">M. Reich</name>
</author>
<author>
<name sortKey="Xie, X" uniqKey="Xie X">X. Xie</name>
</author>
<author>
<name sortKey="Mootha, V K" uniqKey="Mootha V">V. K. Mootha</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brunner, H G" uniqKey="Brunner H">H. G. Brunner</name>
</author>
<author>
<name sortKey="Van Driel, M A" uniqKey="Van Driel M">M. A. van Driel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Auffray, C" uniqKey="Auffray C">C. Auffray</name>
</author>
<author>
<name sortKey="Chen, Z" uniqKey="Chen Z">Z. Chen</name>
</author>
<author>
<name sortKey="Hood, L" uniqKey="Hood L">L. Hood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J. Chen</name>
</author>
<author>
<name sortKey="Xu, H" uniqKey="Xu H">H. Xu</name>
</author>
<author>
<name sortKey="Aronow, B J" uniqKey="Aronow B">B. J. Aronow</name>
</author>
<author>
<name sortKey="Jegga, A G" uniqKey="Jegga A">A. G. Jegga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aerts, S" uniqKey="Aerts S">S. Aerts</name>
</author>
<author>
<name sortKey="Lambrechts, D" uniqKey="Lambrechts D">D. Lambrechts</name>
</author>
<author>
<name sortKey="Maity, S" uniqKey="Maity S">S. Maity</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rossi, S" uniqKey="Rossi S">S. Rossi</name>
</author>
<author>
<name sortKey="Masotti, D" uniqKey="Masotti D">D. Masotti</name>
</author>
<author>
<name sortKey="Nardini, C" uniqKey="Nardini C">C. Nardini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moreau, Y" uniqKey="Moreau Y">Y. Moreau</name>
</author>
<author>
<name sortKey="Tranchevent, L C" uniqKey="Tranchevent L">L.-C. Tranchevent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arrais, J P" uniqKey="Arrais J">J. P. Arrais</name>
</author>
<author>
<name sortKey="Fernandes, J" uniqKey="Fernandes J">J. Fernandes</name>
</author>
<author>
<name sortKey="Pereira, J" uniqKey="Pereira J">J. Pereira</name>
</author>
<author>
<name sortKey="Oliveira, J L" uniqKey="Oliveira J">J. L. Oliveira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barabasi, A L" uniqKey="Barabasi A">A.-L. Barabási</name>
</author>
<author>
<name sortKey="Gulbahce, N" uniqKey="Gulbahce N">N. Gulbahce</name>
</author>
<author>
<name sortKey="Loscalzo, J" uniqKey="Loscalzo J">J. Loscalzo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Joy, M P" uniqKey="Joy M">M. P. Joy</name>
</author>
<author>
<name sortKey="Brock, A" uniqKey="Brock A">A. Brock</name>
</author>
<author>
<name sortKey="Ingber, D E" uniqKey="Ingber D">D. E. Ingber</name>
</author>
<author>
<name sortKey="Huang, S" uniqKey="Huang S">S. Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ma, H W" uniqKey="Ma H">H.-W. Ma</name>
</author>
<author>
<name sortKey="Zeng, A P" uniqKey="Zeng A">A.-P. Zeng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Erten, S" uniqKey="Erten S">S. Erten</name>
</author>
<author>
<name sortKey="Bebek, G" uniqKey="Bebek G">G. Bebek</name>
</author>
<author>
<name sortKey="Ewing, R M" uniqKey="Ewing R">R. M. Ewing</name>
</author>
<author>
<name sortKey="Koyuturk, M" uniqKey="Koyuturk M">M. Koyutürk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arrais, J P" uniqKey="Arrais J">J. P. Arrais</name>
</author>
<author>
<name sortKey="Oliveira, J L" uniqKey="Oliveira J">J. L. Oliveira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macropol, K" uniqKey="Macropol K">K. Macropol</name>
</author>
<author>
<name sortKey="Can, T" uniqKey="Can T">T. Can</name>
</author>
<author>
<name sortKey="Singh, A K" uniqKey="Singh A">A. K. Singh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, L" uniqKey="Yu L">L. Yu</name>
</author>
<author>
<name sortKey="Gao, L" uniqKey="Gao L">L. Gao</name>
</author>
<author>
<name sortKey="Li, K" uniqKey="Li K">K. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le, D H" uniqKey="Le D">D.-H. Le</name>
</author>
<author>
<name sortKey="Kwon, Y K" uniqKey="Kwon Y">Y.-K. Kwon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Re, M" uniqKey="Re M">M. Re</name>
</author>
<author>
<name sortKey="Mesiti, M" uniqKey="Mesiti M">M. Mesiti</name>
</author>
<author>
<name sortKey="Valentini, G" uniqKey="Valentini G">G. Valentini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kohler, S" uniqKey="Kohler S">S. Kohler</name>
</author>
<author>
<name sortKey="Bauer, S" uniqKey="Bauer S">S. Bauer</name>
</author>
<author>
<name sortKey="Horn, D" uniqKey="Horn D">D. Horn</name>
</author>
<author>
<name sortKey="Robinson, P N" uniqKey="Robinson P">P. N. Robinson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Szklarczyk, D" uniqKey="Szklarczyk D">D. Szklarczyk</name>
</author>
<author>
<name sortKey="Franceschini, A" uniqKey="Franceschini A">A. Franceschini</name>
</author>
<author>
<name sortKey="Kuhn, M" uniqKey="Kuhn M">M. Kuhn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hamosh, A" uniqKey="Hamosh A">A. Hamosh</name>
</author>
<author>
<name sortKey="Scott, A F" uniqKey="Scott A">A. F. Scott</name>
</author>
<author>
<name sortKey="Amberger, J S" uniqKey="Amberger J">J. S. Amberger</name>
</author>
<author>
<name sortKey="Bocchini, C A" uniqKey="Bocchini C">C. A. Bocchini</name>
</author>
<author>
<name sortKey="Mckusick, V A" uniqKey="Mckusick V">V. A. McKusick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M. Ashburner</name>
</author>
<author>
<name sortKey="Ball, C A" uniqKey="Ball C">C. A. Ball</name>
</author>
<author>
<name sortKey="Blake, J A" uniqKey="Blake J">J. A. Blake</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Biomed Res Int</journal-id>
<journal-id journal-id-type="iso-abbrev">Biomed Res Int</journal-id>
<journal-id journal-id-type="publisher-id">BMRI</journal-id>
<journal-title-group>
<journal-title>BioMed Research International</journal-title>
</journal-title-group>
<issn pub-type="ppub">2314-6133</issn>
<issn pub-type="epub">2314-6141</issn>
<publisher>
<publisher-name>Hindawi Publishing Corporation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25874227</article-id>
<article-id pub-id-type="pmc">4385608</article-id>
<article-id pub-id-type="doi">10.1155/2015/747156</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Perdiz Arrais</surname>
<given-names>Joel</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Oliveira</surname>
<given-names>José Luís</given-names>
</name>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="I1">
<sup>1</sup>
Department of Informatics Engineering (DEI), Centre for Informatics and Systems of the University of Coimbra (CISUC), University of Coimbra, 3030-290 Coimbra, Portugal</aff>
<aff id="I2">
<sup>2</sup>
Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal</aff>
<author-notes>
<corresp id="cor1">*Joel Perdiz Arrais:
<email>jpa@dei.uc.pt</email>
</corresp>
<fn fn-type="other">
<p>Academic Editor: Juan M. Corchado</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>22</day>
<month>3</month>
<year>2015</year>
</pub-date>
<volume>2015</volume>
<elocation-id>747156</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>8</month>
<year>2014</year>
</date>
<date date-type="rev-recd">
<day>17</day>
<month>10</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>10</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2015 J. Perdiz Arrais and J. L. Oliveira.</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. Several computational methods applied to protein physical interaction networks have been used successfully to identify the best disease candidates for each expression profile. An open problem for these methods is how to combine and exploit the wealth of publicly available biomedical data. We propose an enhanced method to improve the selection of the best disease targets in a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The results show that RecRWR is superior in identifying disease candidates, especially under high levels of biological noise, and that it benefits from all the available data.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>1. Introduction</title>
<p>A major research domain in molecular biology is the study of the causal association between genomic variations and clinical phenotypes [
<xref rid="B1" ref-type="bibr">1</xref>–<xref rid="B3" ref-type="bibr">3</xref>
]. Classical methods use a manual approach in which one or a few genomic targets are tested individually. However, because of the resources needed to perform this procedure systematically and the difficulty of controlling all experimental variables, improved strategies were required. The ability to use computational methods to identify the best disease candidates for further validation was a major breakthrough [
<xref rid="B4" ref-type="bibr">4</xref>–<xref rid="B8" ref-type="bibr">8</xref>
]. A common constraint of most methods is the need for training data, which is scarce and difficult to validate.</p>
<p>A recent research trend consists of exploiting the topological properties of protein-protein interaction (PPI) networks, combined with other biological data, to elucidate the underlying mechanisms of genetic diseases. Barabási et al. [
<xref rid="B9" ref-type="bibr">9</xref>
] and Joy et al. [
<xref rid="B10" ref-type="bibr">10</xref>
] discuss the role of proteins with high betweenness as mediators of relevant metabolic processes. Ma and Zeng [
<xref rid="B11" ref-type="bibr">11</xref>
] explore the use of closeness centrality to quickly identify the most central metabolites in large-scale networks. Approaches proposed by Erten et al. [
<xref rid="B12" ref-type="bibr">12</xref>
] and Arrais and Oliveira [
<xref rid="B13" ref-type="bibr">13</xref>
] exploit high-degree nodes to prioritize disease-associated genes.</p>
<p>While the previous methods focus on the weight assigned to each node, a complementary strategy evaluates the proximity between two given nodes in the network. Common methods for this task include shortest paths, log-likelihood, propagation matrices, and Random Walk with Restarts (RWR). Previous studies have shown that RWR outperforms the other methods [
<xref rid="B14" ref-type="bibr">14</xref>–<xref rid="B18" ref-type="bibr">18</xref>
]. A common limitation of these studies is that they assume a single-concept graph, in which every node is treated equally. However, as we demonstrate in this study, these methods perform poorly when the graph integrates nodes from distinct data types.</p>
<p>In this paper, we propose a novel method to improve the selection of the best disease targets in multiconcept graphs. To this end, we build a multilayer biomedical graph that stores PPI data annotated with stable knowledge from OMIM diseases and from the Biological Process branch of Gene Ontology. The main improvements of the proposed method are (a) the use of multilayer networks built from PPI data and term associations; (b) the combination of these data to establish new associations among nodes; and (c) the use of degree-based methods to evaluate node weights.</p>
<p>Finally, we present a comprehensive validation that demonstrates the superiority of the proposed approach, Recursive Random Walk with Restarts (RecRWR).</p>
</sec>
<sec id="sec2">
<title>2. Methods</title>
<p>The method proposed herein uses a graph representation of biomedical knowledge centred on proteins and enriched with biomedical terms. The first step consisted of selecting and curating the required data and using them to construct the graph. For performance reasons, this network is represented as an adjacency matrix. On this biomedical graph, we apply a modified version of the Hubs and Authorities (HITS) [
<xref rid="B13" ref-type="bibr">13</xref>
] algorithm, adapted to this particular problem, to obtain normalized and more accurate association weights for the relations. Although we focus here on protein-disease associations, the method can be extended to general many-to-many associations between biomedical terms. Finally, we describe how the proposed method, RecRWR, is applied to this problem.</p>
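<p>For readers unfamiliar with HITS, the standard hubs-and-authorities power iteration that this step builds on can be sketched as follows. This is plain Kleinberg HITS on a toy directed graph, not the authors' adapted version, whose modifications are not described in this section; the function name hits and the example graph are illustrative assumptions.</p>

```python
import math


def hits(edges, iterations=50):
    """Standard HITS power iteration (Kleinberg), not the RecRWR variant.

    edges: list of (source, target) pairs of a directed graph.
    Returns (hub, authority) score dicts, each L2-normalised."""
    nodes = sorted({n for e in edges for n in e})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of n: sum of hub scores of the nodes pointing to n.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of n: sum of authority scores of the nodes n points to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += auth[t]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    toy = [("A", "C"), ("B", "C"), ("B", "D")]
    hub, auth = hits(toy)
    print(max(auth, key=auth.get))  # C: pointed to by both A and B
```

<p>On an undirected graph such as the one built here, hub and authority scores coincide, which is one reason an adaptation is needed for this setting.</p>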
<sec id="sec2.1">
<title>2.1. Multiconcept Graph Modelling</title>
<p>A graph-based representation is used to store the relations among the biomedical terms. Since we are integrating three distinct data sources, three interconnected subgraphs are obtained.
<list list-type="roman-lower">
<list-item>
<p>PPI data are retrieved from the STRING database [
<xref rid="B19" ref-type="bibr">19</xref>
], where the average confidence level is considered. A filter is applied to keep only human proteins.</p>
</list-item>
<list-item>
<p>Disease data are extracted from the OMIM morbid map [
<xref rid="B20" ref-type="bibr">20</xref>
] data, preserving the genotype-phenotype associations. The morbid map is also used to extract the known protein-disease mappings.</p>
</list-item>
<list-item>
<p>The directed acyclic graph (DAG) of the Biological Process branch of Gene Ontology (GO) [
<xref rid="B21" ref-type="bibr">21</xref>
] is extracted and replicated. The GO-GO mappings are also retrieved and stored.</p>
</list-item>
</list>
</p>
<p>For each of the data sources above, a curated set of terms
<inline-formula>
<inline-formula>
<mml:math id="M1">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>
</inline-formula>
is extracted:
<disp-formula id="EEq1">
<label>(1)</label>
<mml:math id="M2">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:mfenced separators="|">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
with
<italic>a</italic>
<sub>
<italic>k</italic>
</sub>
representing the content of the
<italic>k</italic>
th term, with (
<italic>k</italic>
∈ [1,
<italic>P</italic>
] ⊂ ℕ). Each term
<italic>a</italic>
<sub>
<italic>k</italic>
</sub>
is a tuple of three elements that can be represented as
<disp-formula id="EEq2">
<label>(2)</label>
<mml:math id="M3">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:mfenced open="{" close="}" separators="|">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
where the element
<italic>t</italic>
<sub>
<italic>a</italic>
</sub>
is associated with the element
<italic>t</italic>
<sub>
<italic>b</italic>
</sub>
, with a confidence score
<italic>w</italic>
where (
<italic>a</italic>
,
<italic>b</italic>
∈ [1,
<italic>R</italic>
] ⊂ ℕ) and (
<italic>w</italic>
∈ [0,1000] ⊂ ℕ).</p>
<p>The tuples contained in vector
<inline-formula>
<inline-formula>
<mml:math id="M4">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>
</inline-formula>
are modelled as an undirected weighted graph
<italic>G</italic>
= (
<italic>V</italic>
,
<italic>E</italic>
,
<italic>W</italic>
).
<list list-type="roman-lower">
<list-item>
<p>Each vertex
<italic>v</italic>
<sub>
<italic>x</italic>
</sub>
∈ <italic>V</italic>
is obtained by identifying the unique entry
<italic>t</italic>
<sub>
<italic>a</italic>
</sub>
-
<italic>t</italic>
<sub>
<italic>b</italic>
</sub>
or
<italic>t</italic>
<sub>
<italic>b</italic>
</sub>
-
<italic>t</italic>
<sub>
<italic>a</italic>
</sub>
of all the association tuples contained in vector
<inline-formula>
<inline-formula>
<mml:math id="M5">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>
</inline-formula>
. The vertices are labelled by their unique identifier.</p>
</list-item>
<list-item>
<p>Each edge
<italic>e</italic>
<sub>
<italic>x</italic>
</sub>
∈ <italic>E</italic>
connects the vertices (
<italic>v</italic>
<sub>
<italic>m</italic>
</sub>
,
<italic>v</italic>
<sub>
<italic>n</italic>
</sub>
) representing an association between the terms represented by the vertices
<italic>v</italic>
<sub>
<italic>m</italic>
</sub>
and
<italic>v</italic>
<sub>
<italic>n</italic>
</sub>
contained in vector
<inline-formula>
<inline-formula>
<mml:math id="M6">
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
</mml:math>
</inline-formula>
</inline-formula>
.</p>
</list-item>
<list-item>
<p>The weight
<italic>w</italic>
<sub>
<italic>v</italic>
<sub>
<italic>m</italic>
</sub>
,
<italic>v</italic>
<sub>
<italic>n</italic>
</sub>
</sub>
of each edge
<italic>e</italic>
<sub>
<italic>x</italic>
</sub>
corresponds to the confidence score <italic>w</italic> of the association between the two nodes.</p>
</list-item>
</list>
</p>
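<p>The construction of G = (V, E, W) can be sketched as follows, assuming the tuples are available as (t_a, t_b, w) triples. The helper name build_graph is hypothetical, and so is its rule for duplicates: when both t_a-t_b and t_b-t_a occur with different scores, the sketch keeps the larger score, because the text does not say how such conflicts are resolved.</p>

```python
def build_graph(tuples):
    """Model association tuples (t_a, t_b, w) as an undirected weighted
    graph G = (V, E, W).

    The pairs t_a-t_b and t_b-t_a map to the same edge. If both
    orientations appear with different scores, the larger one is kept
    (an assumption, not a rule stated by the authors)."""
    vertices = set()
    weights = {}  # key: frozenset({v_m, v_n}); value: w_{v_m, v_n}
    for t_a, t_b, w in tuples:
        vertices.update((t_a, t_b))
        edge = frozenset((t_a, t_b))  # orientation-independent key
        weights[edge] = max(w, weights.get(edge, w))
    return sorted(vertices), weights


if __name__ == "__main__":
    V, W = build_graph([("P1", "P2", 700), ("P2", "P1", 900), ("P1", "D1", 1000)])
    print(len(V), len(W), W[frozenset(("P1", "P2"))])  # 3 2 900
```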
<p>The graph
<italic>G</italic>
= (
<italic>V</italic>
,
<italic>E</italic>
,
<italic>W</italic>
) is then mapped to an adjacency matrix representation that consists of a |
<italic>V</italic>
| × |
<italic>V</italic>
| =
<italic>n</italic>
×
<italic>n</italic>
matrix
<italic>A</italic>
:
<disp-formula id="EEq3">
<label>(3)</label>
<mml:math id="M7">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>A</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced separators="|">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">11</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd>
<mml:mo>⋯</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">1</mml:mn>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>⋮</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>⋱</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:mo>⋮</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd>
<mml:mo>⋯</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced open="|" close="|" separators="|">
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>  </mml:mi>
<mml:mo>∈</mml:mo>
<mml:mfenced open="[" close="]" separators="|">
<mml:mrow>
<mml:mn mathvariant="normal">0,1000</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>⊂</mml:mo>
<mml:mi mathvariant="double-struck">N</mml:mi>
<mml:mo>.</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
Because the graph is undirected, the adjacency matrix is symmetric; therefore
<italic>a</italic>
<sub>
<italic>ij</italic>
</sub>
=
<italic>a</italic>
<sub>
<italic>ji</italic>
</sub>
.</p>
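<p>A minimal sketch of the mapping from an edge list to the symmetric matrix A of Eq. (3) follows. The function names adjacency_matrix and is_symmetric, the sorted-label indexing, and the dense list-of-lists layout are illustrative choices; a dense matrix is only practical for small graphs, as the memory estimate for the full graph shows.</p>

```python
def adjacency_matrix(edges):
    """Map an undirected weighted edge list to the n x n matrix A of Eq. (3).

    edges: iterable of (v_m, v_n, w) with integer w in [0, 1000].
    Returns (index, A), where index maps each vertex label to its
    row/column. Each weight is written to both a_ij and a_ji."""
    edges = list(edges)
    labels = sorted({v for m, n, _ in edges for v in (m, n)})
    index = {label: i for i, label in enumerate(labels)}
    size = len(labels)
    A = [[0] * size for _ in range(size)]
    for m, n, w in edges:
        i, j = index[m], index[n]
        A[i][j] = A[j][i] = w  # undirected edge, so a_ij = a_ji
    return index, A


def is_symmetric(A):
    """Check a_ij == a_ji for every pair of indices."""
    size = len(A)
    return all(A[i][j] == A[j][i] for i in range(size) for j in range(size))


if __name__ == "__main__":
    index, A = adjacency_matrix([("P1", "P2", 900), ("P1", "D1", 1000)])
    print(A[index["P2"]][index["P1"]], is_symmetric(A))  # 900 True
```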
<p>The compiled graph has 60,000 nodes with an average degree of 5. Represented sparsely, the graph requires Θ(|
<italic>E</italic>
|) memory, which in practice amounts to about 6.0 MB, excluding the hash tables required for node mapping. The dense adjacency matrix, in contrast, requires Θ(|
<italic>V</italic>
|
<sup>2</sup>
) memory, that is, 7.2 GB.</p>
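<p>The memory figures above can be reproduced with simple arithmetic. The sketch assumes that the average degree counts each undirected edge at both endpoints (so |E| = 60,000 × 5 / 2) and that each matrix cell is stored as a 2-byte integer, which is enough for weights in [0, 1000]. Neither assumption is stated in the text, although together they reproduce the reported 7.2 GB.</p>

```python
n_nodes = 60_000
avg_degree = 5

# Undirected graph: each edge contributes to the degree of two nodes.
n_edges = n_nodes * avg_degree // 2  # |E| = 150,000

# Dense matrix: |V|^2 cells. Assumption: 2 bytes per cell, since a
# 16-bit integer suffices for weights in [0, 1000].
matrix_bytes = n_nodes ** 2 * 2  # 7.2e9 bytes

print(f"|E| = {n_edges:,}")                           # |E| = 150,000
print(f"dense matrix: {matrix_bytes / 1e9:.1f} GB")   # dense matrix: 7.2 GB

# Bytes per edge implied by the reported 6.0 MB sparse footprint.
print(f"implied bytes per edge: {6.0e6 / n_edges:.0f}")  # 40
```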
</sec>
<sec id="sec2.2">
<title>2.2. RecRWR: Recursive Random Walk with Restarts</title>
<p>Next, we formulate the RecRWR algorithm, including a detailed pseudocode description (
<xref ref-type="fig" rid="alg1">Algorithm 1</xref>
). Its three main components are
<list list-type="roman-lower">
<list-item>
<p>Random Walk with Restarts;</p>
</list-item>
<list-item>
<p>recursive cross subgraph mapping;</p>
</list-item>
<list-item>
<p>node replacement.</p>
</list-item>
</list>
</p>
</sec>
<sec id="sec2.3">
<title>2.3. Random Walk with Restarts</title>
<p>The final probability vector of the random walker is obtained by iterating
<disp-formula id="EEq4">
<label>(4)</label>
<mml:math id="M8">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn mathvariant="normal">1</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mfenced separators="|">
<mml:mrow>
<mml:mn mathvariant="normal">1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi>W</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>+</mml:mo>
<mml:mi>r</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mn mathvariant="normal">0</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</disp-formula>
where
<italic>W</italic>
is the column-normalized adjacency matrix
<italic>A</italic>
and
<inline-formula>
<inline-formula>
<mml:math id="M9">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mo>→</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</inline-formula>
</inline-formula>
is a vector in which the
<italic>i</italic>
th element holds the probability of being at node
<italic>i</italic>
at time step
<italic>t</italic>
. The vector
<italic>p</italic>
<sup>0</sup>
holds the initial-state probabilities: equal probabilities are assigned to the seed nodes so that they sum to 1. The seeds are specified by a given list
<italic>L</italic>
of seed nodes, where
<italic>L</italic>
⊆ <italic>V</italic>
.</p>
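<p>The iteration of Eq. (4) can be sketched in a few lines of plain Python. The sketch column-normalizes A to obtain W, builds p<sup>0</sup> from the seed list L, and iterates until the L1 change between successive vectors falls below a tolerance. The function name rwr, the restart probability r = 0.7, the tolerance, and the iteration cap are illustrative assumptions, not parameters reported in this section.</p>

```python
def rwr(adjacency, seeds, r=0.7, tol=1e-10, max_iter=1000):
    """Random Walk with Restarts, Eq. (4): p_next = (1 - r) W p + r p_0.

    adjacency: symmetric n x n list of lists of non-negative weights (A).
    seeds: indices of the seed nodes (the list L).
    r: restart probability; 0.7 is an illustrative default only.
    Returns the (approximately) stationary probability vector."""
    n = len(adjacency)
    # Column-normalise A to obtain W: every non-empty column sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)]
         for i in range(n)]
    # p_0: equal probability on each distinct seed node, summing to 1.
    unique_seeds = sorted(set(seeds))
    p0 = [0.0] * n
    for s in unique_seeds:
        p0[s] = 1.0 / len(unique_seeds)
    p = p0[:]
    for _ in range(max_iter):
        # One step of Eq. (4).
        p_next = [(1 - r) * sum(W[i][j] * p[j] for j in range(n)) + r * p0[i]
                  for i in range(n)]
        # Stop once the L1 change between successive vectors is below tol.
        if tol > sum(abs(a - b) for a, b in zip(p_next, p)):
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with unit weights, seeded at node 0.
    A = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    p = rwr(A, seeds=[0])
    print(sorted(range(len(p)), key=lambda i: -p[i]))  # [0, 1, 2, 3]
```

<p>On the toy path graph, the scores decrease with distance from the seed, which is the proximity behaviour that motivates using RWR. With r greater than 0, each step shrinks the difference between successive vectors by a factor of at most 1 − r, so the loop converges.</p>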
</sec>
<sec id="sec2.4">
<title>2.4. Recursive Cross Subgraph Mapping</title>
<p>We extend the previous formulation to a symmetric matrix composed of
<italic>k</italic>
<sup>2</sup>
/2 submatrices, where
<italic>k</italic>
corresponds to the number of data sources. The submatrix that corresponds to the mapping between the subgraphs
<italic>i</italic>
and
<italic>j</italic>
is obtained by
<disp-formula id="EEq5">
<label>(5)</label>
<tex-math id="M10">W_{ij} = \operatorname{diag}\left(\vec{m}_{i}\right) \, W \, \operatorname{diag}\left(\vec{m}_{j}\right),</tex-math>
</disp-formula>
where
<inline-formula>
<tex-math id="M11">\vec{m}_{i}</tex-math>
</inline-formula>
and
<inline-formula>
<tex-math id="M12">\vec{m}_{j}</tex-math>
</inline-formula>
are binary vectors with
<italic>n</italic>
elements that act as masks for the source and target subgraphs, with
<inline-formula>
<tex-math id="M13">i, j \in [1, k] \cap \mathbb{N}, \quad \left|\vec{m}_{i}\right| = \left|\vec{m}_{j}\right| = n</tex-math>
</inline-formula>
.</p>
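<p>Because Eq. (5) only masks the full matrix, it can be checked element by element: entry (a, b) of diag(m_i) W diag(m_j) equals m_i[a] · W[a][b] · m_j[b]. The sketch below is an illustration under stated assumptions: W is a dense list of lists, each data source is described by a 0/1 mask over the same n nodes, and the function names are hypothetical. It also shows why a symmetric W needs only about k²/2 distinct blocks: when W is symmetric, W_ji is the transpose of W_ij.</p>

```python
# Illustrative sketch of Eq. (5), W_ij = diag(m_i) W diag(m_j). ASSUMPTIONS: W is a
# dense n x n list of lists and every data source has a 0/1 mask over the n nodes.
# Function names are hypothetical.

def cross_subgraph_block(w, m_i, m_j):
    """Entry (a, b) is m_i[a] * W[a][b] * m_j[b]: W restricted to the rows of
    subgraph i and the columns of subgraph j, zero everywhere else."""
    n = len(w)
    return [[m_i[a] * w[a][b] * m_j[b] for b in range(n)] for a in range(n)]


def all_blocks(w, masks):
    """Return a dict mapping each ordered pair (i, j) of subgraphs to W_ij."""
    return {(i, j): cross_subgraph_block(w, m_i, m_j)
            for i, m_i in enumerate(masks)
            for j, m_j in enumerate(masks)}


def transpose(mat):
    return [list(row) for row in zip(*mat)]


# Toy two-layer network: nodes 0-1 are proteins, nodes 2-3 are diseases.
w_toy = [[0, 1, 1, 0],
         [1, 0, 0, 1],
         [1, 0, 0, 0],
         [0, 1, 0, 0]]
masks_toy = [[1, 1, 0, 0], [0, 0, 1, 1]]
blocks = all_blocks(w_toy, masks_toy)
print(blocks[(0, 1)])  # protein rows, disease columns
# For a symmetric W, W_ji is the transpose of W_ij, so half the blocks are redundant.
print(blocks[(1, 0)] == transpose(blocks[(0, 1)]))
```

<p>When the masks partition the nodes, the k² blocks add back up to W exactly, since every entry of W falls in exactly one (i, j) block.</p>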
<p>The result of each iteration of the Random Walk with Restarts is given by
<disp-formula id="EEq6">
<label>(6)</label>
<tex-math id="M14">p_{i}^{t \to \infty} = \sum_{t=0}^{\infty} \left( (1 - r) \, W_{ij} \, \vec{p}^{\,t} + r \, \vec{p}^{\,0} \right),</tex-math>
</disp-formula>
where the algorithm is considered to have stabilized once the following condition is met:
<disp-formula id="EEq7">
<label>(7)</label>
<tex-math id="M15">\left( \vec{m}_{1} \cdot \left| \vec{p}^{\,t} - \vec{p}^{\,t-1} \right| \right) &lt; \varrho,</tex-math>
</disp-formula>
where
<inline-formula>
<tex-math id="M16">\vec{m}_{1}</tex-math>
</inline-formula>
is the disease mask vector and
<inline-formula>
<tex-math id="M17">\vec{p}^{\,t}</tex-math>
</inline-formula>
is the weight vector at time
<italic>t</italic>
. The product is a scalar equal to the sum of the absolute differences between two consecutive iterations, taken over the disease nodes. The condition holds when this value is less than a given constant
<italic>ϱ</italic>
.</p>
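<p>The stopping rule in Eq. (7) differs from a plain convergence test: because of the mask, only changes on disease nodes count, so fluctuations elsewhere in the network do not delay termination. A minimal sketch, assuming probability vectors stored as Python lists and a 0/1 disease mask over the same nodes; the function names and the value of ϱ (rho) are hypothetical.</p>

```python
# Sketch of the stopping rule in Eq. (7): (m_1 . |p^t - p^(t-1)|) compared with rho.
# ASSUMPTIONS: plain Python lists; names and the default rho are illustrative only.

def masked_change(disease_mask, p_t, p_prev):
    """Sum of absolute changes between two iterations, over disease nodes only."""
    return sum(m * abs(a - b) for m, a, b in zip(disease_mask, p_t, p_prev))


def has_converged(disease_mask, p_t, p_prev, rho=1e-6):
    """True when the masked change has dropped below rho."""
    return rho > masked_change(disease_mask, p_t, p_prev)


# Toy vectors: nodes 0-1 are proteins, nodes 2-3 are diseases.
mask = [0, 0, 1, 1]
print(has_converged(mask, [0.5, 0.3, 0.15, 0.05], [0.4, 0.4, 0.15, 0.05]))  # True
print(has_converged(mask, [0.5, 0.3, 0.10, 0.10], [0.5, 0.3, 0.15, 0.05]))  # False
```

<p>In the first call the protein nodes still move by 0.1 each, but the disease layer is unchanged, so the rule reports convergence; in the second call the disease nodes change by 0.1 in total, so iteration continues.</p>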
</sec>
<sec id="sec2.5">
<title>2.5. Node Replacement</title>
<p>Each recursive iteration of the cross subgraph mapping returns a new term. A node replacement strategy then updates the set of genes to be used: the node with index
<italic>a</italic>
<sub>
<italic>m</italic>
</sub>
is replaced by the node with index
<italic>a</italic>
<sub>
<italic>n</italic>
</sub>
. The node to be replaced is the one with the minimum value,
<inline-formula>
<tex-math>a_{m} = \operatorname*{arg\,min}_{0 \le m \le |V|} p_{m}^{t \to \infty}</tex-math>
</inline-formula>
, and the candidate node is given by
<inline-formula>
<tex-math>a_{n} = \operatorname*{arg\,max}_{0 \le i \le |V|} W_{ij} \, p_{j}^{t \to \infty}</tex-math>
</inline-formula>
.</p>
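<p>The replacement step can be illustrated with a small sketch. The text does not state over which nodes the minimum and maximum are taken, so the sketch rests on labelled assumptions: the node to replace is the current seed with the lowest probability, the candidate is the non-seed node with the highest propagated score, and W_ij p_j is read as the matrix-vector product (W p)_i = Σ_j W_ij p_j. The function and variable names are hypothetical.</p>

```python
# Hypothetical sketch of the node replacement step. ASSUMPTIONS (not stated in the
# text): a_m is chosen among the current seeds, a_n among the non-seed nodes, and
# the candidate score is the matrix-vector product (W p)_i = sum_j W[i][j] * p[j].

def replace_weakest_seed(w, p, seeds):
    """Return (a_m, a_n, new_seeds) for one replacement step."""
    n = len(p)
    seeds = set(seeds)
    # a_m: seed with the lowest (converged) probability; ties go to the lowest index.
    a_m = min(sorted(seeds), key=lambda m: p[m])
    # Propagated score of every node after one step of the walk.
    score = [sum(w[i][j] * p[j] for j in range(n)) for i in range(n)]
    # a_n: non-seed node with the highest score; ties go to the lowest index.
    candidates = [i for i in range(n) if i not in seeds]
    a_n = max(candidates, key=lambda i: score[i])
    new_seeds = (seeds - {a_m}) | {a_n}
    return a_m, a_n, new_seeds


# Column-normalized path graph 0-1-2-3; current seeds are nodes 0 and 1.
w_path = [[0, 0.5, 0, 0], [1, 0, 0.5, 0], [0, 0.5, 0, 1], [0, 0, 0.5, 0]]
print(replace_weakest_seed(w_path, [0.5, 0.1, 0.2, 0.2], [0, 1]))
```

<p>In this toy case node 1 has the lowest probability among the seeds and is swapped for node 2, the non-seed node that receives the most probability in one propagation step.</p>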
</sec>
</sec>
<sec id="sec3">
<title>3. Results and Discussion</title>
<p>In this section we explore and evaluate the performance of the proposed method. We present a systematic evaluation using synthetic datasets based on well-known disease profiles. We also show how the results of RecRWR can be used to explore the mechanisms that breast cancer shares with other diseases.</p>
<sec id="sec3.1">
<title>3.1. Validation Procedure</title>
<p>For each selected phenotype entry in the OMIM database, we created a dataset with the associated genotypes. We selected 100 phenotype diseases with at least 10 associated genotypes each. We then iteratively replaced genes from the original dataset with random ones, in 20% increments, progressively converting each dataset into a fully random one. Each of these protein datasets was used as the set of seed nodes on the graph. This yields a test space of 600 gene sets (6 randomness levels, from 0% to 100%, for each of the 100 diseases).</p>
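<p>The construction of the synthetic test space can be sketched as follows. This is an illustration, not the authors' procedure. It assumes that random replacements are drawn uniformly, without repetition, from genes outside the disease's own set, and that each higher noise level keeps the replacements of the lower ones, so the dataset is progressively converted into a fully random one. The function name and the random seed are hypothetical.</p>

```python
import random

# Illustrative sketch: derive progressively noisier seed sets from one disease's genes.
# ASSUMPTIONS: replacements are drawn uniformly (without repetition) from genes outside
# the disease set, and each noise level keeps the replacements of the previous one.

def noisy_gene_sets(disease_genes, all_genes, step=0.2, seed=0):
    """Return {noise_level: gene list} for levels 0.0, step, ..., 1.0."""
    rng = random.Random(seed)
    genes = list(disease_genes)
    own = set(genes)
    pool = [g for g in all_genes if g not in own]
    if len(genes) > len(pool):
        raise ValueError("not enough non-disease genes to draw replacements from")
    n_levels = int(round(1 / step)) + 1                 # 6 levels for 20% increments
    order = rng.sample(range(len(genes)), len(genes))  # positions, in replacement order
    replacements = rng.sample(pool, len(genes))        # one distinct random gene each
    datasets = {}
    for k in range(n_levels):
        level = round(k * step, 2)
        n_replace = round(level * len(genes))
        current = list(genes)
        for pos, new_gene in zip(order[:n_replace], replacements[:n_replace]):
            current[pos] = new_gene
        datasets[level] = current
    return datasets
```

<p>With 20% increments each disease yields six gene sets (0%, 20%, ..., 100%), so 100 diseases give the 6 × 100 = 600 gene sets of the test space.</p>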
</sec>
<sec id="sec3.2">
<title>3.2. Information Paradox</title>
<p>Previous applications of RWR in molecular biology have typically concentrated on PPI networks, and one would expect that including additional data would improve the overall result.
<xref ref-type="fig" rid="fig1">Figure 1</xref>
presents a comparison of the relative frequency of the ranks for each of the analysed datasets, for two of the tested methods (RWR over PPI data only and RWR over the whole network) and for four levels of randomness. This graph makes it clear that including external annotations in the original PPI network brings no improvement. Indeed, for the original datasets (0% randomness) there is no perceptible difference between the two methods, and the disadvantage of adding the annotations becomes more evident as the level of randomness increases. For instance, when 20% of the genes in the dataset are random, RWR over PPI ranks the disease in the top 3 in 55% of the cases, whereas RWR over all data does so in only 48%. For 60% randomness, RWR over PPI ranks the disease in the top 5 in 35% of the cases, against 23% for RWR over all data. These results were the primary motivation for the work presented in this paper, as they show that the RWR method is not suitable for dealing with multiple sources of biological data.</p>
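<p>The top-k figures quoted above are relative frequencies of the rank assigned to the correct disease. A minimal sketch of that computation, assuming rank 1 denotes the best-ranked disease; the function names are hypothetical and the ranks in the example are made-up values, not the paper's data.</p>

```python
from collections import Counter

# Sketch of rank-frequency statistics. ASSUMPTION: rank 1 = correct disease ranked first.

def rank_frequencies(ranks):
    """Relative frequency of each rank value across the test cases."""
    counts = Counter(ranks)
    total = len(ranks)
    return {rank: counts[rank] / total for rank in sorted(counts)}


def top_k_frequency(ranks, k):
    """Fraction of test cases in which the correct disease is ranked within the top k."""
    return sum(1 for r in ranks if k >= r) / len(ranks)


example_ranks = [1, 1, 2, 3, 5, 8, 1, 4, 2, 10]  # hypothetical ranks, for illustration
print(top_k_frequency(example_ranks, 3))  # 0.6
```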
</sec>
<sec id="sec3.3">
<title>3.3. RecRWR Results on Synthetic Datasets</title>
<p>We evaluate the performance of the RecRWR method using receiver operating characteristic (ROC) curves, with the results for each level of randomness shown separately. A higher AUC (area under the curve) corresponds to better overall performance.
<xref ref-type="fig" rid="fig2">Figure 2</xref>
and
<xref ref-type="table" rid="tab1">Table 1</xref>
compile the obtained results.</p>
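<p>An AUC value can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counting one half. The sketch below uses that equivalence. How positives and negatives are defined for each gene set (for instance, the true disease versus all other diseases) is an assumption not detailed in this section; the function name and scores are illustrative.</p>

```python
# Rank-based (Mann-Whitney) AUC sketch. ASSUMPTION: positives/negatives are given as
# two lists of scores; higher score = more likely to be the correct disease.

def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


print(auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

<p>In the example, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75; a perfect ranking gives 1.0 and a random one about 0.5.</p>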
<p>With 0% randomness the AUC is approximately the same for the three methods, with the proposed method obtaining the lowest value, as can be seen in the ROC curves. This means that, in the absence of biological noise, the protein annotation data do not contribute to improving the final result. However, when randomness is introduced, the proposed method shows a strong improvement.</p>
<p>With 20% randomness the RecRWR AUC is 0.9834, compared with 0.9453 for RWR, which corresponds to a relative improvement of 4.0%. For RecRWR itself, 20% randomness has no real impact on the obtained AUC (−0.22%).</p>
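<p>The improvement percentages are relative changes of the AUC with respect to a baseline. As a quick check using the two AUC values given above (the helper name is hypothetical):</p>

```python
def relative_change_percent(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# RecRWR vs RWR at 20% randomness (AUC values from the text).
print(round(relative_change_percent(0.9834, 0.9453), 1))  # 4.0
```

<p>The −0.22% figure is presumably the same quantity computed for RecRWR at 20% randomness against RecRWR at 0% randomness.</p>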
<p>For 40% and 60% randomness the difference is even larger (7.1% and 7.6%, respectively), demonstrating the greater robustness of the proposed method to noise.</p>
<p>It is also worth noting that a TPR (true positive rate,
<italic>y</italic>
-axis of the graphs in
<xref ref-type="fig" rid="fig2">Figure 2</xref>
) of 1.0 is reached, meaning that the disease is always correctly identified, and that this is consistently achieved at a lower FPR (false positive rate,
<italic>x</italic>
-axis).</p>
</sec>
<sec id="sec3.4">
<title>3.4. RecRWR Results on Breast Cancer</title>
<p>Breast cancer (MIM:114480) is considered a complex disorder, with 23 known genotypes that are shared with other cancer-related disorders. We used RecRWR over the common expression profiles of breast cancer to explore the network of diseases that share common mechanisms with it. The diseases most closely related to breast cancer are hepatocellular carcinoma, bladder cancer, and lung cancer.</p>
<p>The network of associations shows that the proteins most closely related to breast cancer are responsible for important cellular functions, such as DNA repair, cell cycle arrest and its regulation, induced cell death (apoptosis), and tumor suppression. The GO terms most closely associated with breast cancer are protein binding and apoptotic process. This suggests that the probable causes of breast cancer involve the impairment of these functions. For instance, a loss-of-function mutation in a tumor suppressor gene, such as the one encoding cellular tumor antigen p53 (P04637), would result in unrestrained cell proliferation. Conversely, the transformation of a proto-oncogene (a gene that participates in a cell-growth pathway) into an oncogene (a gene whose product can induce cancer in animals) requires a gain-of-function mutation that leads to its permanent activation. An example is the epidermal growth factor receptor (EGFR, P00533), also present in
<xref ref-type="fig" rid="fig3">Figure 3</xref>
. EGFR is involved in converting extracellular stimuli into cellular responses. In addition, errors and damage in DNA are usually corrected by DNA repair proteins, such as the DNA repair and recombination protein RAD54-like (Q92698) shown in the network; a mutation in this gene would yield a defective protein, and the correction of such errors would cease. Finally, caspase-8 also emerges as a possible contributor to breast cancer: since it is involved in the apoptotic process, impairment of this protein would prevent apoptosis, and defective cells would not be destroyed.</p>
<p>The shortest path between the two diseases is mediated by the cellular tumor antigen p53. There are, however, other connections between the two nodes. For instance, the proteins receptor tyrosine-protein kinase erbB-2 (P04626), GTPase KRas (P01116), and caspase-8 (Q14790) also connect the two cancers. The influence of caspase-8 mutations on the onset of cancer was explained above. ERBB2 is a proto-oncogene, with the potential to be converted into an oncogene and induce cancer. GTPase KRas is involved in a wide variety of important biological processes, including the regulation of both cell proliferation and gene expression, signal transduction, and cell signalling. Most of the proteins analysed here belong to the same KEGG pathways: pathways in cancer (hsa05200), neurotrophin signaling pathway (hsa04722), and focal adhesion (hsa04510). The first pathway integrates the various cancer pathways. The neurotrophin signalling pathway is responsible for the differentiation and survival of neural cells; it is, in turn, heavily regulated by other intracellular signalling cascades, in which some of the proteins presented in
<xref ref-type="fig" rid="fig3">Figure 3</xref>
participate. The focal adhesion pathway plays important roles in cell proliferation, differentiation, and survival and in gene expression. If any of the proteins involved in this pathway is compromised, cellular communication becomes defective, which can also result in cancer.</p>
</sec>
</sec>
<sec id="sec4">
<title>4. Conclusion</title>
<p>In this paper, we have proposed a graph-based approach to the problem of selecting the best disease targets in multiconcept graphs. To this end, we built a multilayer biomedical graph that stores PPI data annotated with stable knowledge from OMIM diseases and Gene Ontology biological processes. The main improvements of the proposed method are the use of multilayer networks formed by the PPI data and the term associations; the combination of these data to establish new associations among nodes; and the use of degree-based methods for evaluating node weights.</p>
<p>Finally, we have presented a comprehensive validation of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The results show its superiority in identifying disease candidates, especially under high levels of biological noise, while benefiting from all the available data.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>This work has received support from the RD-CONNECT European Project (EC contract no. 305444). The IEETA research unit is funded by national funds through FCT (Foundation for Science and Technology) in the context of project PEst-OE/EEI/UI0127/2014.</p>
</ack>
<sec sec-type="conflict">
<title>Conflict of Interests</title>
<p>The authors declare that there is no conflict of interests regarding the publication of this paper.</p>
</sec>
<sec>
<title>Authors' Contribution</title>
<p>Joel P. Arrais and José Luís Oliveira contributed equally to the work presented here.</p>
</sec>
<ref-list>
<ref id="B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giallourakis</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Henson</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Reich</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Mootha</surname>
<given-names>V. K.</given-names>
</name>
</person-group>
<article-title>Disease gene discovery through integrative genomics</article-title>
<source>
<italic>Annual Review of Genomics and Human Genetics</italic>
</source>
<year>2005</year>
<volume>6</volume>
<fpage>381</fpage>
<lpage>406</lpage>
<pub-id pub-id-type="doi">10.1146/annurev.genom.6.080604.162234</pub-id>
<pub-id pub-id-type="other">2-s2.0-25844439570</pub-id>
</element-citation>
</ref>
<ref id="B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brunner</surname>
<given-names>H. G.</given-names>
</name>
<name>
<surname>van Driel</surname>
<given-names>M. A.</given-names>
</name>
</person-group>
<article-title>From syndrome families to functional genomics</article-title>
<source>
<italic>Nature Reviews Genetics</italic>
</source>
<year>2004</year>
<volume>5</volume>
<issue>7</issue>
<fpage>545</fpage>
<lpage>551</lpage>
<pub-id pub-id-type="doi">10.1038/nrg1383</pub-id>
<pub-id pub-id-type="other">2-s2.0-3042841887</pub-id>
</element-citation>
</ref>
<ref id="B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Auffray</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Hood</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Systems medicine: the future of medical genomics and healthcare</article-title>
<source>
<italic>Genome Medicine</italic>
</source>
<year>2009</year>
<volume>1</volume>
<issue>1, article gm2</issue>
<pub-id pub-id-type="doi">10.1186/gm2</pub-id>
<pub-id pub-id-type="other">2-s2.0-77953397764</pub-id>
</element-citation>
</ref>
<ref id="B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Aronow</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Jegga</surname>
<given-names>A. G.</given-names>
</name>
</person-group>
<article-title>Improved human disease candidate gene prioritization using mouse phenotype</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2007</year>
<volume>8, article 392</volume>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-392</pub-id>
<pub-id pub-id-type="other">2-s2.0-38049136610</pub-id>
</element-citation>
</ref>
<ref id="B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aerts</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lambrechts</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Maity</surname>
<given-names>S.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene prioritization through genomic data fusion</article-title>
<source>
<italic>Nature Biotechnology</italic>
</source>
<year>2006</year>
<volume>24</volume>
<issue>5</issue>
<fpage>537</fpage>
<lpage>544</lpage>
<pub-id pub-id-type="doi">10.1038/nbt1203</pub-id>
<pub-id pub-id-type="other">2-s2.0-33646568805</pub-id>
</element-citation>
</ref>
<ref id="B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rossi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Masotti</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Nardini</surname>
<given-names>C.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TOM: a web-based integrated approach for identification of candidate disease genes</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2006</year>
<volume>34</volume>
<fpage>W285</fpage>
<lpage>W292</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkl340</pub-id>
<pub-id pub-id-type="other">2-s2.0-33747814487</pub-id>
<pub-id pub-id-type="pmid">16845011</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moreau</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tranchevent</surname>
<given-names>L.-C.</given-names>
</name>
</person-group>
<article-title>Computational tools for prioritizing candidate genes: boosting disease gene discovery</article-title>
<source>
<italic>Nature Reviews Genetics</italic>
</source>
<year>2012</year>
<volume>13</volume>
<issue>8</issue>
<fpage>523</fpage>
<lpage>536</lpage>
<pub-id pub-id-type="doi">10.1038/nrg3253</pub-id>
<pub-id pub-id-type="other">2-s2.0-84863987708</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arrais</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Fernandes</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>J. L.</given-names>
</name>
</person-group>
<article-title>GeneBrowser 2: an application to explore and identify common biological traits in a set of genes</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2010</year>
<volume>11, article 389</volume>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-389</pub-id>
<pub-id pub-id-type="other">2-s2.0-77954675055</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barabási</surname>
<given-names>A.-L.</given-names>
</name>
<name>
<surname>Gulbahce</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Loscalzo</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Network medicine: a network-based approach to human disease</article-title>
<source>
<italic>Nature Reviews Genetics</italic>
</source>
<year>2011</year>
<volume>12</volume>
<issue>1</issue>
<fpage>56</fpage>
<lpage>68</lpage>
<pub-id pub-id-type="doi">10.1038/nrg2918</pub-id>
<pub-id pub-id-type="other">2-s2.0-78650373804</pub-id>
</element-citation>
</ref>
<ref id="B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Joy</surname>
<given-names>M. P.</given-names>
</name>
<name>
<surname>Brock</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ingber</surname>
<given-names>D. E.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>High-betweenness proteins in the yeast protein interaction network</article-title>
<source>
<italic>Journal of Biomedicine &amp; Biotechnology</italic>
</source>
<year>2005</year>
<volume>2005</volume>
<issue>2</issue>
<fpage>96</fpage>
<lpage>103</lpage>
<pub-id pub-id-type="doi">10.1155/JBB.2005.96</pub-id>
<pub-id pub-id-type="other">2-s2.0-27744469130</pub-id>
<pub-id pub-id-type="pmid">16046814</pub-id>
</element-citation>
</ref>
<ref id="B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname>
<given-names>H.-W.</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>A.-P.</given-names>
</name>
</person-group>
<article-title>The connectivity structure, giant strong component and centrality of metabolic networks</article-title>
<source>
<italic>Bioinformatics</italic>
</source>
<year>2003</year>
<volume>19</volume>
<issue>11</issue>
<fpage>1423</fpage>
<lpage>1430</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btg177</pub-id>
<pub-id pub-id-type="other">2-s2.0-0042093610</pub-id>
<pub-id pub-id-type="pmid">12874056</pub-id>
</element-citation>
</ref>
<ref id="B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Erten</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bebek</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Ewing</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Koyutürk</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>DADA: degree-aware algorithms for network-based disease gene prioritization</article-title>
<source>
<italic>BioData Mining</italic>
</source>
<year>2011</year>
<volume>4</volume>
<issue>1, article 19</issue>
<pub-id pub-id-type="doi">10.1186/1756-0381-4-19</pub-id>
<pub-id pub-id-type="other">2-s2.0-79959464683</pub-id>
</element-citation>
</ref>
<ref id="B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arrais</surname>
<given-names>J. P.</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>J. L.</given-names>
</name>
</person-group>
<article-title>Using biomedical networks to prioritize gene–disease associations</article-title>
<source>
<italic>Open Access Bioinformatics</italic>
</source>
<year>2011</year>
<volume>2011</volume>
<issue>3</issue>
<fpage>123</fpage>
<lpage>130</lpage>
</element-citation>
</ref>
<ref id="B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Macropol</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Can</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>A. K.</given-names>
</name>
</person-group>
<article-title>RRW: repeated random walks on genome-scale protein networks for local cluster discovery</article-title>
<source>
<italic>BMC Bioinformatics</italic>
</source>
<year>2009</year>
<volume>10, article 283</volume>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-283</pub-id>
<pub-id pub-id-type="other">2-s2.0-70349754527</pub-id>
</element-citation>
</ref>
<ref id="B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>A method based on local density and random walks for complexes detection in protein interaction networks</article-title>
<source>
<italic>Journal of Bioinformatics and Computational Biology</italic>
</source>
<year>2010</year>
<volume>8</volume>
<issue>supplement 1</issue>
<fpage>47</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="doi">10.1142/S0219720010005191</pub-id>
<pub-id pub-id-type="other">2-s2.0-78649813646</pub-id>
<pub-id pub-id-type="pmid">21155019</pub-id>
</element-citation>
</ref>
<ref id="B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Le</surname>
<given-names>D.-H.</given-names>
</name>
<name>
<surname>Kwon</surname>
<given-names>Y.-K.</given-names>
</name>
</person-group>
<article-title>GPEC: a Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection</article-title>
<source>
<italic>Computational Biology and Chemistry</italic>
</source>
<year>2012</year>
<volume>37</volume>
<fpage>17</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="doi">10.1016/j.compbiolchem.2012.02.004</pub-id>
<pub-id pub-id-type="other">2-s2.0-84858250709</pub-id>
<pub-id pub-id-type="pmid">22430954</pub-id>
</element-citation>
</ref>
<ref id="B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Re</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mesiti</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Valentini</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>A fast ranking algorithm for predicting gene functions in biomolecular networks</article-title>
<source>
<italic>IEEE/ACM Transactions on Computational Biology and Bioinformatics</italic>
</source>
<year>2012</year>
<volume>9</volume>
<issue>6</issue>
<fpage>1812</fpage>
<lpage>1818</lpage>
<pub-id pub-id-type="doi">10.1109/TCBB.2012.114</pub-id>
<pub-id pub-id-type="other">2-s2.0-84875276463</pub-id>
<pub-id pub-id-type="pmid">23221088</pub-id>
</element-citation>
</ref>
<ref id="B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kohler</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Horn</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Robinson</surname>
<given-names>P. N.</given-names>
</name>
</person-group>
<article-title>Walking the interactome for prioritization of candidate disease genes</article-title>
<source>
<italic>The American Journal of Human Genetics</italic>
</source>
<year>2008</year>
<volume>82</volume>
<issue>4</issue>
<fpage>949</fpage>
<lpage>958</lpage>
<pub-id pub-id-type="doi">10.1016/j.ajhg.2008.02.013</pub-id>
<pub-id pub-id-type="other">2-s2.0-41549139527</pub-id>
</element-citation>
</ref>
<ref id="B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szklarczyk</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Franceschini</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kuhn</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2011</year>
<volume>39</volume>
<issue>1</issue>
<fpage>D561</fpage>
<lpage>D568</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkq973</pub-id>
<pub-id pub-id-type="other">2-s2.0-78651324347</pub-id>
<pub-id pub-id-type="pmid">21045058</pub-id>
</element-citation>
</ref>
<ref id="B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamosh</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Scott</surname>
<given-names>A. F.</given-names>
</name>
<name>
<surname>Amberger</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Bocchini</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>McKusick</surname>
<given-names>V. A.</given-names>
</name>
</person-group>
<article-title>Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders</article-title>
<source>
<italic>Nucleic Acids Research</italic>
</source>
<year>2005</year>
<volume>33</volume>
<fpage>D514</fpage>
<lpage>D517</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gki033</pub-id>
<pub-id pub-id-type="other">2-s2.0-13444266370</pub-id>
<pub-id pub-id-type="pmid">15608251</pub-id>
</element-citation>
</ref>
<ref id="B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>J. A.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene ontology: tool for the unification of biology</article-title>
<source>
<italic>Nature Genetics</italic>
</source>
<year>2000</year>
<volume>25</volume>
<issue>1</issue>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="doi">10.1038/75556</pub-id>
<pub-id pub-id-type="other">2-s2.0-0034069495</pub-id>
<pub-id pub-id-type="pmid">10802651</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
<floats-group>
<fig id="fig1" orientation="portrait" position="float">
<label>Figure 1</label>
<caption>
<p>Comparison of the RWR method applied to PPI data alone and to PPI data enriched with biological terms.</p>
</caption>
<graphic xlink:href="BMRI2015-747156.001"></graphic>
</fig>
<fig id="fig2" orientation="portrait" position="float">
<label>Figure 2</label>
<caption>
<p>ROC curves comparing the overall performance of RecRWR with existing methods.</p>
</caption>
<graphic xlink:href="BMRI2015-747156.002"></graphic>
</fig>
<fig id="fig3" orientation="portrait" position="float">
<label>Figure 3</label>
<caption>
<p>Network of biological concepts associated with breast cancer.</p>
</caption>
<graphic xlink:href="BMRI2015-747156.003"></graphic>
</fig>
<fig id="alg1" orientation="portrait" position="float">
<label>Algorithm 1</label>
<caption>
<p>Pseudocode for the RecRWR method.</p>
</caption>
<graphic xlink:href="BMRI2015-747156.alg.001"></graphic>
</fig>
<table-wrap id="tab1" orientation="portrait" position="float">
<label>Table 1</label>
<caption>
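<p>Comparison of the AUC for the analysed methods across increasing levels of biological noise (0% to 60%). Each Δ (%) row gives the percentage change in AUC from the preceding column.</p>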
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">0%</th>
<th align="center" rowspan="1" colspan="1">20%</th>
<th align="center" rowspan="1" colspan="1">40%</th>
<th align="center" rowspan="1" colspan="1">60%</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">AUC-RWR</td>
<td align="center" rowspan="1" colspan="1">0.9866</td>
<td align="center" rowspan="1" colspan="1">0.9453</td>
<td align="center" rowspan="1" colspan="1">0.8894</td>
<td align="center" rowspan="1" colspan="1">0.8435</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ (%)</td>
<td align="center" rowspan="1" colspan="1"></td>
<td align="center" rowspan="1" colspan="1">−4.36%</td>
<td align="center" rowspan="1" colspan="1">−6.29%</td>
<td align="center" rowspan="1" colspan="1">−5.44%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AUC-RWR all data</td>
<td align="center" rowspan="1" colspan="1">0.9866</td>
<td align="center" rowspan="1" colspan="1">0.9417</td>
<td align="center" rowspan="1" colspan="1">0.8838</td>
<td align="center" rowspan="1" colspan="1">0.8115</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ (%)</td>
<td align="center" rowspan="1" colspan="1"></td>
<td align="center" rowspan="1" colspan="1">−4.77%</td>
<td align="center" rowspan="1" colspan="1">−6.55%</td>
<td align="center" rowspan="1" colspan="1">−8.91%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AUC-RecRWR</td>
<td align="center" rowspan="1" colspan="1">0.9856</td>
<td align="center" rowspan="1" colspan="1">0.9834</td>
<td align="center" rowspan="1" colspan="1">0.9534</td>
<td align="center" rowspan="1" colspan="1">0.9072</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ (%)</td>
<td align="center" rowspan="1" colspan="1"></td>
<td align="center" rowspan="1" colspan="1">−0.22%</td>
<td align="center" rowspan="1" colspan="1">−3.15%</td>
<td align="center" rowspan="1" colspan="1">−5.09%</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</pmc>
</record>
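As a check on the Δ (%) rows of Table 1, the short Python sketch below recomputes them from the printed AUC values. The formula is an inference from the numbers, not something the paper states: the published percentages are reproduced by taking the change between adjacent columns and dividing it by the later column's AUC, (AUC_k − AUC_{k−1}) / AUC_k. The names `AUC` and `successive_deltas` are illustrative only.

```python
# AUC values transcribed from Table 1; columns are the 0%, 20%, 40% and 60% settings.
AUC = {
    "AUC-RWR": [0.9866, 0.9453, 0.8894, 0.8435],
    "AUC-RWR all data": [0.9866, 0.9417, 0.8838, 0.8115],
    "AUC-RecRWR": [0.9856, 0.9834, 0.9534, 0.9072],
}


def successive_deltas(values):
    """Change between adjacent columns, in percent of the later column's AUC.

    Dividing by the later value (not the earlier one) is an assumption inferred
    from the published figures; it is not stated in the paper.
    """
    return [round(100 * (cur - prev) / cur, 2) for prev, cur in zip(values, values[1:])]


for method, values in AUC.items():
    print(method, successive_deltas(values))
```

Run as-is, it prints [-4.37, -6.29, -5.44], [-4.77, -6.55, -8.91] and [-0.22, -3.15, -5.09]. Eight of the nine values match Table 1 exactly; the first AUC-RWR entry comes out as −4.37 rather than the printed −4.36, a 0.01-point gap that may stem from the AUCs being shown rounded to four decimals.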

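Figure 1 and Algorithm 1 refer to the random walk with restart (RWR) on which RecRWR builds. The Python sketch below implements only the standard, non-recursive RWR, not RecRWR itself, whose recursive steps are given in Algorithm 1 and are not reproduced here. It iterates p_{t+1} = (1 − r) W p_t + r p_0, where W is the column-normalised adjacency matrix of the network and p_0 spreads the restart probability uniformly over the known seed genes. The restart probability r = 0.7, the tolerance, and the function name `random_walk_with_restart` are assumptions chosen for illustration, not values taken from the paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-6, max_iter=1000):
    """Standard (non-recursive) RWR on a network given as an adjacency matrix.

    Iterates p_next = (1 - restart) * (W times p) + restart * p0 until the L1
    change drops below tol. restart=0.7 and tol=1e-6 are illustrative assumptions.
    """
    n = len(adjacency)
    # Column sums (node degrees for an unweighted graph) make W column-stochastic.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Restart vector: uniform probability over the seed (known disease) genes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        p_next = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
                  for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


# Usage: a four-node path graph 0-1-2-3 with node 0 as the only seed.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seeds={0})
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2, 3]
```

The scores decrease with distance from the seed, which is the ranking behaviour that network-based candidate prioritisation relies on; the recursive refinement that distinguishes RecRWR is not modelled here.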
To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000437 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000437 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4385608
   |texte=   RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:25874227" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024