Parkinson's disease in Canada (exploration server)


Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends

Internal identifier: 000243 (Pmc/Corpus)

Authors: Gabriela Jurca; Omar Addam; Alper Aksac; Shang Gao; Tansel Özyer; Douglas Demetrick; Reda Alhajj

RBID : PMC:4845430

Abstract

Background

Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature; however, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that mines the existing literature for informative discoveries. It integrates text mining and social network analysis to identify new potential biomarkers for breast cancer.

Results

We used PubMed for testing. We investigated gene–gene interactions, as well as novel interactions such as gene–year, gene–country, and abstract–country, to find out how the discoveries varied over time and how much the discoveries and the interests of research groups in different countries overlap or diverge.
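The gene–gene, gene–year, and gene–country interactions above are counts of how often items are mentioned together. The Python sketch below shows one way to collect such counts. It is an illustration only. The tiny lexicon `GENE_SYMBOLS`, the record keys `text`, `year`, and `country`, and the helpers `genes_in` and `count_pairs` are all hypothetical. Plain token matching stands in for whatever gene recognition the authors actually used, and the three sample records are made up.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene lexicon for illustration only; a real pipeline would use
# a curated gene dictionary or a named-entity recognizer.
GENE_SYMBOLS = {"BRCA1", "BRCA2", "TP53", "ERBB2", "ESR1"}


def genes_in(text):
    """Return the sorted, de-duplicated lexicon symbols found as tokens in text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return sorted(tokens & GENE_SYMBOLS)


def count_pairs(abstracts):
    """Count gene-gene, gene-year and gene-country co-occurrences.

    Each abstract is a dict with the hypothetical keys "text", "year" and
    "country". Each pair is counted at most once per abstract.
    """
    gene_gene, gene_year, gene_country = Counter(), Counter(), Counter()
    for record in abstracts:
        genes = genes_in(record["text"])
        # genes is sorted, so each unordered gene pair has one canonical key.
        gene_gene.update(combinations(genes, 2))
        gene_year.update((g, record["year"]) for g in genes)
        gene_country.update((g, record["country"]) for g in genes)
    return gene_gene, gene_year, gene_country


# Made-up sample records, used only to exercise the functions.
sample = [
    {"text": "BRCA1 and TP53 mutations in familial cases", "year": 2010, "country": "Canada"},
    {"text": "ERBB2 amplification and TP53 status", "year": 2012, "country": "China"},
    {"text": "BRCA1 carriers with TP53 variants", "year": 2012, "country": "Canada"},
]
gg, gy, gc = count_pairs(sample)
print(gg.most_common(1))       # [(('BRCA1', 'TP53'), 2)]
print(gc[("TP53", "Canada")])  # 2
```

Each abstract adds at most one count per pair, so a single long abstract cannot dominate the totals. The three counters can then be read as edge weights of a gene–gene network and two bipartite gene–year and gene–country networks.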

Conclusions

Interesting trends have been identified and discussed; e.g., different genes are highlighted in relation to different countries, although the various genes were found to share functionality. Some results based on text analysis have been validated against results from other tools that predict gene–gene relations and gene functions.

Electronic supplementary material

The online version of this article (doi:10.1186/s13104-016-2023-5) contains supplementary material, which is available to authorized users.


DOI: 10.1186/s13104-016-2023-5
PubMed: 27112211
PubMed Central: 4845430

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends</title>
<author>
<name sortKey="Jurca, Gabriela" sort="Jurca, Gabriela" uniqKey="Jurca G" first="Gabriela" last="Jurca">Gabriela Jurca</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Addam, Omar" sort="Addam, Omar" uniqKey="Addam O" first="Omar" last="Addam">Omar Addam</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Aksac, Alper" sort="Aksac, Alper" uniqKey="Aksac A" first="Alper" last="Aksac">Alper Aksac</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gao, Shang" sort="Gao, Shang" uniqKey="Gao S" first="Shang" last="Gao">Shang Gao</name>
<affiliation>
<nlm:aff id="Aff2">College of Computer Science and Technology, Jilin University, Changchun, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ozyer, Tansel" sort="Ozyer, Tansel" uniqKey="Ozyer T" first="Tansel" last="Özyer">Tansel Özyer</name>
<affiliation>
<nlm:aff id="Aff3">Department of Computer Engineering, TOBB University, Ankara, Turkey</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Demetrick, Douglas" sort="Demetrick, Douglas" uniqKey="Demetrick D" first="Douglas" last="Demetrick">Douglas Demetrick</name>
<affiliation>
<nlm:aff id="Aff4">Departments of Pathology, Oncology and Biochemistry &amp; Molecular Biology, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alhajj, Reda" sort="Alhajj, Reda" uniqKey="Alhajj R" first="Reda" last="Alhajj">Reda Alhajj</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Department of Computer Science, Global University, Beirut, Lebanon</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27112211</idno>
<idno type="pmc">4845430</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4845430</idno>
<idno type="RBID">PMC:4845430</idno>
<idno type="doi">10.1186/s13104-016-2023-5</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000243</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000243</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends</title>
<author>
<name sortKey="Jurca, Gabriela" sort="Jurca, Gabriela" uniqKey="Jurca G" first="Gabriela" last="Jurca">Gabriela Jurca</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Addam, Omar" sort="Addam, Omar" uniqKey="Addam O" first="Omar" last="Addam">Omar Addam</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Aksac, Alper" sort="Aksac, Alper" uniqKey="Aksac A" first="Alper" last="Aksac">Alper Aksac</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gao, Shang" sort="Gao, Shang" uniqKey="Gao S" first="Shang" last="Gao">Shang Gao</name>
<affiliation>
<nlm:aff id="Aff2">College of Computer Science and Technology, Jilin University, Changchun, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ozyer, Tansel" sort="Ozyer, Tansel" uniqKey="Ozyer T" first="Tansel" last="Özyer">Tansel Özyer</name>
<affiliation>
<nlm:aff id="Aff3">Department of Computer Engineering, TOBB University, Ankara, Turkey</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Demetrick, Douglas" sort="Demetrick, Douglas" uniqKey="Demetrick D" first="Douglas" last="Demetrick">Douglas Demetrick</name>
<affiliation>
<nlm:aff id="Aff4">Departments of Pathology, Oncology and Biochemistry &amp; Molecular Biology, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alhajj, Reda" sort="Alhajj, Reda" uniqKey="Alhajj R" first="Reda" last="Alhajj">Reda Alhajj</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science, University of Calgary, Calgary, AB Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Department of Computer Science, Global University, Beirut, Lebanon</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Research Notes</title>
<idno type="eISSN">1756-0500</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature; however, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that mines the existing literature for informative discoveries. It integrates text mining and social network analysis to identify new potential biomarkers for breast cancer.</p>
</sec>
<sec>
<title>Results</title>
<p>We used PubMed for testing. We investigated gene–gene interactions, as well as novel interactions such as gene–year, gene–country, and abstract–country, to find out how the discoveries varied over time and how much the discoveries and the interests of research groups in different countries overlap or diverge.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Interesting trends have been identified and discussed; e.g., different genes are highlighted in relation to different countries, although the various genes were found to share functionality. Some results based on text analysis have been validated against results from other tools that predict gene–gene relations and gene functions.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s13104-016-2023-5) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Veer, Lj" uniqKey="Van Veer L">LJ van’t Veer</name>
</author>
<author>
<name sortKey="Bernards, R" uniqKey="Bernards R">R Bernards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mishra, Alok" uniqKey="Mishra A">Alok Mishra</name>
</author>
<author>
<name sortKey="Verma, Mukesh" uniqKey="Verma M">Mukesh Verma</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ozgur, A" uniqKey="Ozgur A">A Ozgür</name>
</author>
<author>
<name sortKey="Vu, T" uniqKey="Vu T">T Vu</name>
</author>
<author>
<name sortKey="Erkan, G" uniqKey="Erkan G">G Erkan</name>
</author>
<author>
<name sortKey="Radev, Dr" uniqKey="Radev D">DR Radev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Navathe, Sb" uniqKey="Navathe S">SB Navathe</name>
</author>
<author>
<name sortKey="Civera, J" uniqKey="Civera J">J Civera</name>
</author>
<author>
<name sortKey="Dasigi, V" uniqKey="Dasigi V">V Dasigi</name>
</author>
<author>
<name sortKey="Ram, A" uniqKey="Ram A">A Ram</name>
</author>
<author>
<name sortKey="Ciliax, Bj" uniqKey="Ciliax B">BJ Ciliax</name>
</author>
<author>
<name sortKey="Dingledine, R" uniqKey="Dingledine R">R Dingledine</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Otte, E" uniqKey="Otte E">E Otte</name>
</author>
<author>
<name sortKey="Rousseau, R" uniqKey="Rousseau R">R Rousseau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Akay, Mf" uniqKey="Akay M">MF Akay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Faro, A" uniqKey="Faro A">A Faro</name>
</author>
<author>
<name sortKey="Giordano, D" uniqKey="Giordano D">D Giordano</name>
</author>
<author>
<name sortKey="Spampinato, C" uniqKey="Spampinato C">C Spampinato</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, F" uniqKey="Zhu F">F Zhu</name>
</author>
<author>
<name sortKey="Patumcharoenpol, P" uniqKey="Patumcharoenpol P">P Patumcharoenpol</name>
</author>
<author>
<name sortKey="Zhang, C" uniqKey="Zhang C">C Zhang</name>
</author>
<author>
<name sortKey="Yang, Y" uniqKey="Yang Y">Y Yang</name>
</author>
<author>
<name sortKey="Chan, J" uniqKey="Chan J">J Chan</name>
</author>
<author>
<name sortKey="Meechai, A" uniqKey="Meechai A">A Meechai</name>
</author>
<author>
<name sortKey="Vongsangnak, W" uniqKey="Vongsangnak W">W Vongsangnak</name>
</author>
<author>
<name sortKey="Shen, B" uniqKey="Shen B">B Shen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rebholz Schuhmann, D" uniqKey="Rebholz Schuhmann D">D Rebholz-Schuhmann</name>
</author>
<author>
<name sortKey="Oellrich, A" uniqKey="Oellrich A">A Oellrich</name>
</author>
<author>
<name sortKey="Hoehndorf, R" uniqKey="Hoehndorf R">R Hoehndorf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Saric, J" uniqKey="Saric J">J Saric</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nunes, T" uniqKey="Nunes T">T Nunes</name>
</author>
<author>
<name sortKey="Campos, D" uniqKey="Campos D">D Campos</name>
</author>
<author>
<name sortKey="Matos, S" uniqKey="Matos S">S Matos</name>
</author>
<author>
<name sortKey="Oliveira, Jl" uniqKey="Oliveira J">JL Oliveira</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stears, R" uniqKey="Stears R">R Stears</name>
</author>
<author>
<name sortKey="Martinsky, T" uniqKey="Martinsky T">T Martinsky</name>
</author>
<author>
<name sortKey="Schena, M" uniqKey="Schena M">M Schena</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warde Farley, D" uniqKey="Warde Farley D">D Warde-Farley</name>
</author>
<author>
<name sortKey="Donaldson, Sl" uniqKey="Donaldson S">SL Donaldson</name>
</author>
<author>
<name sortKey="Comes, O" uniqKey="Comes O">O Comes</name>
</author>
<author>
<name sortKey="Zuberi, K" uniqKey="Zuberi K">K Zuberi</name>
</author>
<author>
<name sortKey="Badrawi, R" uniqKey="Badrawi R">R Badrawi</name>
</author>
<author>
<name sortKey="Chao, P" uniqKey="Chao P">P Chao</name>
</author>
<author>
<name sortKey="Franz, M" uniqKey="Franz M">M Franz</name>
</author>
<author>
<name sortKey="Grouios, C" uniqKey="Grouios C">C Grouios</name>
</author>
<author>
<name sortKey="Kazi, F" uniqKey="Kazi F">F Kazi</name>
</author>
<author>
<name sortKey="Lopes, Ct" uniqKey="Lopes C">CT Lopes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bauer Mehren, A" uniqKey="Bauer Mehren A">A Bauer-Mehren</name>
</author>
<author>
<name sortKey="Bundschus, M" uniqKey="Bundschus M">M Bundschus</name>
</author>
<author>
<name sortKey="Rautschka, M" uniqKey="Rautschka M">M Rautschka</name>
</author>
<author>
<name sortKey="Mayer, Ma" uniqKey="Mayer M">MA Mayer</name>
</author>
<author>
<name sortKey="Sanz, F" uniqKey="Sanz F">F Sanz</name>
</author>
<author>
<name sortKey="Furlong, Li" uniqKey="Furlong L">LI Furlong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bauer Mehren, A" uniqKey="Bauer Mehren A">A Bauer-Mehren</name>
</author>
<author>
<name sortKey="Rautschka, M" uniqKey="Rautschka M">M Rautschka</name>
</author>
<author>
<name sortKey="Sanz, F" uniqKey="Sanz F">F Sanz</name>
</author>
<author>
<name sortKey="Furlong, Li" uniqKey="Furlong L">LI Furlong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Osborne, Jd" uniqKey="Osborne J">JD Osborne</name>
</author>
<author>
<name sortKey="Flatow, J" uniqKey="Flatow J">J Flatow</name>
</author>
<author>
<name sortKey="Holko, M" uniqKey="Holko M">M Holko</name>
</author>
<author>
<name sortKey="Lin, Sm" uniqKey="Lin S">SM Lin</name>
</author>
<author>
<name sortKey="Kibbe, Wa" uniqKey="Kibbe W">WA Kibbe</name>
</author>
<author>
<name sortKey="Zhu, Lj" uniqKey="Zhu L">LJ Zhu</name>
</author>
<author>
<name sortKey="Danila, Mi" uniqKey="Danila M">MI Danila</name>
</author>
<author>
<name sortKey="Feng, J" uniqKey="Feng J">J Feng</name>
</author>
<author>
<name sortKey="Chisholm, Rl" uniqKey="Chisholm R">RL Chisholm</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spampinato, C" uniqKey="Spampinato C">C Spampinato</name>
</author>
<author>
<name sortKey="Kavasidis, I" uniqKey="Kavasidis I">I Kavasidis</name>
</author>
<author>
<name sortKey="Aldinucci, M" uniqKey="Aldinucci M">M Aldinucci</name>
</author>
<author>
<name sortKey="Pino, C" uniqKey="Pino C">C Pino</name>
</author>
<author>
<name sortKey="Giordano, D" uniqKey="Giordano D">D Giordano</name>
</author>
<author>
<name sortKey="Faro, A" uniqKey="Faro A">A Faro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ozgur, A" uniqKey="Ozgur A">A Ozgür</name>
</author>
<author>
<name sortKey="Xiang, Z" uniqKey="Xiang Z">Z Xiang</name>
</author>
<author>
<name sortKey="Radev, Dr" uniqKey="Radev D">DR Radev</name>
</author>
<author>
<name sortKey="He, Y" uniqKey="He Y">Y He</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hudis, Ca" uniqKey="Hudis C">CA Hudis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vogel, Cl" uniqKey="Vogel C">CL Vogel</name>
</author>
<author>
<name sortKey="Cobleigh, Ma" uniqKey="Cobleigh M">MA Cobleigh</name>
</author>
<author>
<name sortKey="Tripathy, D" uniqKey="Tripathy D">D Tripathy</name>
</author>
<author>
<name sortKey="Gutheil, Jc" uniqKey="Gutheil J">JC Gutheil</name>
</author>
<author>
<name sortKey="Harris, Ln" uniqKey="Harris L">LN Harris</name>
</author>
<author>
<name sortKey="Fehrenbacher, L" uniqKey="Fehrenbacher L">L Fehrenbacher</name>
</author>
<author>
<name sortKey="Slamon, Dj" uniqKey="Slamon D">DJ Slamon</name>
</author>
<author>
<name sortKey="Murphy, M" uniqKey="Murphy M">M Murphy</name>
</author>
<author>
<name sortKey="Novotny, Wf" uniqKey="Novotny W">WF Novotny</name>
</author>
<author>
<name sortKey="Burchmore, M" uniqKey="Burchmore M">M Burchmore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lumachi, F" uniqKey="Lumachi F">F Lumachi</name>
</author>
<author>
<name sortKey="Brunello, A" uniqKey="Brunello A">A Brunello</name>
</author>
<author>
<name sortKey="Maruzzo, M" uniqKey="Maruzzo M">M Maruzzo</name>
</author>
<author>
<name sortKey="Basso, U" uniqKey="Basso U">U Basso</name>
</author>
<author>
<name sortKey="Basso, Smm" uniqKey="Basso S">SMM Basso</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frasor, J" uniqKey="Frasor J">J Frasor</name>
</author>
<author>
<name sortKey="Chang, Ec" uniqKey="Chang E">EC Chang</name>
</author>
<author>
<name sortKey="Komm, B" uniqKey="Komm B">B Komm</name>
</author>
<author>
<name sortKey="Lin, Cy" uniqKey="Lin C">CY Lin</name>
</author>
<author>
<name sortKey="Vega, Vb" uniqKey="Vega V">VB Vega</name>
</author>
<author>
<name sortKey="Liu, Et" uniqKey="Liu E">ET Liu</name>
</author>
<author>
<name sortKey="Miller, Ld" uniqKey="Miller L">LD Miller</name>
</author>
<author>
<name sortKey="Smeds, J" uniqKey="Smeds J">J Smeds</name>
</author>
<author>
<name sortKey="Bergh, J" uniqKey="Bergh J">J Bergh</name>
</author>
<author>
<name sortKey="Katzenellenbogen, Bs" uniqKey="Katzenellenbogen B">BS Katzenellenbogen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Olsson, Pa" uniqKey="Olsson P">PA Olsson</name>
</author>
<author>
<name sortKey="Korhonen, L" uniqKey="Korhonen L">L Korhonen</name>
</author>
<author>
<name sortKey="Mercer, Ea" uniqKey="Mercer E">EA Mercer</name>
</author>
<author>
<name sortKey="Lindholm, D" uniqKey="Lindholm D">D Lindholm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Prudnikova, Ty" uniqKey="Prudnikova T">TY Prudnikova</name>
</author>
<author>
<name sortKey="Mostovich, La" uniqKey="Mostovich L">LA Mostovich</name>
</author>
<author>
<name sortKey="Domanitskaya, Nv" uniqKey="Domanitskaya N">NV Domanitskaya</name>
</author>
<author>
<name sortKey="Pavlova, Tv" uniqKey="Pavlova T">TV Pavlova</name>
</author>
<author>
<name sortKey="Kashuba, Vi" uniqKey="Kashuba V">VI Kashuba</name>
</author>
<author>
<name sortKey="Zabarovsky, Er" uniqKey="Zabarovsky E">ER Zabarovsky</name>
</author>
<author>
<name sortKey="Grigorieva, Ev" uniqKey="Grigorieva E">EV Grigorieva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magnon, C" uniqKey="Magnon C">C Magnon</name>
</author>
<author>
<name sortKey="Hall, Sj" uniqKey="Hall S">SJ Hall</name>
</author>
<author>
<name sortKey="Lin, J" uniqKey="Lin J">J Lin</name>
</author>
<author>
<name sortKey="Xue, X" uniqKey="Xue X">X Xue</name>
</author>
<author>
<name sortKey="Gerber, L" uniqKey="Gerber L">L Gerber</name>
</author>
<author>
<name sortKey="Freedland, Sj" uniqKey="Freedland S">SJ Freedland</name>
</author>
<author>
<name sortKey="Frenette, Ps" uniqKey="Frenette P">PS Frenette</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Kl" uniqKey="Wu K">KL Wu</name>
</author>
<author>
<name sortKey="Yang, Ms" uniqKey="Yang M">MS Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goh, Kl" uniqKey="Goh K">KL Goh</name>
</author>
<author>
<name sortKey="Cusick, Me" uniqKey="Cusick M">ME Cusick</name>
</author>
<author>
<name sortKey="Valle, D" uniqKey="Valle D">D Valle</name>
</author>
<author>
<name sortKey="Childs, B" uniqKey="Childs B">B Childs</name>
</author>
<author>
<name sortKey="Vidal, M" uniqKey="Vidal M">M Vidal</name>
</author>
<author>
<name sortKey="Barabasi, Al" uniqKey="Barabasi A">AL Barabási</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dai, Hj" uniqKey="Dai H">HJ Dai</name>
</author>
<author>
<name sortKey="Chang, Yc" uniqKey="Chang Y">YC Chang</name>
</author>
<author>
<name sortKey="Tsai, Rt" uniqKey="Tsai R">RT Tsai</name>
</author>
<author>
<name sortKey="Hsu, Wl" uniqKey="Hsu W">WL Hsu</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Res Notes</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Res Notes</journal-id>
<journal-title-group>
<journal-title>BMC Research Notes</journal-title>
</journal-title-group>
<issn pub-type="epub">1756-0500</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">27112211</article-id>
<article-id pub-id-type="pmc">4845430</article-id>
<article-id pub-id-type="publisher-id">2023</article-id>
<article-id pub-id-type="doi">10.1186/s13104-016-2023-5</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Jurca</surname>
<given-names>Gabriela</given-names>
</name>
<address>
<email>gajurca@ucalgary.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Addam</surname>
<given-names>Omar</given-names>
</name>
<address>
<email>omaddam@gmail.com</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Aksac</surname>
<given-names>Alper</given-names>
</name>
<address>
<email>aaksa@ucalgary.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gao</surname>
<given-names>Shang</given-names>
</name>
<address>
<email>shanggao@jlu.edu.cn</email>
</address>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Özyer</surname>
<given-names>Tansel</given-names>
</name>
<address>
<email>ozyer@etu.edu.tr</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Demetrick</surname>
<given-names>Douglas</given-names>
</name>
<address>
<email>demetric@ucalgary.ca</email>
</address>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Alhajj</surname>
<given-names>Reda</given-names>
</name>
<address>
<email>alhajj@ucalgary.ca</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff5"></xref>
</contrib>
<aff id="Aff1">
<label></label>
Department of Computer Science, University of Calgary, Calgary, AB Canada</aff>
<aff id="Aff2">
<label></label>
College of Computer Science and Technology, Jilin University, Changchun, China</aff>
<aff id="Aff3">
<label></label>
Department of Computer Engineering, TOBB University, Ankara, Turkey</aff>
<aff id="Aff4">
<label></label>
Departments of Pathology, Oncology and Biochemistry &amp; Molecular Biology, University of Calgary, Calgary, AB Canada</aff>
<aff id="Aff5">
<label></label>
Department of Computer Science, Global University, Beirut, Lebanon</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>26</day>
<month>4</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>26</day>
<month>4</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>9</volume>
<elocation-id>236</elocation-id>
<history>
<date date-type="received">
<day>10</day>
<month>9</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>5</day>
<month>4</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>© Jurca et al. 2016</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Breast cancer is a serious disease that affects many women and may lead to death. It has received considerable attention from the research community, and biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature; however, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework that mines the existing literature for informative discoveries. It integrates text mining and social network analysis to identify new potential biomarkers for breast cancer.</p>
</sec>
<sec>
<title>Results</title>
<p>We used PubMed for testing. We investigated gene–gene interactions, as well as novel interactions such as gene–year, gene–country, and abstract–country, to find out how the discoveries varied over time and how much the discoveries and the interests of research groups in different countries overlap or diverge.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Interesting trends have been identified and discussed; e.g., different genes are highlighted in relation to different countries, although the various genes were found to share functionality. Some results based on text analysis have been validated against results from other tools that predict gene–gene relations and gene functions.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s13104-016-2023-5) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Breast cancer</kwd>
<kwd>Data mining</kwd>
<kwd>Text mining</kwd>
<kwd>Network analysis</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2016</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<sec id="Sec1235">
<title>Introduction</title>
<p>Cancer is one of the most serious and harmful diseases threatening humanity and may lead to death. Unfortunately, no robust treatment that guarantees a cure for cancer has yet been discovered. Researchers from various domains are therefore still working hard to identify molecules (mainly genes or proteins) that could serve as cancer biomarkers. Various methods have been developed, ranging from wet-lab testing by biologists to computational methods by computer scientists. Computational research is promising because it can drastically reduce the number of molecules that must be considered as potential biomarkers.</p>
<p>Cancer results from damage (mutation) to a cell’s DNA (deoxyribonucleic acid), so that the cell loses its normal functionality and instead gains the ability to multiply indefinitely until normal tissue functions are impaired [
<xref ref-type="bibr" rid="CR1">1</xref>
]. Cancerous DNA mutations may arise from a complex mixture of inherited and external (environmental) factors, and these mutations are usually located in genes that control cell division [
<xref ref-type="bibr" rid="CR1">1</xref>
]. More than 100 different types of cancer are known, distinguished by the cell type that was originally affected [
<xref ref-type="bibr" rid="CR1">1</xref>
]. Additionally, each patient may carry a different set of cancerous mutations in various genes, which may give rise to different subtypes of the cancer. To personalize therapeutic strategies for cancer patients, medical researchers aim to identify and characterize the biomarkers of each type of cancer so that they can provide the most accurate diagnosis [
<xref ref-type="bibr" rid="CR2">2</xref>
]. A cancer biomarker is a substance or process that indicates the presence of cancer in the body; genetic markers are one common example [
<xref ref-type="bibr" rid="CR3">3</xref>
].</p>
<p>The basic unit of genetic biomarkers are genes. A gene is one unit of the DNA which often contains the information needed to produce proteins. The central dogma is that genes are transcribed into an intermediate molecules called RNA, and the RNA is then translated into proteins, where proteins carry out the basic functions of life [
<xref ref-type="bibr" rid="CR4">4</xref>
]. If a gene codes for a protein whose function is to suppress cancer, then if that gene is damaged or is downregulated (not transcribed enough), then the cell may become cancerous. Similarly, if a gene codes for a protein whose function is to promote cancer, then if that gene is upregulated (transcribed more than usual), then that cell may also become cancerous. Therefore, finding the different genes and conditions which are likely to lead to cancer, should the genes be upregulated or downregulated, is an important task for characterizing types of cancer. The problem is not trivial because there are various internal and external factors that might affect the cells leading to cancer. People do not have the same habits and behavior. Thus they may develop the same cancer differently based on the environment they live in, their diet, drinking, etc. Also, some types of cancer, such as breast and prostate cancer can be strongly influenced by inherited gene mutations, and often run in families [
<xref ref-type="bibr" rid="CR5">5</xref>
]. Therefore, these heritable types of cancer may be predicted by examining a person’s DNA before they develop cancer. Identifying the heritable genetic mutations that increase the likelihood of cancer is critical to developing predictive genetic tests.</p>
<p>The framework described in this paper is built on the following hypothesis. To investigate cancer biomarkers, one may examine the literature, which contains a huge amount of information hidden in scientific articles. However, a PubMed query for “breast cancer” can retrieve over 250,000 articles, which makes it impossible to get a full picture of the field by reading them. The number of PubMed articles is steadily increasing, and so is the number of breast cancer articles that mention gene names proposed as potential biomarkers (see Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
). Therefore, using text mining techniques to gather new knowledge from many existing scientific sources can be an effective way to search the literature for new biomarkers. One type of relationship that can be discovered is the gene-disease relationship, which shows which gene is involved in which disease [
<xref ref-type="bibr" rid="CR6">6</xref>
]. Another is the gene–gene interaction [
<xref ref-type="bibr" rid="CR7">7</xref>
].
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Growth in number of abstracts about breast cancer in PubMed</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p>Some data mining techniques that can be used to extract hidden information from a database are hard clustering, soft clustering, hierarchical clustering, and frequent pattern mining [
<xref ref-type="bibr" rid="CR8">8</xref>
]. All of the aforementioned techniques are described in more detail in “
<xref rid="Sec19" ref-type="sec">Results and discussion</xref>
” section. Each data mining technique uses different interestingness metrics, so it is useful to apply several techniques to the same data set. Another technique we applied to the genes extracted from the breast cancer abstracts was network analysis, sometimes called “Social Network Analysis” [
<xref ref-type="bibr" rid="CR9">9</xref>
]. Network analysis has its roots in sociology, as it was first used to study the relationships and community structures in social data. However, network analysis has since been applied in other fields such as bioinformatics in order to find key molecular markers and communities within an interaction network.</p>
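<p>To make frequent pattern mining more concrete, the following Python sketch counts how often each pair of genes appears together across abstracts and keeps the pairs whose support reaches a minimum threshold, in the style of the first two passes of the Apriori algorithm. The gene sets and the min_support value are hypothetical; the analysis in this study was performed in KNIME, and this is not its implementation.</p>

```python
from collections import Counter
from itertools import combinations


def frequent_gene_pairs(abstract_genes, min_support):
    """Return gene pairs that co-occur in at least min_support abstracts.

    abstract_genes: list of sets, one set of gene symbols per abstract.
    """
    # Pass 1: keep only genes that are frequent on their own; by the
    # Apriori property, a pair cannot be frequent if either gene is not.
    gene_counts = Counter(g for genes in abstract_genes for g in genes)
    frequent = {g for g, c in gene_counts.items() if c >= min_support}

    # Pass 2: count candidate pairs built only from frequent genes.
    pair_counts = Counter()
    for genes in abstract_genes:
        kept = sorted(genes.intersection(frequent))
        pair_counts.update(combinations(kept, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


# Hypothetical genes recognized in four abstracts.
abstracts = [
    {"BRCA1", "BRCA2", "TP53"},
    {"BRCA1", "BRCA2"},
    {"BRCA1", "ERBB2"},
    {"BRCA1", "BRCA2", "ERBB2"},
]
print(frequent_gene_pairs(abstracts, min_support=2))
# prints {('BRCA1', 'BRCA2'): 3, ('BRCA1', 'ERBB2'): 2}
```

<p>Pruning infrequent single genes first keeps the number of candidate pairs small, which matters when abstracts mention many genes.</p>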
<p>One of the most effective ways to validate genes linked to cancer is to analyze disease-specific gene expression data [
<xref ref-type="bibr" rid="CR10">10</xref>
].</p>
<p>Gene expression data is experimental data that can be used to check whether a gene has indeed been upregulated or downregulated in a disease. This methodology compares the level at which genes are expressed in cancerous cells with that in healthy cells. Wet-lab analysis of such a huge set of genes is unaffordable and infeasible. Therefore, machine learning and data mining techniques (including frequent pattern mining, clustering, and classification) can be used to reduce the candidates to a manageable set of genes that are expected to be statistically linked with the disease. Biologists can then concentrate on this small set of potential cancer biomarkers instead of the unrealistic task of testing every gene in the wet lab. In other words, data mining techniques can save cancer researchers time and money, making their research goals potentially achievable. This is illustrated by the test results reported in this paper.</p>
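<p>A minimal sketch of such a reduction step, assuming a simple log2 fold-change filter (one common choice, not necessarily the one used by the cited studies): average each gene's expression in tumor and normal samples, take the log2 ratio, and keep the genes whose absolute value reaches a threshold. The expression values, the pseudocount, and the threshold of 1.0 (a two-fold change) are hypothetical.</p>

```python
import math


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Map each gene to log2(mean tumor expression / mean normal expression).

    tumor, normal: dicts mapping gene -> list of expression values.
    The pseudocount avoids division by zero for unexpressed genes.
    """
    result = {}
    for gene in tumor:
        t = sum(tumor[gene]) / len(tumor[gene]) + pseudocount
        n = sum(normal[gene]) / len(normal[gene]) + pseudocount
        result[gene] = math.log2(t / n)
    return result


def shortlist(fold_changes, threshold=1.0):
    """Return (upregulated, downregulated) genes at least 2**threshold-fold."""
    up = sorted(g for g, fc in fold_changes.items() if fc >= threshold)
    down = sorted(g for g, fc in fold_changes.items() if -fc >= threshold)
    return up, down


# Hypothetical expression values (arbitrary units).
tumor = {"ERBB2": [63.0, 65.0], "PTEN": [3.0, 5.0], "ACTB": [99.0, 101.0]}
normal = {"ERBB2": [15.0, 17.0], "PTEN": [20.0, 22.0], "ACTB": [98.0, 100.0]}
fc = log2_fold_changes(tumor, normal)
print(shortlist(fc))  # prints (['ERBB2'], ['PTEN'])
```

<p>In practice a statistical test across replicate samples would normally accompany the fold-change cutoff; the sketch omits it for brevity.</p>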
<p>The rest of the paper is organized as follows. The problem is explained in “
<xref rid="Sec5456" ref-type="sec">Problem explanation</xref>
” section. “
<xref rid="Sec4567" ref-type="sec">Related work</xref>
” section describes the work related to our solution. In “
<xref rid="Sec1265" ref-type="sec">The developed solution</xref>
” section, the developed methodology is given in detail. The experimental results are presented in “
<xref rid="Sec123456" ref-type="sec">Evaluation of the developed solution</xref>
” section. Lastly, contributions and future work are discussed in “
<xref rid="Sec19" ref-type="sec">Results and discussion</xref>
” section.</p>
</sec>
<sec id="Sec5456">
<title>Problem explanation</title>
<p>Identifying cancer biomarkers is not a trivial task. Despite all the effort, time, and money invested so far, progress has been limited. Indeed, the body is affected by various internal and external factors which together may lead to cancer. As these factors differ from person to person, samples taken from two cancer patients may not reveal exactly the same information. Thus, there is a need for new techniques that can better analyze existing sources of data in the hope of leading to more useful discoveries.</p>
<p>In this paper, we aimed to perform a large-scale text analysis of biomedical abstracts in order to generate new hypotheses about cancer biomarkers. The target was to develop a data mining methodology that reveals patterns among the genes associated with cancer. In this section, we discuss the tasks involved in text mining.</p>
<sec id="Sec789">
<title>Text mining</title>
<p>Text mining typically comprises four stages [
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR12">12</xref>
]: (1) information retrieval (IR), where a set of textual materials is gathered for a given topic; (2) named entity recognition (NER), where textual features are identified in the gathered texts; (3) information extraction (IE), which aims to extract relationships among the recognized textual features; and (4) knowledge discovery (KD), where the extracted relationships are used to identify useful patterns in the data set. The rest of this section explains each stage and how it can be applied to biomedical text mining.</p>
<sec id="Sec126">
<title>Information retrieval for text mining</title>
<p>The first step in text mining is to gather the papers relevant to the topic of interest. There are a number of IR systems, including centralized institutional systems such as PubMed and UK PubMed Central (UKPMC), and commercial systems such as Google Scholar. The best known is PubMed [
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR13">13</xref>
], which searches the MedLine database.</p>
<p>First, IR engines can be categorized by their input. The topic may come from a query provided by the user; this way of defining the topic is called ad hoc retrieval [
<xref ref-type="bibr" rid="CR14">14</xref>
]. The other kind of IR system performs text categorization, where the input is a set of papers. Ad hoc retrieval has some limitations compared to text categorization [
<xref ref-type="bibr" rid="CR13">13</xref>
]. PubMed is an ad hoc system. Second, IR engines can be categorized by the scope of the content they deliver. For example, PubMed performs a comprehensive search of articles but retrieves only their abstracts. In contrast, UKPMC returns the full text of articles [
<xref ref-type="bibr" rid="CR13">13</xref>
].</p>
</sec>
<sec id="Sec456">
<title>Entity recognition (NER)</title>
<p>Once we have a subset of the available scientific literature that pertains to our topic, we must identify the terms relevant to our study. NER aims to identify terms within the gathered text, such as the names of proteins or genes. The first task of these systems is to recognize biological entity names. The second task is to map each recognized name to a unique entity. However, identifying biological terms is challenging for the following reasons [
<xref ref-type="bibr" rid="CR12">12</xref>
]:
<list list-type="bullet">
<list-item>
<p>Biomedical terms often have synonyms (e.g., PTEN and MMAC1 refer to the same gene).</p>
</list-item>
<list-item>
<p>A term may have different meanings (e.g., “cancer” can also refer to the astrological sign).</p>
</list-item>
<list-item>
<p>Acronyms may lead to ambiguities (e.g., BC may mean breast cancer or it may mean British Columbia).</p>
</list-item>
</list>
</p>
<p>These challenges can make the recognition of biological entities quite imprecise. However, NER systems implement some strategies to overcome these drawbacks. One method is to integrate different vocabularies and ontologies that hold comprehensive lists of biological entity names [
<xref ref-type="bibr" rid="CR12">12</xref>
]. For example, the Gene Ontology is a classification effort that describes what we know about genes, including controlled vocabularies for describing them.</p>
<p>Early NER systems were rule-based, using manually designed rules based on word structure. More recent NER systems have shifted to machine learning techniques that learn the characteristics of words. A third type of NER system is dictionary-based, which is considered the most effective because it can recognize synonyms. In addition, algorithms can be used to disambiguate acronyms automatically [
<xref ref-type="bibr" rid="CR11">11</xref>
]. Some examples of NER systems that recognize biomedical entities are NCBO Annotator, cTAKES, MetaMap, and BeCAS. A study that compared these four systems against its own ground truth found that BeCAS behaved differently from the other three systems [
<xref ref-type="bibr" rid="CR15">15</xref>
]. BeCAS performed worse overall, but it recognized longer text spans than the other systems, which may have been underrepresented in that evaluation [
<xref ref-type="bibr" rid="CR15">15</xref>
].</p>
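<p>The dictionary-based approach, and the way a synonym dictionary resolves the PTEN/MMAC1 example given earlier, can be sketched in Python as follows. The small dictionary is hypothetical (a real system would load synonyms from a resource such as UniProt), and the tokenizer is deliberately naive: it matches only single-token names and performs no acronym disambiguation.</p>

```python
import re

# Hypothetical synonym dictionary: surface form -> canonical gene symbol.
# A real system would load these from a resource such as UniProt.
SYNONYMS = {
    "PTEN": "PTEN",
    "MMAC1": "PTEN",
    "ERBB2": "ERBB2",
    "HER2": "ERBB2",
    "BRCA1": "BRCA1",
}


def recognize_genes(text, synonyms=SYNONYMS):
    """Return canonical symbols of dictionary genes mentioned in text.

    Matching is exact and case-sensitive on whole alphanumeric tokens, so
    a word is recognized only if it is literally a dictionary entry.
    """
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    return {synonyms[t] for t in tokens if t in synonyms}


sentence = "Loss of MMAC1 (also called PTEN) and HER2 amplification were observed."
print(sorted(recognize_genes(sentence)))  # prints ['ERBB2', 'PTEN']
```

<p>Because both MMAC1 and PTEN map to the same canonical symbol, the sentence yields one gene rather than two, which is the main advantage attributed to dictionary-based NER above.</p>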
<p>Figure 
<xref rid="Fig2" ref-type="fig">2</xref>
shows how an NER system may annotate biomedical terms. In our problem, for example, we require the genes associated with breast cancer. Therefore, we may use BeCAS to first find biomedical terms, then to label proteins and genes, and finally verify them against the UniProt database using the provided UniProt ID. UniProt is a database that stores gene and protein information.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Text annotation using BeCAS</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
</sec>
<sec id="Sec45978">
<title>Information extraction (IE)</title>
<p>The aim of IE is to extract relationships between the biological entities mentioned in the text. There are two approaches for this: co-occurrence processing and natural language processing (NLP) [
<xref ref-type="bibr" rid="CR11">11</xref>
]. In co-occurrence processing, entities are deemed to be related if they occur in the same text. The relationships found this way are usually of the gene–gene or gene-disease type. However, co-occurrence processing cannot extract directional relationships between entities.</p>
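<p>Co-occurrence processing can be sketched in a few lines of Python: every unordered pair of genes recognized in the same abstract has its count increased by one. Because each pair is stored in sorted order, (A, B) and (B, A) are the same key, which mirrors the limitation that co-occurrence yields no direction. The abstract identifiers and gene sets are hypothetical.</p>

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstract_genes):
    """Count, for each unordered gene pair, the abstracts mentioning both.

    abstract_genes: dict mapping an abstract ID to the set of genes in it.
    """
    counts = Counter()
    for genes in abstract_genes.values():
        # Sorting makes the pair key order-independent (undirected).
        counts.update(combinations(sorted(genes), 2))
    return counts


abstract_genes = {
    "pmid_a": {"TP53", "BRCA1"},
    "pmid_b": {"BRCA1", "TP53", "ESR1"},
    "pmid_c": {"ESR1"},
}
counts = cooccurrence_counts(abstract_genes)
print(counts[("BRCA1", "TP53")])  # prints 2
```

<p>These counts can serve directly as edge weights in a gene–gene network.</p>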
<p>Through NLP, the direction of the relationship between biological entities can also be found. NLP analyzes the syntax and semantics of the sentence that contains the entities. However, NLP is better suited to full-text mining than to abstract mining, because the concise nature of abstracts makes it difficult to analyze the context of the biological entities [
<xref ref-type="bibr" rid="CR14">14</xref>
]. Also, due to their complexity, NLP systems are designed for limited and specific types of relationships, and only a few systems can recognize multiple types of relationships [
<xref ref-type="bibr" rid="CR14">14</xref>
]. As further discussed in “
<xref rid="Sec1265" ref-type="sec">The developed solution</xref>
” section, we used the BeCAS API [
<xref ref-type="bibr" rid="CR16">16</xref>
] to annotate biomedical concepts, such as genes and proteins, and to extract their co-occurrences.</p>
</sec>
<sec id="Sec5789">
<title>Knowledge discovery (KD)</title>
<p>KD is the extraction of knowledge from a large volume of structured and/or unstructured data. The goal of KD is to uncover novel knowledge from existing data. Novel knowledge can take the form of hidden relationships among biological entities. For example, if
<italic>A</italic>
is related to
<italic>B</italic>
, and
<italic>B</italic>
is related to
<italic>C</italic>
, text mining can suggest that
<italic>A</italic>
is related to
<italic>C</italic>
. Such indirect relationships are difficult for people to discover in large amounts of data. KD is often used to gain biologically meaningful knowledge about how biological entities are related.</p>
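<p>The indirect inference described above (often associated with Swanson's literature-based discovery) can be sketched as a two-hop search over a graph of known relations: any term C reachable from A through some intermediate B, but not directly linked to A, becomes a candidate hypothesis. Ranking candidates by the number of distinct intermediates is an illustrative choice, not the method of this paper, and the relations are hypothetical.</p>

```python
from collections import defaultdict


def candidate_links(edges, source):
    """Return {C: [B, ...]} for terms C linked to source only via some B.

    edges: iterable of (x, y) pairs treated as undirected known relations.
    """
    neighbors = defaultdict(set)
    for x, y in edges:
        neighbors[x].add(y)
        neighbors[y].add(x)

    direct = neighbors[source]
    candidates = defaultdict(list)
    for b in sorted(direct):
        for c in sorted(neighbors[b]):
            # Skip the source itself and anything already directly related.
            if c != source and c not in direct:
                candidates[c].append(b)
    # Rank by how many intermediates support the indirect link.
    return dict(sorted(candidates.items(), key=lambda kv: -len(kv[1])))


known = [("A", "B1"), ("A", "B2"), ("B1", "C"), ("B2", "C"), ("B2", "D")]
print(candidate_links(known, "A"))  # prints {'C': ['B1', 'B2'], 'D': ['B2']}
```

<p>A candidate such as C remains a hypothesis until it is checked against experimental data.</p>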
</sec>
</sec>
<sec id="Sec54">
<title>Hypothesis generation</title>
<p>One of the newer approaches described in the literature is to generate scientific hypotheses through text mining [
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR13">13</xref>
]. KD can be used to generate scientific hypotheses, for example about relationships between entities, that have yet to be validated. Whereas KD attempts to discover the biological meaning of a set of facts, hypothesis generation attempts to discover entirely new relationships. Hypothesis generation can help direct scientists to the genes they should study without spending many resources on exploration.</p>
<p>The work in [
<xref ref-type="bibr" rid="CR11">11</xref>
] describes two ways in which hypothesis generation can occur. One is to start with microarray data to identify gene hypotheses and then to support these hypotheses with literature mining. The other is to generate hypotheses through literature mining and then to validate them with experimental data, such as microarray data. We decided to investigate the second method, since Faro et al. [
<xref ref-type="bibr" rid="CR11">11</xref>
] identified it as the one more lacking in research.</p>
</sec>
<sec id="Sec55">
<title>Evaluation</title>
<p>Some related studies that use biomedical text mining to generate hypotheses have evaluated their results with experimental data [
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR17">17</xref>
]. Experimental data can consist of gene expression data, which often comes in the form of microarray data. Gene microarray experiments are performed on specific tissue samples and measure the abundance of the intermediate molecule, RNA, so that we can learn which genes are important under particular conditions [
<xref ref-type="bibr" rid="CR18">18</xref>
]. Some genes may be upregulated, which means that they were transcribed more; we say that these genes were ‘expressed’. Otherwise, the genes may be downregulated, which means that they were not ‘expressed’. Genes that were expressed together at the same time may be related, and we say they are ‘co-expressed’.</p>
<p>There are publicly available online repositories that store experimental data, as well as the gene–gene relationships and gene functions derived from experimental data. Tools such as GeneMania show the relationships between genes by integrating information from various databases [
<xref ref-type="bibr" rid="CR19">19</xref>
]. Tools such as GeneMania may be useful for validating the gene–gene relationship hypotheses. There are also tools such as DisGeNet [
<xref ref-type="bibr" rid="CR20">20</xref>
,
<xref ref-type="bibr" rid="CR21">21</xref>
] and FunDo [
<xref ref-type="bibr" rid="CR22">22</xref>
] that identify gene-disease relationships from curated sources.</p>
</sec>
</sec>
<sec id="Sec4567">
<title>Related work</title>
<p>Faro et al. [
<xref ref-type="bibr" rid="CR11">11</xref>
] described hypothesis generation from literature, combined with evaluation on experimental data, as a quite novel methodology in 2011. In this section, we describe some of the tools and methodologies that have been used for hypothesis generation from biomedical literature.</p>
<p>GeneWizard is an application which allows users to generate biological hypotheses based on text mining, and then evaluate the hypotheses through gene expression data [
<xref ref-type="bibr" rid="CR17">17</xref>
]. One advantage of this tool is that it can generate hypotheses about genes involved in any disease, whereas our methodology has so far focused on breast cancer. However, in the future we aim to apply our methodology to other cancers and diseases as well.</p>
<p>For the IR step, GeneWizard also used PubMed to retrieve articles related to the disease of interest, as we did in our methodology. For NER, GeneWizard recognizes the biological entities related to a disease by using dictionaries created for the disease and for the genes. To identify relationships between genes, GeneWizard clusters the abstracts using similarity matrices constructed from the frequencies of the disease and gene terms in the abstracts.</p>
<p>Another goal of GeneWizard is to be highly usable, so that users need little experience with text mining methods. Faro et al. [
<xref ref-type="bibr" rid="CR11">11</xref>
] stress that it is important for tools that generate biological hypotheses to be highly usable, since their users are likely to be biologists, not computer scientists.</p>
<p>Another tool, BioWizard, is very similar to GeneWizard, but it performs full-text analysis instead of abstract analysis [
<xref ref-type="bibr" rid="CR23">23</xref>
]. Also, BioWizard was tested against gold-standard gene-disease relationships to measure its precision and recall, in addition to being evaluated with experimental data in the form of microarray data. The system was then moved to the cloud in order to perform more intensive computations in less time [
<xref ref-type="bibr" rid="CR24">24</xref>
].</p>
<p>Another study that generated hypotheses from literature performed the IE step by splitting the abstracts into sentences and considering only the sentences that contained an interaction term plus two gene names [
<xref ref-type="bibr" rid="CR25">25</xref>
]. A network of genes was built from the extracted genes and interactions. The genes that ranked highest in centrality measures were manually validated by searching the literature. A similar study was done in [
<xref ref-type="bibr" rid="CR6">6</xref>
], and high accuracy was achieved in finding actual gene-disease relationships in prostate cancer. Interestingly, even genes that were missed were later shown in published articles to be involved in prostate cancer [
<xref ref-type="bibr" rid="CR6">6</xref>
].</p>
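<p>The sentence-level extraction used in the study cited above can be approximated in Python as follows: split each abstract into sentences, keep the sentences that contain an interaction keyword and at least two known gene names, link those genes, and rank genes by degree, the simplest centrality measure. The keyword list, gene list, and naive sentence splitter are hypothetical simplifications, not that study's implementation.</p>

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabularies; a real system would use curated lists.
INTERACTION_WORDS = {"binds", "activates", "inhibits", "phosphorylates"}
GENES = {"BRCA1", "BARD1", "TP53", "MDM2"}


def extract_edges(abstract):
    """Link all gene pairs in sentences with an interaction word and 2+ genes."""
    edges = set()
    # Naive sentence split on '.', '!' or '?'; real abstracts need more care
    # (e.g. abbreviations such as 'et al.').
    for sentence in re.split(r"[.!?]\s*", abstract):
        words = set(re.findall(r"[A-Za-z0-9]+", sentence))
        genes = sorted(words.intersection(GENES))
        if words.intersection(INTERACTION_WORDS) and len(genes) >= 2:
            edges.update(combinations(genes, 2))
    return edges


def degree_ranking(edges):
    """Rank genes by degree (number of distinct neighbors) in the network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()


abstract = (
    "BRCA1 binds BARD1 and TP53. MDM2 inhibits TP53. "
    "TP53 and BRCA1 were both mutated."
)
edges = extract_edges(abstract)
print(degree_ranking(edges)[0])  # prints ('TP53', 3)
```

<p>The third sentence mentions two genes but no interaction keyword, so it contributes no edge; this filter is what distinguishes the approach from plain co-occurrence.</p>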
<p>Our contribution is to use different data mining techniques and various APIs for the different stages of text mining, and to investigate relationships such as gene-country, gene-year, and abstract-country, which have not been investigated by other papers so far. We explored how these new types of relationships can help to generate hypotheses about which genes should be studied.</p>
</sec>
</sec>
<sec id="Sec1234">
<title>Methods</title>
<sec id="Sec1265">
<title>The developed solution</title>
<sec id="Sec786">
<title>Overview</title>
<p>Figure 
<xref rid="Fig3" ref-type="fig">3</xref>
illustrates the steps of the methodology. Our goal is to contribute novel ideas for KD and hypothesis generation related to genes involved in breast cancer. We decided to use existing APIs for the IR, NER, and IE parts of the developed framework. The first step in our solution was IR, where our goal was to retrieve all papers relevant to our topic of interest: breast cancer.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Outline of the workflow and resources used in the proposed solution</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
<p>Although full texts contain more information than abstracts [
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR14">14</xref>
], we chose to examine abstracts because they contain the most important keywords in concise form. Also, because abstracts are shorter, they are much faster to analyze, which enabled us to perform a larger-scale text analysis. Moreover, we speculated that full texts may mention other genes that are not necessarily related to breast cancer, or genes that are relevant to other cancers, which may add noise. In other words, although full-text mining may achieve higher recall, abstract-based text mining may achieve higher precision. Therefore, our first step was to retrieve as many biomedical abstracts related to breast cancer as possible. All of the abstracts used for the analysis were retrieved from the MedLine database through the PubMed API. We chose PubMed because it is the best-known search engine for biomedical papers [
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR14">14</xref>
]. The search keywords were “breast cancer”. In October 2014, this search retrieved 289,510 papers from PubMed. We then filtered the papers, keeping the subset of 225,059 that had an abstract, title, authors, and a journal name. Of the excluded papers, 62,752 did not have an abstract and 257 did not have a date.</p>
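<p>The completeness filter can be expressed as a simple predicate over the retrieved records. The sketch assumes each record is a Python dict with hypothetical field names (title, abstract, authors, journal, year); this is an illustrative layout, not the PubMed API's actual response format.</p>

```python
# Hypothetical field names for a parsed PubMed record.
REQUIRED_FIELDS = ("title", "abstract", "authors", "journal", "year")


def is_complete(record):
    """True if every required field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def filter_records(records):
    """Split records into (kept, excluded) according to completeness."""
    kept = [r for r in records if is_complete(r)]
    excluded = [r for r in records if not is_complete(r)]
    return kept, excluded


records = [
    {"title": "T1", "abstract": "text", "authors": ["A"], "journal": "J", "year": 2010},
    {"title": "T2", "abstract": "", "authors": ["B"], "journal": "J", "year": 2011},
    {"title": "T3", "abstract": "text", "authors": ["C"], "journal": "J"},
]
kept, excluded = filter_records(records)
print(len(kept), len(excluded))  # prints 1 2
```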
<p>The PubMed API also provided extra information about the articles, such as keywords, title, abstract, authors, affiliations of authors, publication date, and journal name. Receiving this additional data in a standardized format was useful, because we could use it to perform additional analyses on the breast cancer data. However, not all of the data was clean, and some fields, such as author affiliation, required more processing. We discuss later how we processed author affiliations for the analysis. In the next step, we recognized the named entities in the abstracts and titles, using an online API called BeCAS, which identifies biomedical concepts in text [
<xref ref-type="bibr" rid="CR16">16</xref>
]. In our opinion, BeCAS is a well-documented API that performs well enough at identifying biomedical terms. Another important reason for using BeCAS was that it is integrated with PubMed, so it requires only the PubMed ID of an abstract to perform the analysis. Thus, we did not need to upload the abstracts themselves into BeCAS, which saved memory and time.</p>
<p>The named entities we were interested in are genes and proteins. Since we wanted to consider only genes in our analysis, we collected the genes mentioned in the text as well as the genes associated with proteins mentioned in the text. Another reason for using BeCAS is that it is well integrated with the UniProt database [
<xref ref-type="bibr" rid="CR26">26</xref>
], which stores gene and protein information. For each protein and gene, BeCAS provided the UniProt ID, which we used to verify the entity. The UniProt ID also allowed us to retrieve the genes associated with the proteins mentioned in the text. UniProt also helps to address one of the biggest challenges in biomedical text mining, i.e., that genes may have many synonyms. UniProt stores the known synonyms of each gene name, which reduces the number of duplicate genes listed within the abstracts under alternative names. After recognizing the genes within the abstracts, as well as those associated with the proteins mentioned in the abstracts, we filtered the paper set to include only abstracts that contain genes. As a result, our final paper set used in the analysis was reduced to 117,339 papers. The abstracts excluded after the NER step may relate to other aspects of breast cancer, possibly from a health care or psychological perspective, rather than the genetic side in which we are interested.</p>
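<p>The gene-collection step can be sketched as follows. Each annotation is assumed to be an (entity type, UniProt accession) pair; a hypothetical lookup table maps accessions to canonical gene symbols, so a protein mention and a gene mention sharing an accession collapse onto one gene; abstracts left without any gene are dropped. The annotation format and lookup table are assumptions for illustration, not the actual BeCAS or UniProt response structures.</p>

```python
# Hypothetical lookup: UniProt accession -> canonical gene symbol.
ACCESSION_TO_GENE = {
    "P38398": "BRCA1",
    "P04637": "TP53",
    "P60484": "PTEN",
}


def genes_in_abstract(annotations, lookup=ACCESSION_TO_GENE):
    """Collect canonical genes from gene and protein annotations.

    annotations: list of (entity_type, accession) tuples. Protein mentions
    are mapped to the gene encoding them through the same accession.
    """
    genes = set()
    for entity_type, accession in annotations:
        if entity_type in ("gene", "protein") and accession in lookup:
            genes.add(lookup[accession])
    return genes


def keep_abstracts_with_genes(annotated):
    """annotated: dict abstract ID -> annotations. Drop gene-free abstracts."""
    result = {}
    for pmid, annotations in annotated.items():
        genes = genes_in_abstract(annotations)
        if genes:
            result[pmid] = genes
    return result


annotated = {
    "pmid_1": [("protein", "P04637"), ("gene", "P04637"), ("disease", "hypothetical_id")],
    "pmid_2": [("disease", "hypothetical_id")],
}
print(keep_abstracts_with_genes(annotated))  # prints {'pmid_1': {'TP53'}}
```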
<p>The next step was to generate hypotheses about the relationships between genes, and also between genes and other information associated with them, such as the abstracts and authors. The relationships between the genes were measured as co-occurrences within the abstracts; semantic relations and directionality between the genes were not extracted or used in the analysis. Although many hypothesis-generating methodologies use gene–gene relationships to generate hypotheses about which genes should be investigated, our methodology also uses additional information, such as authors, locations, and dates. Therefore, we developed a methodology that creates hypotheses from types of information beyond those typically used by other researchers.</p>
<p>One of the features that we examined was the country of an author’s affiliation. By extracting this country, we related the countries that published breast cancer papers to the genes. Interesting correlations were then found, such as the genes that particular countries focused on. Researchers might use the gene-country information to see which genes are hot topics of study in a country. Another feature that we considered was the year in which the abstract was published. The gene-year relationship allowed us to find which genes were frequently mentioned together each year, which might lead a researcher to suspect that these genes have a hidden connection that should be explored further in the wet lab. A third relationship that we explored was gene–gene co-occurrence frequency within the abstracts. Network analysis is well suited to exploring gene–gene relationships, as the genes can be treated as the “actors” and the number of abstract co-occurrences as the “action” between two genes. The network analysis technique is further discussed in “
<xref rid="Sec19" ref-type="sec">Results and discussion</xref>
” section. Lastly, we also examined how many abstracts each country published in order to find which countries are the top contributors to breast cancer research.</p>
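<p>At their core, the gene-country and gene-year relations are contingency counts. The following sketch assumes that each processed abstract has been reduced to a dict holding its publication year, the set of countries from its author affiliations, and the set of recognized genes; this layout and the data are hypothetical. It counts, per group, the abstracts mentioning each gene and reports the top genes per group.</p>

```python
from collections import Counter, defaultdict


def gene_counts_by(abstracts, key):
    """Build {group: Counter(gene -> number of abstracts)}.

    key(abstract) returns the groups (countries or years) an abstract
    belongs to; an abstract with several countries counts for each.
    """
    table = defaultdict(Counter)
    for abstract in abstracts:
        for group in key(abstract):
            table[group].update(abstract["genes"])
    return table


def top_genes(table, k=3):
    """Return the k most frequently mentioned genes for each group."""
    return {group: [gene for gene, _ in counts.most_common(k)]
            for group, counts in table.items()}


# Hypothetical processed abstracts: publication year, countries, genes.
abstracts = [
    {"year": 2012, "countries": {"Canada"}, "genes": {"BRCA1", "ESR1"}},
    {"year": 2012, "countries": {"Canada", "Turkey"}, "genes": {"BRCA1"}},
    {"year": 2013, "countries": {"Turkey"}, "genes": {"ERBB2"}},
    {"year": 2013, "countries": {"Turkey"}, "genes": {"ERBB2", "TP53"}},
]
by_country = gene_counts_by(abstracts, key=lambda a: a["countries"])
by_year = gene_counts_by(abstracts, key=lambda a: [a["year"]])
print(top_genes(by_country, k=1))
print(by_year[2013]["ERBB2"])  # prints 2
```

<p>The same table, with countries as groups and genes as items, is also a natural input for the clustering of countries by the genes they study.</p>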
<p>For the data mining analysis, we used the software KNIME.
<xref ref-type="fn" rid="Fn1">1</xref>
For the social network analysis, we used Gephi.
<xref ref-type="fn" rid="Fn2">2</xref>
The web tools that we used to evaluate some of our results were GeneMania, DisGeNET, and FunDO. The computer used for the analysis had the following main specifications: Intel i5-4570 CPU, 8 GB RAM, Windows 10 OS.</p>
</sec>
<sec id="Sec6556">
<title>Country identification</title>
<p>To find the countries associated with each retrieved article, we needed to process the string that contains the affiliation(s) of the authors, called the position (Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
). The extra processing was required because the position often contained extraneous information, such as the names of the institution(s) and the author’s e-mail address. There were around 500,000 authors, but after we grouped them by first name, last name, and affiliation, the number rose to 601,287, most likely because authors change institutions during their careers or because popular names refer to different authors at different institutions; e.g., the name ‘Ken Barker’ appears at three institutions. There were 193,000 different affiliations for the authors who published abstracts mentioning genes. Many authors listed multiple institutions in their affiliations.
<fig id="Fig4">
<label>Fig. 4</label>
<caption>
<p>The extraction of the country</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig4_HTML" id="MO4"></graphic>
</fig>
</p>
<p>For each affiliation, we then wanted to find the associated country name. We used the Google Maps API
<xref ref-type="fn" rid="Fn3">3</xref>
to retrieve the country name. We split the string into sub-addresses at each comma. Each search started with the rightmost sub-address, which often contained the country name. When a sub-address string was insufficient to identify exactly one country, we repeatedly extended it with the next sub-address to its left. As seen in Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
, we first made a query using sub-address 1; if that did not return results precise enough to reveal the country of origin, we made another query that also included sub-address 2, and so on. The final set excluded the institutions within each affiliation that did not contain a valid address, which was about 1 % of them. One limitation of the Google Maps API is its daily quota of queries. Because of our large number of institutions, we needed to minimize the number of online queries. We achieved this by building a cache that stored all distinctive keywords found in the affiliations, which allowed us to identify institutions directly. Using the cache, we submitted only 8558 queries to the Google Maps API. Altogether, we found 159 countries with articles that were retrieved under the “breast cancer” query and that mention genes.</p>
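<p>The country-resolution loop can be sketched as below. To keep the sketch self-contained and runnable offline, toy_geocode is a hypothetical stand-in that returns the set of countries matching a query string; it is not the Google Maps API. The cache is keyed on whole query strings, a simplification of the keyword cache described above, but it shows the same effect: repeated address fragments are answered locally, so the quota is spent only on unseen strings.</p>

```python
def resolve_country(affiliation, geocode, cache):
    """Return the single country for an affiliation string, or None.

    geocode: hypothetical callable mapping a query string to the set of
    candidate country names (a stand-in for an online geocoding service).
    cache: dict shared across calls, so a repeated query is never resent.
    """
    parts = [p.strip() for p in affiliation.split(",") if p.strip()]
    # Start with the rightmost sub-address and grow the query leftwards.
    for size in range(1, len(parts) + 1):
        query = ", ".join(parts[-size:])
        if query not in cache:
            cache[query] = geocode(query)
        if len(cache[query]) == 1:
            return next(iter(cache[query]))
    return None  # no unambiguous country: the affiliation is excluded


# Toy offline geocoder used only for this example.
PLACES = {"Calgary": "Canada", "Ankara": "Turkey", "Turkey": "Turkey"}
calls = []


def toy_geocode(query):
    calls.append(query)
    return {country for place, country in PLACES.items() if place in query}


cache = {}
affiliations = [
    "Dept. of Computer Science, University of Calgary, Calgary, AB T2N 1N4",
    "Dept. of Oncology, University of Calgary, Calgary, AB T2N 1N4",
    "Department of Computer Engineering, Ankara, Turkey",
    "Unknown Institute",
]
print([resolve_country(a, toy_geocode, cache) for a in affiliations])
print(len(calls))  # prints 4: the second affiliation is answered from the cache
```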
</sec>
</sec>
</sec>
<sec id="Sec1236">
<title>Results</title>
<sec id="Sec123456">
<title>Evaluation of the developed solution</title>
<sec id="Sec898">
<title>Overview</title>
<p>Our gene–gene results were evaluated by comparing them with results retrieved through GeneMania, a web tool
<xref ref-type="fn" rid="Fn4">4</xref>
which uses publicly available curated and experimental data to derive gene–gene relationships [
<xref ref-type="bibr" rid="CR19">19</xref>
]. GeneMania also shows predicted relationships [
<xref ref-type="bibr" rid="CR19">19</xref>
]. If most of the relationships that we hypothesize are also reported by GeneMania, our hypotheses are strengthened. Any gene–gene relationships missing from the GeneMania results are potentially new relationships that may warrant further investigation by wet-lab researchers.</p>
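<p>This comparison can be sketched as set operations over undirected gene pairs: the fraction of hypothesized pairs that also appear in a reference network, plus the remaining pairs as candidates for follow-up. The reference pairs stand in for relationships exported from a tool such as GeneMania; the sketch does not call GeneMania, and the data are hypothetical.</p>

```python
def normalize(pairs):
    """Represent each undirected pair as a frozenset so (A, B) == (B, A)."""
    return {frozenset(p) for p in pairs}


def compare_with_reference(hypothesized, reference):
    """Return (fraction supported, sorted list of unsupported pairs)."""
    hyp = normalize(hypothesized)
    ref = normalize(reference)
    supported = hyp.intersection(ref)
    novel = sorted(tuple(sorted(p)) for p in hyp - ref)
    fraction = len(supported) / len(hyp) if hyp else 0.0
    return fraction, novel


hypothesized = [("BRCA1", "BARD1"), ("TP53", "MDM2"), ("ESR1", "GATA3"), ("BRCA2", "ESR1")]
reference = [("BARD1", "BRCA1"), ("MDM2", "TP53"), ("GATA3", "ESR1")]
fraction, novel = compare_with_reference(hypothesized, reference)
print(fraction, novel)  # prints 0.75 [('BRCA2', 'ESR1')]
```

<p>A high supported fraction strengthens the hypotheses, while the unsupported pairs are the candidates that may warrant wet-lab investigation.</p>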
<p>Our gene-disease results were evaluated by comparing our results to DisGeNet and FunDo, which are two web tools that identify gene-disease relationships.</p>
</sec>
<sec id="Sec465">
<title>Resources used</title>
<sec id="Sec656">
<title>Evaluation of gene clusters and communities</title>
<p>For the evaluation of our results, we used GeneMania in order to link our text-mining results to results drawn from experimental data [
<xref ref-type="bibr" rid="CR19">19</xref>
]. GeneMania accounts for several types of interactions between genes, such as co-expression, physical interaction, genetic interaction, shared protein domains, co-localization, and pathway, as well as relationships predicted from functional data on orthologous genes in other organisms. For all of our evaluations, we used datasets that describe human genes.</p>
<p>Co-expressed genes are genes that had similar expression levels across the same conditions in a published study; most of this gene expression data came from the Gene Expression Omnibus (GEO) database. Another interaction type in GeneMania is physical interaction: if two genes code for proteins that physically interact, the two genes are connected. These protein–protein interactions were drawn from BioGRID
<xref ref-type="fn" rid="Fn5">5</xref>
and Pathway Commons, databases that store protein–protein interactions. The other interactions we considered from GeneMania were shared protein domains, co-localization, and pathway interactions. Two genes share a protein domain interaction if their proteins have the same protein domain. Two genes have a co-localization interaction if their proteins are found in the same tissue. Finally, two genes share a pathway interaction if they participate in the same reaction in a pathway. The data sources that GeneMania uses are listed in the highly cited paper [
<xref ref-type="bibr" rid="CR19">19</xref>
].</p>
</sec>
<sec id="Sec546">
<title>Disease identification</title>
<p>To find the disease most strongly associated with each gene, we used the DisGeNET
<xref ref-type="fn" rid="Fn6">6</xref>
API [
<xref ref-type="bibr" rid="CR20">20</xref>
,
<xref ref-type="bibr" rid="CR21">21</xref>
]. DisGeNET provides gene-disease relationships from curated sources, literature-based associations, and predicted associations. Because we were interested only in human gene-disease relationships, we used only the curated sources, which include human gene-disease relationships from the Comparative Toxicogenomics Database (CTD) and UniProt. We used DisGeNET to find the gene-disease associations for the genes found through the gene-year and gene-country clustering (
<xref rid="Sec34" ref-type="sec">Appendix</xref>
: Tables 
<xref rid="Tab9" ref-type="table">9</xref>
,
<xref rid="Tab10" ref-type="table">10</xref>
). The diseases were identified on a gene-by-gene basis.</p>
<p>For the social network analysis, we used FunDO
<xref ref-type="fn" rid="Fn7">7</xref>
to identify the diseases shared by large groups of genes [
<xref ref-type="bibr" rid="CR22">22</xref>
]. FunDO takes a list of genes and retrieves the related diseases based on the Disease Ontology. We used FunDO rather than DisGeNET to analyze the gene communities because FunDO is better suited to finding diseases shared by a group of genes: DisGeNET provides a separate list of diseases for each gene, whereas FunDO provides a list of diseases shared among the genes. Automated identification of shared diseases was beneficial because even the smallest community we obtained, community 1, contained 229 genes (
<xref rid="Sec34" ref-type="sec">Appendix</xref>
: Table 
<xref rid="Tab8" ref-type="table">8</xref>
). For each community from the social network analysis, we retrieved the top five diseases within the community.</p>
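<p>To make the idea of retrieving the top five diseases of a community concrete, the sketch below ranks diseases by how many of the community's genes are annotated with them. This is a plain frequency count chosen for illustration; it is an assumption and not necessarily how FunDO scores diseases. The function name and the gene-to-disease annotations are hypothetical.</p>
```python
from collections import Counter


def top_shared_diseases(community_genes, gene_to_diseases, k=5):
    """Rank diseases by how many genes in the community are linked to them.

    This is a plain frequency count, used here only for illustration; it is
    not necessarily how FunDO scores diseases.
    """
    counts = Counter()
    for gene in community_genes:
        # set() so a gene contributes at most once to each disease
        counts.update(set(gene_to_diseases.get(gene, ())))
    # Highest count first; ties broken alphabetically for reproducibility.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical gene-to-disease annotations for a toy community.
annotations = {
    "GENE_1": ["breast cancer", "obesity"],
    "GENE_2": ["breast cancer", "prostate cancer"],
    "GENE_3": ["breast cancer", "prostate cancer", "leukemia"],
}
print(top_shared_diseases(["GENE_1", "GENE_2", "GENE_3"], annotations))
# [('breast cancer', 3), ('prostate cancer', 2), ('leukemia', 1), ('obesity', 1)]
```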
</sec>
</sec>
</sec>
</sec>
<sec id="Sec4123">
<title>Discussion</title>
<sec id="Sec19">
<title>Results and discussion</title>
<sec id="Sec89">
<title>Hard clustering</title>
<p>Clustering is the process of grouping items into “clusters” so that items within a cluster are more similar to each other than to items in other clusters. Hard clustering separates items into distinct groups, where each item belongs to exactly one cluster. We performed hard clustering on genes with respect to the country affiliations of the authors who published papers on them. In this section, we present our results and highlight some genes that researchers may find worth studying.</p>
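<p>One simple way to obtain a hard, one-cluster-per-gene assignment of the kind used to colour Fig. 5 is to place each gene in the cluster of the country whose abstracts mention it most often. The sketch below assumes that rule and a tie-break by country name; both are our assumptions, the function name is hypothetical, and the mentions are placeholder data.</p>
```python
from collections import Counter, defaultdict


def hard_cluster_by_country(mentions):
    """Assign every gene to exactly one cluster: the country whose
    abstracts mention it most often.

    mentions: iterable of (gene, country) pairs, one pair per abstract.
    """
    per_gene = defaultdict(Counter)
    for gene, country in mentions:
        per_gene[gene][country] += 1

    clusters = defaultdict(set)
    for gene, counts in per_gene.items():
        # Most mentions wins; ties are broken alphabetically by country.
        best = min(counts, key=lambda c: (-counts[c], c))
        clusters[best].add(gene)
    return dict(clusters)


# Hypothetical (gene, country) mentions extracted from abstracts.
mentions = [
    ("GENE_1", "United States"), ("GENE_1", "United States"), ("GENE_1", "China"),
    ("GENE_2", "China"), ("GENE_2", "China"),
    ("GENE_3", "United Kingdom"),
]
print(hard_cluster_by_country(mentions))
```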
<sec id="Sec4756">
<title>Which countries have studied the largest number of breast cancer genes?</title>
<p>As shown in Table 
<xref rid="Tab1" ref-type="table">1</xref>
, the United States published the largest number of articles on breast cancer; authors affiliated with the United States also published the largest number of articles that mention breast cancer genes. In Fig. 
<xref rid="Fig5" ref-type="fig">5</xref>
, each gene is coloured by the country that published the most papers mentioning it. Figure 
<xref rid="Fig5" ref-type="fig">5</xref>
shows that the United States has studied by far the largest number of genes, since most genes have been mentioned in abstracts affiliated with the United States. China and the United Kingdom rank second and third, respectively. The United States, the United Kingdom, and China appear to have the largest support for breast cancer research and to lead this research worldwide.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Number of breast cancer abstracts per country, for all abstracts and for abstracts that mention genes</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Country (all abstracts)</th>
<th align="left">Abstracts</th>
<th align="left">Country (abstracts with gene mentions)</th>
<th align="left">Abstracts</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">United States</td>
<td char="." align="char">62,013</td>
<td align="left">United States</td>
<td char="." align="char">33,373</td>
</tr>
<tr>
<td align="left">United Kingdom</td>
<td char="." align="char">11,652</td>
<td align="left">China</td>
<td char="." align="char">6553</td>
</tr>
<tr>
<td align="left">China</td>
<td char="." align="char">8858</td>
<td align="left">United Kingdom</td>
<td char="." align="char">6041</td>
</tr>
<tr>
<td align="left">Japan</td>
<td char="." align="char">8807</td>
<td align="left">Japan</td>
<td char="." align="char">5299</td>
</tr>
<tr>
<td align="left">Italy</td>
<td char="." align="char">8667</td>
<td align="left">Italy</td>
<td char="." align="char">4621</td>
</tr>
<tr>
<td align="left">Germany</td>
<td char="." align="char">7394</td>
<td align="left">Germany</td>
<td char="." align="char">4148</td>
</tr>
<tr>
<td align="left">France</td>
<td char="." align="char">6757</td>
<td align="left">France</td>
<td char="." align="char">3642</td>
</tr>
<tr>
<td align="left">Canada</td>
<td char="." align="char">6476</td>
<td align="left">Canada</td>
<td char="." align="char">3573</td>
</tr>
<tr>
<td align="left">The Netherlands</td>
<td char="." align="char">4071</td>
<td align="left">South Korea</td>
<td char="." align="char">2144</td>
</tr>
<tr>
<td align="left">Australia</td>
<td char="." align="char">3601</td>
<td align="left">The Netherlands</td>
<td char="." align="char">1844</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>The five maximal closed frequent item sets with the highest support for gene-country</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Gene maximal closed frequent item set</th>
<th align="left">Support (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">ERBB2, ESR1, PGR</td>
<td char="." align="char">48.43</td>
</tr>
<tr>
<td align="left">EGF, ERBB2, ESR1</td>
<td char="." align="char">46.54</td>
</tr>
<tr>
<td align="left">BRCA1, ERBB2, ESR1</td>
<td char="." align="char">45.91</td>
</tr>
<tr>
<td align="left">BRCA1, BRCA2</td>
<td char="." align="char">45.28</td>
</tr>
<tr>
<td align="left">CDKN2A, ESR1</td>
<td char="." align="char">45.28</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>The five maximal closed frequent item sets with the highest support for gene-year</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Gene maximal closed frequent item set</th>
<th align="left">Support (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">CEACAM3, ESR1</td>
<td char="." align="char">82.69</td>
</tr>
<tr>
<td align="left">ALPPL2, CD99, CEACAM3, CHI3L1, ESR1, MUC21, SOD1</td>
<td char="." align="char">78.85</td>
</tr>
<tr>
<td align="left">AMN, CD40LG, CD79A, CEACAM3, ESR1, PRL</td>
<td char="." align="char">78.85</td>
</tr>
<tr>
<td align="left">AFP, CEACAM3, ESR1</td>
<td char="." align="char">76.92</td>
</tr>
<tr>
<td align="left">CD99, DHPS, POMC</td>
<td char="." align="char">76.92</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab4">
<label>Table 4</label>
<caption>
<p>Top 10 diseases associated with genes derived from the union of the top 5 gene-year and gene-country itemsets</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Disease name</th>
<th align="left">Genes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">ERBB2, ESR1, PGR, EGF, BRCA1, BRCA2, CD99, AFP</td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">ERBB2, PGR, EGF, CDKN2A, CD99</td>
</tr>
<tr>
<td align="left">Mammary neoplasms, experimental</td>
<td align="left">ERBB2, PGR, BRCA1, AFP</td>
</tr>
<tr>
<td align="left">Carcinoma</td>
<td align="left">ESR1, PGR, BRCA1, CD99</td>
</tr>
<tr>
<td align="left">Prostatic neoplasms</td>
<td align="left">ERBB2, EGF, BRCA1, BRCA2</td>
</tr>
<tr>
<td align="left">malignant neoplasm breast</td>
<td align="left">PGR, BRCA1, BRCA2</td>
</tr>
<tr>
<td align="left">Glioma</td>
<td align="left">ERBB2, CDKN2A, CHI3L1</td>
</tr>
<tr>
<td align="left">Hypertension</td>
<td align="left">CHI3L1, SOD1, POMC</td>
</tr>
<tr>
<td align="left">Neoplasm</td>
<td align="left">BRCA1, CDKN2A, CD99</td>
</tr>
<tr>
<td align="left">Ovarian neoplasms</td>
<td align="left">ERBB2, BRCA1, BRCA2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab5">
<label>Table 5</label>
<caption>
<p>Statistical information for gene–gene network</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">Nodes</th>
<th align="left">%</th>
<th align="left">Edges</th>
<th align="left">%</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Full network</td>
<td align="left">8400</td>
<td align="left">100</td>
<td align="left">213,894</td>
<td align="left">100</td>
</tr>
<tr>
<td align="left">Giant component</td>
<td align="left">7620</td>
<td align="left">90.71</td>
<td align="left">213,877</td>
<td align="left">99.99</td>
</tr>
<tr>
<td align="left">Pruned giant component</td>
<td align="left">1089</td>
<td align="left">12.96</td>
<td align="left">6815</td>
<td align="left">3.19</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab6">
<label>Table 6</label>
<caption>
<p>Network analysis measurements for the gene–gene network</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Gene</th>
<th align="left">Betweenness centrality</th>
<th align="left">Modularity class</th>
<th align="left">Gene</th>
<th align="left">Closeness centrality</th>
<th align="left">Modularity class</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">ESR1</td>
<td align="left">0.09</td>
<td align="left">2</td>
<td align="left">ESR1</td>
<td align="left">0.62</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">ERBB2</td>
<td align="left">0.06</td>
<td align="left">2</td>
<td align="left">ERBB2</td>
<td align="left">0.6</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">CDKN2A</td>
<td align="left">0.04</td>
<td align="left">6</td>
<td align="left">CDKN2A</td>
<td align="left">0.58</td>
<td align="left">6</td>
</tr>
<tr>
<td align="left">SLC20A2</td>
<td align="left">0.03</td>
<td align="left">2</td>
<td align="left">SLC20A2</td>
<td align="left">0.57</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">EGF</td>
<td align="left">0.02</td>
<td align="left">2</td>
<td align="left">EGF</td>
<td align="left">0.57</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">PGR</td>
<td align="left">0.02</td>
<td align="left">2</td>
<td align="left">PGR</td>
<td align="left">0.56</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">BRCA1</td>
<td align="left">0.02</td>
<td align="left">6</td>
<td align="left">ACAD9</td>
<td align="left">0.55</td>
<td align="left">5</td>
</tr>
<tr>
<td align="left">CDH1</td>
<td align="left">0.02</td>
<td align="left">0</td>
<td align="left">CDH1</td>
<td align="left">0.55</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">ACAD9</td>
<td align="left">0.02</td>
<td align="left">5</td>
<td align="left">MAPK10</td>
<td align="left">0.55</td>
<td align="left">5</td>
</tr>
<tr>
<td align="left">HLA-H</td>
<td align="left">0.02</td>
<td align="left">6</td>
<td align="left">TKT</td>
<td align="left">0.55</td>
<td align="left">2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The 10 genes with the highest betweenness centrality and the 10 genes with the highest closeness centrality are shown. The modularity class denotes the community to which each gene belongs</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="Tab7">
<label>Table 7</label>
<caption>
<p>Common diseases in each community</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">Cancer</th>
<th align="left">Breast cancer</th>
<th align="left">Prostate cancer</th>
<th align="left">Diabetes mellitus</th>
<th align="left">Colon cancer</th>
<th align="left">Obesity</th>
<th align="left">Leukemia</th>
<th align="left">Hypertension</th>
<th align="left">Atherosclerosis</th>
<th align="left">Rheumatoid arthritis</th>
<th align="left">Embryoma</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Community 0</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Community 1</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Community 2</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Community 3</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Community 4</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Community 5</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Community 6</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Community 7</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="Fig5">
<label>Fig. 5</label>
<caption>
<p>The top 500 most frequently mentioned genes are shown, where
<italic>radius</italic>
represents the number of abstracts which mentioned the gene, and the
<italic>colour</italic>
represents the country which mentioned the gene the most</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig5_HTML" id="MO5"></graphic>
</fig>
</p>
<p>In general, the top countries publishing articles on breast cancer were largely the same as the top countries publishing articles that mention breast cancer genes. This suggests that, in these countries, the molecular side of breast cancer is studied as much as other aspects of the disease, which underlines the importance of genetics in breast cancer research.</p>
</sec>
<sec id="Sec10001">
<title>Collaborations</title>
<p>We counted a paper as a collaboration if its authors were affiliated with institutions in different countries. Among articles on breast cancer, collaborations occurred most often between the United States and China (see Fig. 
<xref rid="Fig6" ref-type="fig">6</xref>
). When we considered only articles that mention breast cancer genes, however, the countries with the most publications, such as the United States, the United Kingdom, and China, had slightly fewer collaborations, whereas countries with fewer publications had more collaborations than before (see Fig. 
<xref rid="Fig7" ref-type="fig">7</xref>
). Collaboration information helps researchers identify the countries most involved in research partnerships with others.
<fig id="Fig6">
<label>Fig. 6</label>
<caption>
<p>Collaboration between the top 10 countries in regard to breast cancer abstracts</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig6_HTML" id="MO6"></graphic>
</fig>
<fig id="Fig7">
<label>Fig. 7</label>
<caption>
<p>Collaboration between the top 10 countries in regard to breast cancer abstracts that contain genes</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig7_HTML" id="MO7"></graphic>
</fig>
</p>
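<p>The collaboration rule above (a paper counts when its affiliations span more than one country) can be sketched as a count over unordered country pairs. This is a minimal illustration with a hypothetical function name and made-up affiliation lists; in particular, counting a three-country paper once for each of its country pairs is our assumption about how such papers would be tallied.</p>
```python
from collections import Counter
from itertools import combinations


def count_collaborations(paper_countries):
    """Count international collaborations per unordered country pair.

    paper_countries: one list of affiliation countries per paper. A paper
    whose affiliations span a single country yields no pairs, so it is
    not counted as a collaboration.
    """
    pairs = Counter()
    for countries in paper_countries:
        unique = sorted(set(countries))  # duplicates in one paper count once
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical affiliation countries for four papers.
papers = [
    ["United States", "China"],
    ["China", "United States", "China"],
    ["Canada"],
    ["United Kingdom", "Canada", "United States"],
]
for pair, n in sorted(count_collaborations(papers).items()):
    print(pair, n)
```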
</sec>
<sec id="Sec10000">
<title>What are the top studied genes in the breast cancer field?</title>
<p>Researchers may want to know the most studied genes in the breast cancer field so that they can focus their research on promising genes. The two most frequently mentioned genes in the breast cancer abstracts were ESR1 and ERBB2 (Fig. 
<xref rid="Fig8" ref-type="fig">8</xref>
). The next five most studied genes were EGF, PGR, CDKN2A, BRCA1, and SLC20A2 (Fig. 
<xref rid="Fig8" ref-type="fig">8</xref>
). Taking the 10 most studied genes for each of the top 10 countries in breast cancer research yields 21 unique genes in total. More detailed information on these genes is listed in
<xref rid="Sec34" ref-type="sec">Appendix</xref>
: Table 
<xref rid="Tab11" ref-type="table">11</xref>
. Note, however, that the curated DisGeNET sources did not contain information for CEACAM3, MUC21, and DHPS.
<fig id="Fig8">
<label>Fig. 8</label>
<caption>
<p>For the top 21 most frequently mentioned genes, the distribution of gene mentions by country is
<italic>colored</italic>
</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig8_HTML" id="MO8"></graphic>
</fig>
</p>
<p>To measure the effort that a country X put into a gene Y, we divided the number of abstracts from country X that mentioned gene Y by the number of papers published by country X. All of the top 10 countries in breast cancer research put most of their effort into ESR1 and ERBB2 (Fig. 
<xref rid="Fig9" ref-type="fig">9</xref>
). The countries devoted 11–20 % of their effort to ESR1, with the United Kingdom devoting the largest share, and 9–17 % to ERBB2, with France devoting the largest share. Across all 21 unique genes, the effort ranged from 2 to 20 %.
<fig id="Fig9">
<label>Fig. 9</label>
<caption>
<p>The division of effort by the top 10 countries, for the top 100 genes for those countries</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig9_HTML" id="MO9"></graphic>
</fig>
</p>
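<p>The effort measure defined above is a simple ratio. The sketch below computes it for every (country, gene) pair; the function name is hypothetical and the counts are invented to mirror the situation discussed later, in which one country publishes five times as many papers on a gene as another yet devotes a smaller share of its output to it.</p>
```python
def country_effort(mention_counts, abstract_totals):
    """Effort of country X on gene Y = abstracts from X mentioning Y
    divided by all abstracts published by X.

    mention_counts: {(country, gene): number of abstracts mentioning gene}
    abstract_totals: {country: total number of abstracts}
    """
    return {
        (country, gene): n / abstract_totals[country]
        for (country, gene), n in mention_counts.items()
    }


# Hypothetical counts: country A publishes five times as many papers on the
# gene as country B, yet devotes a smaller share of its output to it.
mentions = {("Country A", "GENE_Y"): 2000, ("Country B", "GENE_Y"): 400}
totals = {"Country A": 12500, "Country B": 2000}
for key, value in country_effort(mentions, totals).items():
    print(key, f"{value:.0%}")
```
<p>Because the denominator is each country's total output, a prolific country can lead in raw abstract counts while trailing in effort.</p>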
<p>Unsurprisingly, the protein products of ERBB2 and ESR1 are targets of drug and hormone therapy for breast cancer.</p>
<p>ERBB2, commonly known as HER2, encodes a receptor tyrosine-protein kinase that is found in membrane signaling complexes and facilitates the transmission of signals into the cell [
<xref ref-type="bibr" rid="CR27">27</xref>
]. If ERBB2 is overexpressed, the cell may receive too many signals to proliferate and survive, which may lead to breast cancer. Patients with ERBB2-positive breast cancer (30 % of patients) can be treated with the drug trastuzumab, sold under the trade name Herceptin [
<xref ref-type="bibr" rid="CR28">28</xref>
].</p>
<p>ESR1, on the other hand, encodes estrogen receptor alpha, the first of the two types of estrogen receptor, which is found in breast cancer cells.</p>
<p>The estrogen receptor is a transcription factor found in the cytosol; when activated by the hormone estrogen, it can move into the nucleus and regulate growth and proliferation genes. Estrogen receptors are overexpressed in about 70 % of breast cancer cases [
<xref ref-type="bibr" rid="CR29">29</xref>
]. Three hormone drugs used to block estrogen receptors are tamoxifen, toremifene (Fareston), and fulvestrant (Faslodex) [
<xref ref-type="bibr" rid="CR29">29</xref>
,
<xref ref-type="bibr" rid="CR30">30</xref>
].</p>
<p>We also wanted to determine whether some countries showed greater interest in particular genes than other countries did. For this analysis, we excluded sparsely studied genes so that the results would not be skewed. For example, if gene X had been mentioned in only two abstracts, from two different countries, the results would suggest that each of these countries invested considerable effort in the gene, although each may have published only one paper on it. Therefore, we analyzed only the top 21 genes, for which the number of abstracts per gene ranged from 419 to 11,215.</p>
<p>In terms of the number of abstracts, the United States published the most papers on every gene except one (Fig. 
<xref rid="Fig10" ref-type="fig">10</xref>
). For MYLIP, China has more abstracts than the United States (327 versus 312). Notably, some countries follow closely behind the United States for certain genes: for CEACAM3, the United States has 212 abstracts and Japan has 151, and for CTSD, the United States has 145 abstracts and France has 114.
<fig id="Fig10">
<label>Fig. 10</label>
<caption>
<p>The proportion of gene mentions by each country</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig10_HTML" id="MO10"></graphic>
</fig>
</p>
<p>However, in terms of the effort put into each gene, the United States did not hold the largest proportion (Fig. 
<xref rid="Fig11" ref-type="fig">11</xref>
). Because the United States has published on many genes, the share of its effort devoted to any single gene is smaller. For example, although the United States has published five times as many papers as the United Kingdom on ESR1, the United Kingdom placed 20 % of its effort into ESR1, whereas the United States placed only 16 %. Country effort can therefore reveal the priority that each country places on a gene relative to other countries.
<fig id="Fig11">
<label>Fig. 11</label>
<caption>
<p>In consideration of other genes that these countries have studied, this figure shows how much of that effort was placed on these genes</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig11_HTML" id="MO11"></graphic>
</fig>
</p>
<p>The
<italic>MYLIP</italic>
gene has received more priority from China, which devoted 5.0 % of its research effort to this gene, versus 0.2–1.2 % of the effort of other countries (Fig. 
<xref rid="Fig11" ref-type="fig">11</xref>
). MYLIP also had more papers overall from China than from the United States, so this gene appears to be particularly important for Chinese-affiliated research. Although MYLIP does not appear to be a drug target, it seems to be upregulated by tamoxifen [
<xref ref-type="bibr" rid="CR31">31</xref>
].</p>
<p>MYLIP codes for a myosin regulatory light chain (MRLC) interacting protein [
<xref ref-type="bibr" rid="CR32">32</xref>
]. The MYLIP protein mediates ubiquitination of the MRLC, which is followed by its degradation. When the MRLC is degraded, neurite (axon or dendrite) outgrowth is inhibited.</p>
<p>Other genes that received greater priority from particular countries were ARL11 (4.1 % of Australia's effort), CASP3 (3.1 % of China's effort), BCL2L14 (3.7 % of the Netherlands' effort), CEACAM3 (2.8 % of Japan's effort), and CTSD (3.1 % of Italy's effort) (Fig. 
<xref rid="Fig11" ref-type="fig">11</xref>
).</p>
<p>An interesting question is how tightly breast cancer research is directed in each country. If the direction of breast cancer research is tightly regulated in some countries, our study of publication effort towards the genes may reveal that direction. One way that the government of a country might direct breast cancer research is to encourage funding for groups that study particular genes. Promising genes might be those with high potential as drug targets, or those with a greater impact on breast cancer in that country's population.</p>
<p>One limitation is that our set of papers may also include genes that have only been studied in mouse or rat models. For such genes, it may be difficult to confirm a relationship to breast cancer in humans.</p>
</sec>
<sec id="Sec576">
<title>Which genes were never mentioned by the top 10 countries?</title>
<p>In total, 445 genes were not mentioned in any of the abstracts written by the top 10 countries. The most frequently mentioned of these genes appears in only seven abstracts. Such a low frequency, compared with 18,913 for the
<italic>ESR1</italic>
gene, indicates that the top 10 countries covered most genes. Nevertheless, these genes may be worth examining to determine whether they are potential candidate genes or outliers. To test this, we closely inspected some of these genes, such as GLCE, which has an abstract frequency of seven.</p>
<p>Gene GLCE codes for a protein called
<sc>d</sc>
-glucuronyl C5-epimerase, an enzyme involved in the biosynthesis of the carbohydrate portion of heparan sulphate proteoglycans (HSPGs) present on the cell surface [
<xref ref-type="bibr" rid="CR33">33</xref>
]. Enzymes that synthesize cell-surface carbohydrates may be implicated in cancer growth because cell-surface carbohydrate-protein conjugates (proteoglycans) are involved in cell signalling, which can indicate to a cell whether or not it should divide. If genes or proteins that play a role in such a signalling pathway are defective, the cell may begin to divide uncontrollably and therefore become cancerous.</p>
<p>Interestingly, one of the few research articles that mentioned GLCE showed that it has an antiproliferative effect on breast cancer cells, and that down-regulation of GLCE may indeed lead to breast cancer [
<xref ref-type="bibr" rid="CR33">33</xref>
]. The case of GLCE therefore shows that genes mentioned less frequently than others in the abstracts may still be important in breast cancer.</p>
<p>Another example is the
<italic>CHRM1</italic>
gene, which was mentioned in only five abstracts. However, CHRM1 appears to be strongly involved in prostate cancer [
<xref ref-type="bibr" rid="CR34">34</xref>
]. It encodes an acetylcholine receptor involved in the autonomic nervous system. Again, cell-surface receptors have high potential to be involved in cancer because they form a crucial part of cell signalling. The effect of CHRM1 on prostate cancer was shown in a high-impact article that has 56 citations to date, despite having been published only in 2013 [
<xref ref-type="bibr" rid="CR34">34</xref>
]. Another reason some genes are rarely mentioned in breast cancer abstracts may therefore be that they have been shown to be important in another cancer, and researchers have only recently begun investigating their connection to breast cancer. Genes that are mentioned in few breast cancer abstracts may thus point researchers to genes that require further investigation. With more research, some of these genes may prove to be important biomarkers for breast cancer.</p>
</sec>
</sec>
<sec id="Sec7890">
<title>Hierarchical clustering</title>
<p>Hierarchical clustering builds a hierarchy of clusters; two common linkage criteria are single-link and complete-link [
<xref ref-type="bibr" rid="CR8">8</xref>
]. At a high level, single-link clustering measures the distance between two clusters by their most similar pair of items, whereas complete-link clustering uses their most dissimilar pair.</p>
<p>We applied hierarchical clustering to the countries, based on the genes that each country studied. We used complete linkage because it has the advantage of producing more compact clusters, which leads to a clearer hierarchy; since the countries' gene profiles were already very similar, we wanted greater separation between clusters. The results of the hierarchical clustering are displayed in Fig. 
<xref rid="Fig12" ref-type="fig">12</xref>
. The hierarchical clustering revealed three branches: Germany, Italy, and China formed the first; the United Kingdom, Japan, the United States, France, Australia, and Canada formed the second; and the Netherlands alone formed the third. A researcher can use Fig. 
<xref rid="Fig12" ref-type="fig">12</xref>
to see which countries have research interests in common.
<fig id="Fig12">
<label>Fig. 12</label>
<caption>
<p>Hierarchical clustering of the countries, based on the genes that each country studied. This figure shows how similar the research interests are across the countries</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig12_HTML" id="MO12"></graphic>
</fig>
</p>
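<p>Complete-link agglomerative clustering repeatedly merges the two clusters whose most dissimilar members are closest. The sketch below implements it from scratch on country gene profiles. The distance between two countries is taken to be the Jaccard distance between their gene sets; that choice is our assumption, since the distance used for Fig. 12 is not stated here, and the profiles are toy data. In practice, SciPy's scipy.cluster.hierarchy.linkage with method="complete" performs the same merging on a precomputed distance matrix.</p>
```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a intersect b| / |a union b| for two gene sets (0 = identical)."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles: {name: set of genes}. Returns the merges in order as
    (left_cluster, right_cluster, distance) tuples.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is the
        # largest distance between any member of one and any of the other.
        d, left, right = min(
            (max(jaccard_distance(profiles[x], profiles[y])
                 for x in a for y in b), a, b)
            for a, b in combinations(clusters, 2)
        )
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(tuple(sorted(left + right)))
        merges.append((left, right, d))
    return merges


# Hypothetical gene profiles for four countries.
profiles = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g5", "g6"},
    "D": {"g5", "g6", "g7"},
}
for left, right, d in complete_linkage(profiles):
    print(left, right, round(d, 3))
```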
</sec>
<sec id="Sec548">
<title>Frequent pattern mining</title>
<p>Frequent pattern mining is used to find sets of items that occur frequently together in a database, and is often applied to grocery store transactions to discover which items customers tend to purchase together [
<xref ref-type="bibr" rid="CR8">8</xref>
]. Different algorithms, such as Apriori and FP-growth, can be applied to generate frequent item sets from a collection of transactions. We applied the FP-growth algorithm, using the KNIME tool, to find the frequent item sets.</p>
<p>One measure of significance for item sets is support: a decimal value giving the proportion of transactions in the database that contain a particular item set. For example, if the item set {A, B, C} is found in 10 % of all transactions, its support is 0.1.</p>
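<p>The support definition translates directly into code. The sketch below reproduces the 10 % example above with ten toy transactions. The function name is hypothetical; for the gene analyses, a transaction would presumably be the set of genes mentioned by one country (or in one year), which is our reading rather than something stated here.</p>
```python
def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    target = set(itemset)
    hits = sum(1 for t in transactions if target.issubset(t))
    return hits / len(transactions)


# Ten toy transactions; only one contains all of A, B and C.
transactions = [{"A", "B", "C"}] + [{"A", "B"}] * 4 + [{"A"}] * 5
print(support({"A", "B", "C"}, transactions))  # 0.1
print(support({"A", "B"}, transactions))       # 0.5
```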
<p>To obtain more concise results, we additionally required each item set to be both maximal and closed. A frequent item set is maximal if none of its supersets is frequent, and closed if none of its supersets has the same support. For further explanation of maximal closed item sets, please refer to [
<xref ref-type="bibr" rid="CR8">8</xref>
].</p>
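<p>The maximal and closed constraints can be checked once all frequent item sets are known. The sketch below is not FP-growth: it enumerates every item combination by brute force, which is only feasible for toy data, and then keeps the item sets that satisfy both definitions above. Function names and data are hypothetical.</p>
```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every item set whose support is at least min_support.

    Brute force over all item combinations: exponential, toy data only.
    FP-growth finds the same item sets far more efficiently.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = sum(1 for t in transactions if set(combo).issubset(t)) / n
            if s >= min_support:
                frequent[frozenset(combo)] = s
    return frequent


def maximal_closed(frequent):
    """Keep frequent item sets that are maximal (no frequent proper
    superset) and closed (no proper superset with equal support).

    Only frequent supersets need checking: a superset with equal support
    to a frequent item set would itself be frequent.
    """
    kept = {}
    for itemset, s in frequent.items():
        supersets = [o for o in frequent if o.issuperset(itemset) and o != itemset]
        is_maximal = not supersets
        is_closed = all(frequent[o] != s for o in supersets)
        if is_maximal and is_closed:
            kept[itemset] = s
    return kept


transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "B"}, {"C"}]
result = maximal_closed(frequent_itemsets(transactions, 0.5))
for itemset, s in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```
<p>Every maximal frequent item set is also closed, so the closed check is redundant here; it is kept only to mirror the definition.</p>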
<sec id="Sec5875">
<title>Genes frequently mentioned together by countries</title>
<p>We computed the maximal closed frequent item sets to find which genes are frequently mentioned together by each country. We considered the top five item sets (an arbitrary cutoff), which are listed in Table 
<xref rid="Tab2" ref-type="table">2</xref>
. We then took a closer look at the item set which contained the following genes:
<italic>BRCA1</italic>
,
<italic>ERBB2</italic>
,
<italic>ESR1</italic>
. In Fig. 
<xref rid="Fig13" ref-type="fig">13</xref>
, we used GeneMania to show that these genes are related, as found in gene expression data and the literature. Red edges represent physical interaction, and purple edges represent co-expression.
<fig id="Fig13">
<label>Fig. 13</label>
<caption>
<p>
<italic>Black nodes</italic>
are genes listed in the third gene-country item set in Table 
<xref rid="Tab2" ref-type="table">2</xref>
. As described by GeneMania, the
<italic>purple connections</italic>
represent co-expression, whereas the
<italic>red connections</italic>
represent physical interaction between the gene products</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig13_HTML" id="MO13"></graphic>
</fig>
</p>
</sec>
<sec id="Sec685">
<title>Genes frequently mentioned together every year</title>
<p>Again, we computed the maximal closed frequent item sets, this time for genes that are mentioned together every year. We again considered the top five item sets (an arbitrary cutoff), which are listed in Table 
<xref rid="Tab3" ref-type="table">3</xref>
. We then took a closer look at the item set which contained the following genes:
<italic>AMN</italic>
,
<italic>CD40LG</italic>
,
<italic>CD79A</italic>
,
<italic>CEACAM3</italic>
,
<italic>ESR1</italic>
,
<italic>PRL</italic>
. In Fig. 
<xref rid="Fig14" ref-type="fig">14</xref>
, we used GeneMania to show that these genes are related according to gene expression data and the literature. Blue edges represent co-localization, purple edges represent co-expression, and turquoise edges link genes that belong to the same pathway.
<fig id="Fig14">
<label>Fig. 14</label>
<caption>
<p>
<italic>Black nodes</italic>
are genes listed in the third gene-year item set in Table 
<xref rid="Tab3" ref-type="table">3</xref>
. The connections between the genes are described by GeneMania as
<italic>blue</italic>
for co-localization of the gene products and
<italic>purple</italic>
for co-expression of the genes</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig14_HTML" id="MO14"></graphic>
</fig>
</p>
<p>The major genes related to the top 10 diseases are listed in Table 
<xref rid="Tab4" ref-type="table">4</xref>
. A more detailed analysis for each gene is given in
<xref rid="Sec34" ref-type="sec">Appendix</xref>
: Tables 
<xref rid="Tab9" ref-type="table">9</xref>
and
<xref rid="Tab10" ref-type="table">10</xref>
. These tables give further details on the disease associations of each gene, the countries in which it was studied, and the related genes that share the most diseases with it.</p>
</sec>
</sec>
<sec id="Sec254">
<title>Soft clustering</title>
<p>Soft clustering techniques are useful when items cannot be distinctly separated into clusters [
<xref ref-type="bibr" rid="CR8">8</xref>
]. Each item is assigned a degree of membership in every cluster. For example, item
<italic>A</italic>
may have a 0.1 membership value to cluster
<italic>X</italic>
and a 0.7 membership value to cluster
<italic>Y</italic>
. This technique is often used when some items fall in a ‘grey’ area between clusters. We used fuzzy c-means, a soft clustering technique, because the clusters were not clearly separated (see Fig. 
<xref rid="Fig16" ref-type="fig">16</xref>
). Before settling on fuzzy c-means, we tried density-based clustering techniques, but they returned only a single cluster. We used a Matlab toolbox
<xref ref-type="fn" rid="Fn8">8</xref>
to perform fuzzy c-means (FCM) clustering.</p>
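<p>The authors ran fuzzy c-means with a Matlab toolbox. The following is an independent, minimal Python sketch of the standard fuzzy c-means update rules. It is not the toolbox's code. The fuzzifier m = 2, the random initial memberships, and the stopping tolerance are all assumptions. Each point receives a membership in every cluster, and each row of memberships sums to 1. This is what allows a gene to sit in a ‘grey’ area between clusters.</p>

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j. Every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        u.append([v / sum(row) for v in row])
    centers = []
    for _ in range(max_iter):
        # Each centre is the mean of all points, weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sum(w)
                                 for d in range(dim)))
        # Memberships shrink as the distance to a centre grows.
        new_u = []
        for p in points:
            dist = [math.dist(p, v) for v in centers]
            if 0.0 in dist:  # the point coincides with a centre
                hits = dist.count(0.0)
                new_u.append([1.0 / hits if d == 0.0 else 0.0 for d in dist])
            else:
                new_u.append([1.0 / sum((dist[j] / dist[k]) ** exponent for k in range(c))
                              for j in range(c)])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if tol >= shift:  # stop once memberships have settled
            break
    return centers, u


# Toy 2-D data: two tight groups plus one point midway between them.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (2.5, 2.5)]
centers, u = fuzzy_c_means(pts, c=2)
for p, row in zip(pts, u):
    print(p, [round(v, 2) for v in row])
```

<p>In this toy run, the midway point should end up with roughly equal membership in both clusters. Points inside each group should be assigned almost entirely to one cluster.</p>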
<sec id="Sec5455">
<title>Finding the optimal number of clusters</title>
<p>To find the optimal number of clusters, we performed a cluster validation analysis, which evaluates how well the partitions have been produced [
<xref ref-type="bibr" rid="CR35">35</xref>
]. No validation index is reliable by itself. The values of all indexes for cluster numbers
<italic>c</italic>
between 2 and 15 are therefore shown in Fig. 
<xref rid="Fig15" ref-type="fig">15</xref>
, and the optimum can only be identified by comparing all of the results. When the differences between the values of a validation index are minor, we consider partitions with fewer clusters to be better. On this basis, we chose 3 clusters for the gene-year relationships and 4 clusters for the gene-country relationships. We used four validation indexes: partition coefficient (PC), classification entropy (CE), partition index (PI) and the Xie-Beni index (XBI).
<fig id="Fig15">
<label>Fig. 15</label>
<caption>
<p>Validation of the number of fuzzy clusters using various measures</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig15_HTML" id="MO15"></graphic>
</fig>
</p>
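<p>The four validation indexes can be written directly in terms of the membership matrix, the data points, and the cluster centres. The sketch below uses common textbook definitions with fuzzifier m = 2, which is an assumption. XBI uses the smallest squared distance between centres as its separation term. PI is normalised by the fuzzy cluster size (the sum of memberships). Toolbox implementations may differ in such details. Higher PC and lower CE, PI, and XBI indicate a crisper, better-separated partition.</p>

```python
import math


def partition_coefficient(u):
    """PC: mean of squared memberships (1 for a crisp partition)."""
    return sum(v * v for row in u for v in row) / len(u)


def classification_entropy(u):
    """CE: mean fuzzy entropy of the memberships (0 for a crisp partition)."""
    return sum(-v * math.log(v) for row in u for v in row if v > 0) / len(u)


def partition_index(points, centers, u, m=2.0):
    """PI: per-cluster compactness over (fuzzy size x separation), summed."""
    total = 0.0
    for j, vj in enumerate(centers):
        compactness = sum(u[i][j] ** m * math.dist(p, vj) ** 2
                          for i, p in enumerate(points))
        fuzzy_size = sum(row[j] for row in u)
        separation = sum(math.dist(vk, vj) ** 2 for vk in centers)
        total += compactness / (fuzzy_size * separation)
    return total


def xie_beni(points, centers, u, m=2.0):
    """XBI: total fuzzy compactness over N x smallest squared centre distance."""
    compactness = sum(u[i][j] ** m * math.dist(p, vj) ** 2
                      for i, p in enumerate(points)
                      for j, vj in enumerate(centers))
    min_sep = min(math.dist(a, b) ** 2
                  for j, a in enumerate(centers) for b in centers[j + 1:])
    return compactness / (len(points) * min_sep)


# Toy crisp partition: two clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
centers = [(0.0, 1.0), (4.0, 1.0)]
u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(partition_coefficient(u), classification_entropy(u))
print(partition_index(points, centers, u), xie_beni(points, centers, u))
```

<p>Running these functions for each candidate c and plotting the results gives curves of the kind shown in Fig. 15.</p>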
<p>In Fig. 
<xref rid="Fig15" ref-type="fig">15</xref>
a, the main drawback of PC is that its values decrease monotonically as
<italic>c</italic>
increases. CE has the same problem: it increases monotonically as
<italic>c</italic>
increases, with a hardly detectable elbow point. From the PC and CE scores alone, the number of clusters can only be estimated as 3. PI is more informative: it decreases sharply at
<italic>c</italic>
 = 3. The XBI index also decreases and reaches a local minimum as
<italic>c</italic>
increases. Since PI is more useful when comparing different validation indexes at the same
<italic>c</italic>
, we chose 3 as the optimal number of clusters.</p>
<p>In Fig. 
<xref rid="Fig15" ref-type="fig">15</xref>
b, PC and CE again have the same problems: they decrease or increase monotonically as
<italic>c</italic>
increases, which makes the elbow point hard to detect. From the PC and CE scores alone, the number of clusters can only be estimated as 3. PI is again more informative: it decreases at
<italic>c</italic>
 = 3. The XBI index reaches its local minimum at
<italic>c</italic>
 = 5. Considering both the PI and XBI indexes, we chose 4 as the optimal number of clusters. To visualize the data, we used principal component analysis (PCA) in Matlab to reduce the number of dimensions to 2 (from 159 for gene-country and 52 for gene-year; see Fig. 
<xref rid="Fig16" ref-type="fig">16</xref>
).
<fig id="Fig16">
<label>Fig. 16</label>
<caption>
<p>Two dimensional representations of the fuzzy clusters for the gene-year (
<bold>a</bold>
) and for gene-country (
<bold>b</bold>
) relationships. For the gene-year clusters, the number of clusters was set to 3, whereas for the gene-country clusters the number of clusters was set to 4. Each
<italic>color</italic>
represents a different cluster.
<italic>Points</italic>
marked by a
<italic>blue ‘x’</italic>
are the genes of the maximal closed item sets found by frequent pattern mining in Tables 
<xref rid="Tab2" ref-type="table">2</xref>
and
<xref rid="Tab3" ref-type="table">3</xref>
</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig16_HTML" id="MO16"></graphic>
</fig>
</p>
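<p>The dimensionality reduction step can also be sketched without Matlab. The function below is a minimal pure-Python PCA that projects each row onto the first two principal components. Each row is assumed here to be a gene, with one column per year or country. The components are found by power iteration with deflation on the sample covariance matrix. This is a stand-in for illustration, not the routine the authors used.</p>

```python
import math
import random


def pca_2d(rows, iters=500, seed=0):
    """Project rows onto their first two principal components.

    Power iteration with deflation on the sample covariance matrix. This is
    fine for a small illustrative matrix, not a replacement for an SVD routine.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centred = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so that the next power iteration finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[k] * pc[k] for k in range(d)) for pc in components]
            for x in centred]


# Toy matrix: 4 rows (e.g. genes) by 3 columns (e.g. years).
rows = [[1, 1, 0], [2, 2, 1], [3, 3, 0], [4, 4, 1]]
for point in pca_2d(rows):
    print([round(c, 3) for c in point])
```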
</sec>
<sec id="Sec4545">
<title>Where do key genes lie in the soft clusters?</title>
<p>We wanted to answer the following questions: Do the key genes lie in the fuzzy areas of the clusters? Are the key genes spread across different clusters, or do they all belong to one cluster? That is, we wanted to compare the results of frequent pattern mining with those of soft clustering.</p>
<p>The genes frequently mentioned together by country and year (see Tables 
<xref rid="Tab2" ref-type="table">2</xref>
,
<xref rid="Tab3" ref-type="table">3</xref>
) that were found by frequent pattern mining are marked by a blue
<italic>‘x’</italic>
in Fig. 
<xref rid="Fig16" ref-type="fig">16</xref>
, which represents the soft clusters in 2D space. We then cross-matched the genes of the frequent pattern mining item sets from Tables 
<xref rid="Tab2" ref-type="table">2</xref>
and
<xref rid="Tab3" ref-type="table">3</xref>
with the genes of the FCM clusters. All of the genes were found to be in the fuzzy areas of the clusters, which means that none of the genes strictly belonged to one of the clusters (Fig. 
<xref rid="Fig16" ref-type="fig">16</xref>
). This might indicate that the genes in the maximal closed frequent item sets are key genes that are also often mentioned together with other genes across articles.</p>
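<p>The cross-matching step can be sketched as follows. The text does not define when a gene lies in a fuzzy area. This sketch assumes, hypothetically, that a gene is in a fuzzy area when its largest membership over all clusters is below 0.6. The gene names and membership values are placeholders, not results from this study.</p>

```python
def genes_in_fuzzy_area(genes, memberships, threshold=0.6):
    """Genes whose largest cluster membership is below `threshold`.

    The 0.6 cut-off is hypothetical. The text does not say how the fuzzy
    areas of the clusters were delimited.
    """
    return [g for g, row in zip(genes, memberships) if threshold > max(row)]


def cross_match(itemsets, genes, memberships, threshold=0.6):
    """For each frequent item set, list which of its genes lie in a fuzzy area."""
    fuzzy = set(genes_in_fuzzy_area(genes, memberships, threshold))
    return {tuple(sorted(s)): sorted(fuzzy.intersection(s)) for s in itemsets}


# Made-up memberships for four placeholder genes over three clusters.
genes = ["G1", "G2", "G3", "G4"]
memberships = [[0.40, 0.35, 0.25],
               [0.50, 0.30, 0.20],
               [0.45, 0.45, 0.10],
               [0.90, 0.05, 0.05]]
print(genes_in_fuzzy_area(genes, memberships))
print(cross_match([{"G1", "G2", "G4"}], genes, memberships))
```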
</sec>
</sec>
<sec id="Sec455">
<title>Network analysis</title>
<p>Network analysis, often called “Social Network Analysis” because it was first developed to study social structures, is a strategy to find communities within data [
<xref ref-type="bibr" rid="CR9">9</xref>
]. Network analysis takes into consideration a set of “actors” and a set of “actions” between the actors. The characteristics of the actors are secondary in importance to the relationships between the actors.</p>
<p>There are various measures that one can use to find key actors within the network. One is the modularity class, an integer that denotes the community to which a particular actor belongs. Communities are typically obtained by partitioning the network so as to maximize modularity, a score that compares the density of edges within communities with the density expected at random. Another measure is closeness, which is based on the lengths of the shortest paths from an actor to all other actors (commonly the reciprocal of their average length). The higher an actor's closeness, the shorter its paths to all other actors. In terms of sociology, an actor with high closeness would be highly efficient at spreading information to many people. A third measure that we reference in our work is betweenness, which counts the shortest paths between other pairs of actors that pass through a given actor. In terms of sociology, an actor with high betweenness is the best “middle man”, and removing it from the network would disconnect many people and communities.</p>
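<p>The following minimal Python sketch computes the three measures on a small unweighted, undirected graph. The graph is stored as a dictionary from each gene to its set of neighbours. Closeness is taken as n - 1 divided by the sum of shortest-path lengths, which assumes a connected graph. Betweenness uses Brandes' algorithm, and modularity is Newman's Q for a given assignment of modularity classes. The study's network is weighted and this section does not name the software used for these measures, so the sketch only illustrates the definitions. It does not perform the community search that assigns the classes.</p>

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph):
    """(n - 1) divided by the sum of shortest-path lengths; assumes a connected graph."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness(graph):
    """Brandes' algorithm: shortest paths between other pairs through each node."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # undirected: each pair counted twice


def modularity(graph, community):
    """Newman's Q for a partition given as {node: community label}."""
    two_m = sum(len(nbrs) for nbrs in graph.values())
    q = 0.0
    for v in graph:
        for w in graph:
            if community[v] == community[w]:
                a_vw = 1.0 if w in graph[v] else 0.0
                q += a_vw - len(graph[v]) * len(graph[w]) / two_m
    return q / two_m


# Toy graph: two triangles joined by the edge C-D.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
     "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
labels = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
print(closeness(g))
print(betweenness(g))
print(modularity(g, labels))
```

<p>In the toy graph, C and D are the two bridge nodes. They have both the highest closeness and the highest betweenness, which matches the sociological reading given above.</p>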
<p>We applied network analysis to the genes that we collected, treating the genes as “actors” and their co-occurrences within abstracts as “actions”. To conduct network analysis, we first built a weighted adjacency matrix over all of the genes we collected. The entry for each pair of genes is the number of abstracts in which the two genes co-occur.</p>
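<p>The construction of the weighted adjacency matrix can be sketched in a few lines of Python. The sketch assumes that each abstract has already been reduced to the set of genes recognised in it. The gene recognition step is not shown, and the abstracts below are hypothetical. A gene that never co-occurs with another gene gets an all-zero row, which is the kind of unconnected node that appears as noise in the full network.</p>

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(abstract_genes):
    """Symmetric weighted adjacency matrix from the gene set of each abstract.

    Entry [i][j] is the number of abstracts in which genes i and j both
    occur. The diagonal is left at zero.
    """
    genes = sorted(set().union(*abstract_genes))
    index = {g: i for i, g in enumerate(genes)}
    counts = Counter()
    for found in abstract_genes:
        for a, b in combinations(sorted(set(found)), 2):
            counts[a, b] += 1
    matrix = [[0] * len(genes) for _ in genes]
    for (a, b), n in counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = n
    return genes, matrix


# Hypothetical gene sets for four abstracts.
abstracts = [{"BRCA1", "ESR1"}, {"BRCA1", "ERBB2", "ESR1"}, {"ERBB2"}, {"TP53"}]
genes, matrix = cooccurrence_matrix(abstracts)
for g, row in zip(genes, matrix):
    print(g, row)
```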
<p>The gene–gene network created from the adjacency matrix contained noise in the form of genes that were unconnected to any other gene. This made the network difficult to interpret, as seen in Fig. 
<xref rid="Fig17" ref-type="fig">17</xref>
. The full network contained 8400 nodes with 213,894 edges (Table 
<xref rid="Tab5" ref-type="table">5</xref>
). To obtain more concise results, we then performed connected component analysis to extract the giant component. If the largest component makes up a significant part of the graph, it can be considered the giant component [
<xref ref-type="bibr" rid="CR36">36</xref>
]. Our giant component contained 90.71 % of the full network (see Table 
<xref rid="Tab5" ref-type="table">5</xref>
). However, the number of edges in the giant component, 213,877, was almost unchanged from the number of edges in the full network.
<fig id="Fig17">
<label>Fig. 17</label>
<caption>
<p>The full gene–gene network derived from the co-occurrence of genes within the abstracts. The ring of noise (disconnected genes) surrounds the network. The network is difficult to understand in this form, prior to pruning</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig17_HTML" id="MO17"></graphic>
</fig>
</p>
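<p>Extracting the giant component amounts to finding the connected components with a breadth-first search and keeping the largest one. The minimal sketch below assumes that the network is stored as a dictionary from each gene to its set of neighbours, and it uses a toy graph. It reports the fraction of nodes retained and the edge counts, the same kind of figures summarised for the real network in Table 5.</p>

```python
from collections import deque


def connected_components(graph):
    """Yield the node set of each connected component of an undirected graph."""
    seen = set()
    for start in graph:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            for w in graph[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        yield component


def giant_component(graph):
    """Subgraph induced by the largest connected component."""
    largest = max(connected_components(graph), key=len)
    # All neighbours of a component node lie in the same component.
    return {v: set(graph[v]) for v in largest}


def edge_count(graph):
    """Number of undirected edges (each edge appears in two neighbour sets)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


# Toy graph: a triangle, a separate pair, and one isolated node.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
     "D": {"E"}, "E": {"D"}, "F": set()}
giant = giant_component(g)
print(len(giant) / len(g), edge_count(giant), edge_count(g))
```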
<p>To further prepare the network for analysis, we pruned edges with weight less than 10, where the edge weight is the number of abstracts in which the two genes co-occur. The pruned network was therefore more condensed and retained only the stronger connections, i.e., the heart of the full network (Fig. 
<xref rid="Fig18" ref-type="fig">18</xref>
). We then computed closeness, betweenness, and modularity on the pruned network. The results are reported in Table 
<xref rid="Tab6" ref-type="table">6</xref>
, ordered by closeness and betweenness values. Based on these measurements, the 10 most important genes in the network are listed in Table 
<xref rid="Tab6" ref-type="table">6</xref>
.</p>
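<p>The pruning and ranking steps can be sketched as follows. The sketch assumes that the co-occurrence counts are stored as a dictionary keyed by gene pairs. Edges with weight below the threshold of 10 are discarded. This section does not say whether genes left without edges were also removed, so dropping them here is an assumption. The helper that ranks genes by a centrality score is likewise hypothetical.</p>

```python
def prune_edges(weights, min_weight=10):
    """Keep edges whose co-occurrence count is at least `min_weight`.

    `weights` maps a (gene_a, gene_b) pair to its edge weight. The graph is
    rebuilt from the surviving edges, so genes left without any edge are
    dropped (an assumption of this sketch).
    """
    kept = {pair: w for pair, w in weights.items() if w >= min_weight}
    graph = {}
    for a, b in kept:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return kept, graph


def top_k(scores, k=10):
    """Nodes ranked by a centrality score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical edge weights between four toy genes.
weights = {("A", "B"): 42, ("B", "C"): 15, ("C", "D"): 3}
kept, graph = prune_edges(weights)
print(kept, sorted(graph))
print(top_k({"A": 0.5, "B": 0.9, "C": 0.6}, k=2))
```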
<p>In Table 
<xref rid="Tab6" ref-type="table">6</xref>
, the modularity classes show which genes form communities together, similar to clustering. For example, ESR1, ERBB2, SLC20A2, EGF, and PGR are part of the same community because they all have modularity class 2. To validate these results, we checked whether this community could also be found in experimental data. We manually validated the genes listed in Table 
<xref rid="Tab6" ref-type="table">6</xref>
using BioGRID, which is similar to GeneMania in that it uses experimental data analyzed from published articles to show communities of genes. We found that all genes except SLC20A2 had a physical interaction within the community. However, when we entered ESR1, ERBB2, SLC20A2, EGF, and PGR into GeneMania, it showed that all of the genes were indirectly related through shared protein domains, co-expression, pathways, etc. We therefore found some experimental evidence that the genes in community 2 are indeed related, although the interaction may be indirect. Researchers can use these communities to find genes that may be indirectly connected, and then use experimental evidence to potentially strengthen the connection of these genes to the community.</p>
<p>Similarly, for genes
<italic>CDKN2A</italic>
,
<italic>BRCA1</italic>
, and
<italic>HLA-H</italic>
, which all belong to modularity class 6, we performed an analysis similar to that for modularity class 2. Using BioGRID, we found published evidence that CDKN2A and BRCA1 have a direct physical interaction, but no such evidence involving HLA-H. However, using GeneMania, we found an indirect interaction between HLA-H and the other two genes. For CDH1, we performed a different analysis to confirm that this gene has a strong gene-disease relationship with breast cancer. We found that CDH1 has been experimentally shown to strongly influence the presence of breast cancer.
<xref ref-type="fn" rid="Fn9">9</xref>
For ACAD-9, we performed an analysis similar to that for CDH1, but to the best of our knowledge, no experimental data link ACAD-9 to breast cancer. We therefore looked further down the list of the most connected genes for the next two genes belonging to class 5, so that we could perform an analysis similar to that for classes 2 and 6. The next two well-connected genes of class 5 are MAPK10 and KRAS. GeneMania indicated that these genes are indirectly connected. Since MAPK10 codes for a protein centrally involved in a host of signalling pathways,
<xref ref-type="fn" rid="Fn10">10</xref>
it is likely to be involved in cancer. Signalling proteins indicate to cells whether or not they should proliferate, so if the protein is defective, the cell may divide indefinitely, as in cancer [
<xref ref-type="bibr" rid="CR34">34</xref>
].</p>
<p>We examined the smallest community of the pruned network (community 1, the yellow nodes in Fig. 
<xref rid="Fig18" ref-type="fig">18</xref>
, which includes 229 nodes), using the GeneMania resource to see how well its genes were connected. The results of the analysis are displayed in Fig. 
<xref rid="Fig19" ref-type="fig">19</xref>
, where all genes are connected through co-expression, except for four genes:
<italic>SPRR2A</italic>
,
<italic>C5orf27</italic>
,
<italic>FOXP4</italic>
, and
<italic>MT-ND3</italic>
. The large number of co-expression connections provides experimental support for this community. Genes that were not co-expressed with the others may be community members that have yet to be validated. This community may thus serve as a hint to primary researchers who wish to find other connections for these genes. For researchers who would like to validate the other communities with GeneMania, we provide the full list of network analysis genes and their modularity classes (the communities they belong to) in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
.
<fig id="Fig18">
<label>Fig. 18</label>
<caption>
<p>The gene–gene network. Each community is represented as a
<italic>different color</italic>
</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig18_HTML" id="MO18"></graphic>
</fig>
<fig id="Fig19">
<label>Fig. 19</label>
<caption>
<p>The co-expression network retrieved from GeneMania, which was used to validate the relationships between the genes within the smallest community (community 1) from our gene–gene network. Each
<italic>circle</italic>
represents a gene, and each
<italic>purple line</italic>
represents co-expression between the connected genes</p>
</caption>
<graphic xlink:href="13104_2016_2023_Fig19_HTML" id="MO19"></graphic>
</fig>
</p>
<p>Table 
<xref rid="Tab7" ref-type="table">7</xref>
shows which diseases are most common in each community, so that communities can be grouped and targeted according to the disease to be treated. More detailed information about the community-disease relationships is given in
<xref rid="Sec34" ref-type="sec">Appendix</xref>
: Table 
<xref rid="Tab8" ref-type="table">8</xref>
. This table shows the top five diseases for each community, the number of genes related to each disease, and the names of these genes. For example, communities 0, 2, 3, 4, and 6 are more related to cancer and its types, such as breast cancer. These communities may be targeted for cancer treatment. Communities 1 and 4 may be the focus for diabetes mellitus treatment, and community 7 for leukemia treatment.</p>
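<p>Counting the top diseases per community can be sketched with a Counter. The sketch assumes a mapping from each community to its genes and a mapping from each gene to its associated diseases. This section does not specify the gene-disease source, and the communities, genes, and links below are placeholders.</p>

```python
from collections import Counter


def top_diseases(communities, gene_diseases, k=5):
    """For each community, the k diseases linked to the most of its genes.

    `communities` maps a community ID to a list of genes. `gene_diseases`
    maps a gene to the set of diseases it is associated with.
    """
    table = {}
    for cid, genes in communities.items():
        counts = Counter(d for g in genes for d in gene_diseases.get(g, ()))
        table[cid] = counts.most_common(k)
    return table


# Placeholder communities and gene-disease links.
communities = {0: ["A", "B", "C"], 1: ["D"]}
gene_diseases = {"A": {"Cancer", "Breast cancer"}, "B": {"Cancer"},
                 "C": {"Cancer", "Diabetes mellitus"}, "D": {"Leukemia"}}
table = top_diseases(communities, gene_diseases)
for cid, rows in table.items():
    print(cid, rows)
```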
<p>Castro et al. [
<xref ref-type="bibr" rid="CR37">37</xref>
] reported that ESR1, FOXA1, GATA3, SPDEF, AR, RARA and XBP1 are critical for
<italic>ER</italic>
<sup>+</sup>
disease and are known to be central to breast cancer risk. In our results, all of these genes except XBP1, which is in community 3, are found in community 2, which is mainly related to breast cancer (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
).</p>
</sec>
</sec>
</sec>
<sec id="Sec33">
<title>Conclusions</title>
<p>The work described in this paper contributes a novel framework capable of investigating how research groups in various countries address breast cancer. We investigated the genes or proteins studied by various research groups by carefully analyzing their published research articles to identify the molecules they reported as biomarkers of breast cancer. Interestingly, we found that the genes of interest to researchers varied over time and even with the country where the research was conducted. This might be due to external factors specific to each community or country, though some of the discovered genes were reported to have similar functions. We thus demonstrated how the gene–gene, gene-year, and gene-country relationships provide interesting gene hypotheses that primary researchers might consider in their research. Further, this paper shows the power of integrating data mining and network analysis techniques.</p>
<p>As future work, we will also account for the semantic relations or directionality between the genes. For example, we will find relationships such as “gene A up-regulates gene B”, rather than “gene A and gene B have a relationship due to co-occurrence within an abstract”. We will also attempt to upgrade the text mining application to perform full-text analysis rather than abstract analysis. Although abstracts are useful because they summarize the articles, the full text of an article contains more information, especially in the experimental analysis and discussion sections. However, full-text mining presents many more challenges, such as errors from conversion to plain text and problems with reading text from tables and figures [
<xref ref-type="bibr" rid="CR38">38</xref>
]. We are currently investigating other types of cancer and other diseases in general, and we expect to report further findings shortly.</p>
</sec>
</body>
<back>
<app-group>
<app id="App1">
<sec id="Sec34">
<title>Additional file</title>
<p>
<media position="anchor" xlink:href="13104_2016_2023_MOESM1_ESM.csv" id="MOESM1">
<caption>
<p>10.1186/s13104-016-2023-5 Additional Tables.</p>
</caption>
</media>
</p>
</sec>
</app>
<app id="App2">
<sec id="Sec35">
<title>Appendix</title>
<p>See Tables 
<xref rid="Tab8" ref-type="table">8</xref>
,
<xref rid="Tab9" ref-type="table">9</xref>
,
<xref rid="Tab10" ref-type="table">10</xref>
and
<xref rid="Tab11" ref-type="table">11</xref>
.
<table-wrap id="Tab8">
<label>Table 8</label>
<caption>
<p>Gene-disease associations from gene–gene analysis</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Community ID</th>
<th align="left">Disease names</th>
<th align="left"># of total genes in community</th>
<th align="left"># of genes sharing disease</th>
<th align="left">Gene names</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="5">0</td>
<td align="left">Cancer</td>
<td align="left" rowspan="5">2105</td>
<td align="left">133</td>
<td align="left">EPHB4, MYCN, SOX9, RPL22, SPARC, ABL1, EAF2, PDGFA, PDGFB, SLC39A1, SPP1, RPS3, UNC5B, PIWIL1, GALR2, ETS1, DAG1, ETV4, EWSR1, CHD4, ITGA3, F2R, MMP20, ITGAV, ADAM10, ITGB3, ITGB4, TUBA4A, ZEB2, PTHLH, PTH1R, NMU, TWIST1, STRAP, JAG2, S100A4, HOXA9, BMI1, GJA1, BMP2, BMP4, BMP7, JUP, BMPR1A, JTB, CD82, HOXC8, GPC3, RHOU, NUAK1, CTNNBIP1, ITIH1, BSG, YAP1, GLI1, CTAGE1, PVRL1, KIF14, PLAU, ALAS1, MMP1, MMP2, MMP7, MMP9, MMP11, MMP14, SDC1, NANOS1, ARHGEF6, KIF11, VGF, KLK11, NID2, SFRP1, SFRP2, SFRP4, CD248, ADAMTS1, PODXL, ANXA1, USP28, WNT1, WNT2, WISP1, WNT5A, WNT7A, ARL6IP5, SLIT2, WNT2B, RIN1, SHH, GEMIN5, LAMC2, MMP26, HIF3A, RUNX2, RUNX3, KLK3, CLDN2, CLDN1, SLC2A4, ARPC2, POSTN, USP6, ORM2, HHIP, SMURF2, EFNB2, SPINT2, CD9, FAM107A, CYR61, TIMP2, TIMP3, YKT6, SNAI2, SP5, ROBO1, IRAK3, NDC80, SNAI1, CTNNB1, LUM, CTSB, KLK13, PCDH8, BCR, DKK3, RPL10, SMAD2, SMAD4, RGL4, SMAD7</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left">65</td>
<td align="left">HOXA5, WISP3, WISP2, MEST, PTPN1, HOXB13, BMP5, BMP6, UBE2B, TLK1, ETV1, KLK4, NMI, NEUROD1, ADAM28, CSF1R, PER2, RHOU, LIMD1, PTPRJ, TIMP1, ARNT2, ARID4A, TIMP4, INHBA, LATS2, TNC, USP28, SLC2A3, IGHMBP2, IBSP, VCAN, VTN, AFF3, WASF2, SERPINE1, CST3, POLI, ETS2, CSTA, LAMA3, CTGF, ADAMTS8, FURIN, MMP3, MMP8, LCN2, SIX1, MMP13, WNT9A, PCBP1, F2RL1, F3, CTSK, F7, TUBA4A, F10, SERPINA5, SDC4, RNF11, BMPR2, ANXA8L2, KLK2, PINK1, HOXA1</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left">63</td>
<td align="left">AMBP, RNF14, KLK4, KIAA0196, PTPN1, BMP5, BMP6, BMPR1B, BMPR2, PTPN12, HOXC8, CSF1R, PDX1, EAF2, SERPINA5, PAGE4, SPINT1, SLC39A1, ACAT2, PLG, DSPP, GLI2, COPE, IBSP, VCAN, CLPTM1, EHF, SERPINE1, DVL1, ETV1, PDGFD, LATS2, CDCP1, PLAU, CRISP3, DAZL, TREX2, ELK4, TIMP1, TSPY1, RLN2, ACVR2A, CYSLTR1, ITGA7, MMP12, KLK3, MMP15, MMP17, F2RL1, ATP2A1, F3, CTSK, INHA, GFI1, HOXB13, TIMP4, RPL10, KLK2, ADAMTS9, CST3, RLN1, ZNFX1, ADAMTS13</td>
</tr>
<tr>
<td align="left">Diabetes mellitus</td>
<td align="left">60</td>
<td align="left">RLN2, XYLT2, SERPINB2, PKLR, GJA1, BMP4, BMP6, BMP7, GREM1, NEUROD1, FBP1, UTS2, CALD1, TIMP1, HLA-DMB, TIMP3, PTPRN2, SPP1, TJP1, TNC, PTX3, KCNJ10, PLA2G4A, CLPS, SERPINE1, CST3, CD9, MTTP, SHH, LRP5, ANKRD1, PTPN22, KIF11, CTGF, MMP14, GCK, ISL1, MMP1, MMP2, FTO, MMP8, TIMP2, DCN, F2, CTSB, AKR1B1, F3, ITGB3, CLOCK, AQP7, SDC2, PTGES2, SLC2A4, GGT1, FABP1, FABP2, PINK1, CYBA, SMAD7, FOXC2</td>
</tr>
<tr>
<td align="left">Colon cancer</td>
<td align="left">51</td>
<td align="left">PMP22, MMP25, RNF14, HSPE1, PTPN1, C1GALT1C1, BMPR1A, DKK4, HTR2A, CYSLTR1, STRAP, TIMP1, TIMP4, LLGL1, TJP1, TNC, ASCL2, KLF9, FDPS, TOMM34, CNOT7, ZKSCAN3, SERPINE1, CEACAM7, SOX17, OLFM4, LYPD3, PLA2G4A, HRH2, DLL1, NTN1, ADAMTS13, MMP3, ACVR2A, LCN2, MMP10, CDCP1, MMP13, ADAMTSL3, SRPRB, F2RL1, AKR1B1, CTSH, CLDN12, ITGB6, SDC2, KLK1, GGT1, B3GNT8, CD226, ACTR2</td>
</tr>
<tr>
<td align="left" rowspan="5">1</td>
<td align="left">Diabetes mellitus</td>
<td align="left" rowspan="5">229</td>
<td align="left">29</td>
<td align="left">GH1, GHR, SOCS2, NAMPT, LIPE, RETN, IGF2R, IGFBP1, IGFBP3, NUDT1, LNPEP, ADIPOR2, INS, FGF21, LPL, RBP4, POMC, APOA1, APOA2, IRS1, IDE, APOC3, HSD11B1, CFI, PLTP, LEPR, SLC2A2, ADD1, FABP4</td>
</tr>
<tr>
<td align="left">Obesity</td>
<td align="left">23</td>
<td align="left">GH1, GHSR, LIPE, IGF1, IGF2, IGFBP3, IGFBP6, ADIPOR2, INSR, SOCS3, SHBG, POMC, APOA2, IRS1, HSD11B1, RETN, SERPINA6, LEP, LEPR, SLC2A2, RBP4, FABP4, ADRB1</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left">22</td>
<td align="left">GH1, GH2, GHR, SOCS2, SLC12A6, VIP, IGF1, ADIPOQ, IGFALS, IGFBP1, TIAL1, FOXL2, INS, FBXO31, INSR, SOCS3, SHBG, SLC12A7, LEP, LEPR, ADIPOR2, IGFBPL1</td>
</tr>
<tr>
<td align="left">Cancer</td>
<td align="left">21</td>
<td align="left">IGFBP5, GHRH, GHRHR, GHSR, TPP2, TMPO, CLIC4, ELAVL1, PLAG1, PAPPA, LRP1, IRS1, PTP4A3, ADCYAP1R1, IGF1R, IGF2, IGF2R, OXT, IGFBP2, IGFBP3, IGFBP4</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left">17</td>
<td align="left">GNA13, GHR, VIPR1, MYO6, PMEPA1, LAMB1, NUDT1, INS, SOCS3, LEP, VIP, ADIPOR2, ADCYAP1, IGF1, ADIPOQ, IGFBP1, IGFBP2</td>
</tr>
<tr>
<td align="left" rowspan="5">2</td>
<td align="left">Cancer</td>
<td align="left" rowspan="5">876</td>
<td align="left">47</td>
<td align="left">S100A2, NAT1, S100A11, NES, CDR2, ERBB2, ERBB3, ERBB4, EREG, BCL11B, MET, SCGB2A2, RARB, EGF, LRIG1, RET, NKX2-1, TK1, PA2G4, AXL, ESR1, RARRES1, CYP24A1, RBP1, PROM1, CD24, MKI67, AGR2, GATA4, ETV6, MCM2, ALCAM, MME, GPNMB, KRT7, SCEL, KIR3DL2, CTSD, TACSTD2, KIT, AKR1C1, ALK, AR, RXRA, CXADR, CCKBR, CDX2</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left">40</td>
<td align="left">NEDD8, RNF5, LHCGR, GABRP, PLAC1, CARM1, NCOR1, KRT5, CD1A, NCOA2, SCGB2A1, SCGB1D2, LATS1, RARA, SRA1, STS, MAOA, KRT18, EML4, BTC, HBEGF, TNN, CYP19A1, ESRRA, GABARAP, PIP, HTATIP2, CYP27B1, GATA3, NRG2, NCOR2, GNA12, WWTR1, NRG3, KIAA0100, F8, AKR1C2, AKR1B10, GFRA1, AREG</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left">28</td>
<td align="left">MSR1, CARM1, NCOA2, LHB, NCOR2, NR2C1, ERBB3, RARA, STS, PELP1, HPN, CADM4, CYP19A1, ESRRA, ESRRB, HTATIP2, CYP27B1, LPXN, CYP7B1, GC, GNA12, HSD3B2, CHGA, WDR77, AR, RXRB, RXRG, AREG</td>
</tr>
<tr>
<td align="left">Leukemia</td>
<td align="left">24</td>
<td align="left">CISH, NCOA2, CKMT2, RHOH, MYH11, BCL11B, RARA, CEBPE, HOXD3, HOXD13, TLE1, GRAP2, PVRL2, GATA1, IRF8, NSD1, ETV6, GNA12, CBL, CTSG, IL2RG, ENO2, F8, SLC4A1</td>
</tr>
<tr>
<td align="left">Diabetes mellitus</td>
<td align="left">20</td>
<td align="left">IRF8, MAP4K5, HTR1A, ACTG2, ADRA2B, GAD2, BTC, GC, AR, ESR1, TRPC6, DBI, CYP24A1, AKR1B10, CHGA, CYP27B1, MME, CD24, EGF, PLXDC1</td>
</tr>
<tr>
<td align="left" rowspan="5">3</td>
<td align="left">Cancer</td>
<td align="left" rowspan="5">1210</td>
<td align="left">101</td>
<td align="left">SOX2, VPRBP, ALOX12, HSP90B1, TRAF2, HIF1A, HK2, MAT2A, HSPH1, BAD, GATA6, BCL2, BCL2L1, TNFRSF10A, SPAG1, PTGS2, HNRNPK, NRP2, NRP1, TYMS, SEMA3A, TUBB3, S100P, RHBDF1, KCNA1, NDRG1, CEBPB, SPIN1, RBMX, BID, PKNOX1, FGF5, PLAGL1, KDR, BCL10, TOMM40, FGFR3, FGFR4, FH, FXYD3, EIF2AK3, HSPA5, VEGFC, MEN1, ALOX5, SDHA, SDHB, SDHC, SDHD, ALOX15B, MPG, VEGFB, CA9, IL24, VHL, SEMA4D, SFRP5, ANG, ANGPT1, ANGPT2, BCL2L10, DIABLO, ANXA2, PRDX4, L1CAM, BAG3, CASP3, ID2, BIRC2, BIRC3, XIAP, SIVA1, RELA, LYVE1, POT1, IDH1, CAV3, FIGF, TYMP, SLC2A1, CDC37, PDPN, INTS6, BNIP3, MTAP, LOX, PYCARD, NEK8, ASPH, RBM6, ALOX15, CA1, TMSB10, HSP90AA1, CTTN, ENDOG, ENG, OLIG2, BBC3, EPAS1, BIRC7</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left">57</td>
<td align="left">CRYAB, SLC6A3, PTN, ADM, TUBA1B, PARP1, KLF8, SEMA3C, TXNIP, VWF, PTPRB, CUEDC2, APLN, KLF10, IGFBP7, FCGR2A, SLC16A1, CLU, WFS1, SQSTM1, RBM3, CSNK2A1, RBMX, JAG1, MIF, TRPS1, BAK1, CASP2, MAZ, XBP1, FADS2, FGF4, CASP9, IRF1, KLF4, REL, GNA11, HES1, SMPD1, NFATC1, BCL2A1, CCL16, SLC25A5, TNFRSF10B, APOE, RRAS, IKBKB, HSF1, IL1R1, FASLG, OSGIN1, RSPO1, PLXNA1, PRDX6, LSM1, CACNA1H, BIK</td>
</tr>
<tr>
<td align="left">Colon cancer</td>
<td align="left">41</td>
<td align="left">HSPD1, AIFM1, ACSL4, TRAF1, HIP1, NOD1, CLCA1, EFNB3, SAT1, NFATC1, HTRA2, CLU, FGF20, CALR, MYOD1, HPRT1, ANXA5, MIF, PRDX1, FES, CASP6, FGF7, AATF, TMEM97, ATF3, FGF18, GLRX3, TNKS2, LCP1, HSF1, FASLG, HSPA1A, SLC16A7, FGF19, HSPA8, DDIT3, MAF, PMAIP1, KIF2C, MPG, TXN</td>
</tr>
<tr>
<td align="left">Diabetes mellitus</td>
<td align="left">40</td>
<td align="left">TXNIP, ADM, PCSK2, PARP1, HIF1A, VWF, CEBPB, ANGPT1, ANGPT2, APLN, IMPDH2, ANG, KCNJ11, SLC16A1, WFS1, CAPN1, PRDX6, KLF10, XBP1, FADS2, CASP10, SI, PLAGL1, RELA, LTBR, TNFRSF10B, APOE, PBX1, PCBD1, TNMD, ENG, HSPA1A, EIF2AK3, HSPA5, ALOX5, SLC2A1, HSPB2, CA1, KLF2, TXN</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left">38</td>
<td align="left">HSPD1, RND3, PTN, ADM, AMD1, MAPK8IP1, TPT1, CDC37, IGFBP7, XBP1, GAPDH, SPINK1, CLU, AIFM1, SQSTM1, JAG1, CAPN1, MIF, PBX1, FGF1, LAMA5, LAMC1, FGF8, FGF9, CACNA1H, ATF3, BCL2A1, APOE, IKBKB, CHUK, PCBP2, HSPA1A, HNRNPA1, RELB, LSM1, FABP5, TXN, RPL19</td>
</tr>
<tr>
<td align="left" rowspan="5">4</td>
<td align="left">Diabetes mellitus</td>
<td align="left" rowspan="5">601</td>
<td align="left">31</td>
<td align="left">GSTM1, SLC6A2, GSTP1, CYP1A1, GSTT1, MT1A, TSC22D1, ARNT, LIPC, IAPP, CETP, SLC22A4, SLC22A5, AGTR1, AGTR2, PON1, AHSG, UCP2, PYGL, CAT, REN, KEAP1, IL1RAP, ATP2A2, F5, GFPT1, EDN1, EDNRA, SOD1, SOD3, KL</td>
</tr>
<tr>
<td align="left">Cancer</td>
<td align="left">26</td>
<td align="left">GSTM1, EPHX2, GSTP1, CYP1A1, CYP1A2, CYP1B1, SLC22A18, PDCD2, TSC22D1, IAPP, PHF19, RB1, CETP, GSTT1, GLRX, FECH, AOX1, TSPO, APOBEC1, SIM2, AGPAT2, COPS2, MAP4K4, MVP, EDN2, SOD2</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left">24</td>
<td align="left">SLC22A18, PPARGC1A, CYP2B6, ARNT, CYP4Z1, PIN1, CYP21A2, INSL4, AGTR2, SLC19A3, AHR, SLCO1B3, ZFHX3, AGTR1, CAT, HSD17B1, HSD17B2, ACE, GSTO1, ATP2B2, SLC26A1, EDN1, EDNRA, SOD1</td>
</tr>
<tr>
<td align="left">Hypertension</td>
<td align="left">21</td>
<td align="left">TSPO, PPARGC1A, ACE, EPHX2, ATP2A2, GCLC, UCP2, ENPEP, SLC6A2, CAT, GSTT1, EDN1, HSD3B1, REN, CYP21A2, SLC22A2, SOD3, CFTR, AGTR1, IAPP, DBH</td>
</tr>
<tr>
<td align="left">Atherosclerosis</td>
<td align="left">21</td>
<td align="left">GSTM1, VKORC1, SOD3, PON1, AHSG, GSTO1, CYP1A1, GSTT1, GCLM, SOD1, LDLR, NR1H3, ABCC6, EDN1, KL, EDNRA, ABCD1, APOC2, AGTR1, UCP2, EPHX2</td>
</tr>
<tr>
<td align="left" rowspan="5">5</td>
<td align="left">Cancer</td>
<td align="left" rowspan="5">988</td>
<td align="left">78</td>
<td align="left">EPHB2, SLC5A5, RABEP2, RHOA, RHOB, RHOC, MST1R, HRAS, RALA, RALB, WNK2, ARHGDIB, RPS6KB1, PEBP1, NEK3, PTPN6, VRK1, KRAS, JUN, MAPK14, KCNA2, PTPRA, ILK, KCNA5, AKT3, JAK2, PXN, BRAF, P2RX5, AKAP12, PTPRK, MAP2, EGR1, SDCBP, KCNH2, PIK3CG, PIK3R1, EZR, RPS6KA2, CXCL17, EIF4A2, EIF4E, DLGAP5, DAB2, IQSEC1, MAP3K1, KHDRBS1, TSC1, GPR56, TNK2, TIAM1, AKT1, AKT2, PLCB2, VAV3, PRMT3, CAV1, HBP1, GPRC5A, ARHGEF2, SNCG, MAP2K4, MELK, KISS1, GDF15, KIAA1524, SPHK1, TRIB3, RAF1, PTK2, DLC1, PKN3, CRK, RAC1, BCAR1, RAC3, LGALS7, ARF6</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left">50</td>
<td align="left">MST1, IL11RA, ADORA2B, LIMK1, EEF2, BMX, SLC9A3R1, DNAJA3, CSK, PHLDA1, IKBKE, SLC9A1, PTPRZ1, CSNK1A1, MAPKAPK2, KCNJ3, DUSP1, PDCD4, DUSP6, MBL2, EIF4EBP1, SH2D3C, EIF4G1, PAK1, ETV5, ATAD2, MLLT4, ROCK1, ACTN4, NR3C2, PLCD1, RHEB, PLD1, RB1CC1, NFATC2, EEF1D, FHL2, CHN2, RACGAP1, TSC2, TUBB, LPAR2, SH2D3A, RAB27A, RPL7A, DIRAS3, GAB2, PTK6, NEK3, WASL</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left">36</td>
<td align="left">TYK2, FOXO1, IL11RA, PTK2B, HSPG2, SPRY2, JUND, LIMK1, SET, BMX, MAK, RAP2A, JAK1, NOX1, CSK, EGR1, F2RL3, MAPKAPK2, FDFT1, TLE3, RPS6KA3, EIF4EBP1, CPNE3, LRP2, ETV5, WFDC1, TRPM8, ELK1, PLCG1, UBIAD1, PAK6, REPS2, FHL2, LPAR1, RHEB, ITPR1</td>
</tr>
<tr>
<td align="left">Diabetes mellitus</td>
<td align="left">31</td>
<td align="left">DUSP12, EZR, GIP, ADORA2B, JUN, MAPK14, NOX1, SLC12A3, PTPRN, PIK3CG, LPA, INPPL1, EIF4A2, MBL2, LRP2, PLA2G2A, RDX, AKT1, AKT2, RORC, LRRC7, EIF4E, ARHGEF11, CHRM3, ELMO1, ITPR3, MAP3K1, CRTC2, EXOC4, MSN, TRPC1</td>
</tr>
<tr>
<td align="left">Rheumatoid arthritis</td>
<td align="left">24</td>
<td align="left">SLC5A5, RHOA, JAK2, JUN, MAP2K4, MRAS, NEDD9, BMX, PIK3CG, MAPK14, CENPJ, CSK, EGR1, IKBKE, GDF15, TRPC1, MBL2, EIF4G1, LRP2, C5, LPAR1, GAB2, RAC1, MAP3K2</td>
</tr>
<tr>
<td align="left" rowspan="5">6</td>
<td align="left">Cancer</td>
<td align="left">1397</td>
<td align="left">122</td>
<td align="left">MYC, CDKN1A, CDKN1B, CDKN1C, CDKN2A, CDKN2B, CDKN2C, SP1, SP4, PTTG1, ERCC1, ERCC2, CEBPA, ERH, ATR, STAG1, XRCC1, TRIO, HDAC8, PPP1R13L, BARD1, DCK, NBN, MCM3, EZH2, MCM7, CCND1, CAGE1, CHEK1, ALDH1L1, DCC, RRM1, RRM2, MDM4, ID4, ECT2, GADD45A, MOAP1, TUBG1, RYR1, DDX5, MAP3K4, NIT2, ADH1B, ADH1C, AQP1, HDAC3, CKS1B, FAP, RPRM, MGMT, BRCA1, BRCA2, KCNH1, TMPRSS2, SUPT7L, BUB1, MLH1, CDC73, FHIT, MBD4, PLK1, COPS5, SMYD3, BRMS1, RAD51, FOXM1, PMS2, BCL2L15, HDAC5, RBBP4, NEIL1, RBL2, UBE2C, APC, SHMT1, APEX1, RECQL, E2F2, E2F3, MSH2, LASP1, RNF139, NEK2, XRCC3, SKP2, IGF2BP1, ASH2L, PDLIM5, CCNA2, CCNB1, CCND2, CCND3, CCNE1, CCNG1, MSH6, TRAF4, IGF2BP3, MTA1, RNF2, RFWD2, MTHFR, EPHA2, WIF1, FBXO4, CST6, EXO1, SMARCA4, SMARCB1, DAPK1, RPL11, E2F1, ATM, FSCN1, PUM1, SH2D1A, MUTYH, MAD2L1, PCNA, XPC, AURKB, MYBL2</td>
</tr>
<tr>
<td align="left">Breast cancer</td>
<td align="left"></td>
<td align="left">64</td>
<td align="left">CDK9, RAD52, TOPBP1, FANCD2, MRE11A, HSPB8, MYBBP1A, RPS6KA6, BCAS2, ERCC4, RNASEL, CEBPD, HDAC6, HDAC4, XRCC2, DERL1, NOL3, MUS81, CENPF, CTCF, INHBB, RBBP7, RBBP8, RBL1, CCNE2, POLB, KLF5, C1QA, WWP1, XRCC4, CAPN2, PRC1, PEMT, MED14, PAK2, BCCIP, MTRR, XDH, SMARCE1, FOXP1, SH3GL2, E2F4, NCL, PBOV1, ANXA8, RRAD, SIPA1, CHKA, ATP1B2, MBD2, NOD2, PRDM14, DDB2, DUSP22, RGS2, PALB2, EP300, CLSPN, HIST2H3A, MLLT11, RAD50, RAD17, KPNA2, RAD23B</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left"></td>
<td align="left">41</td>
<td align="left">GLIPR1, SUMO1, ERCC1, MT2A, IRX5, RNASEL, RCHY1, CEBPD, ERG, PALB2, BRCA2, PI16, BTRC, LZTS1, RBL1, KLF5, SGTA, TSGA10, SMARCA2, CCNA1, PAK2, SMARCC1, MTRR, FOXP1, TSG101, MSH3, PBOV1, RPS27A, TOPORS, SENP1, NUPR1, AQP3, CREBBP, MECP2, MSMB, ELAC2, EP300, CDK5R1, RAD9A, PCNT, RAD21</td>
</tr>
<tr>
<td align="left">Colon cancer</td>
<td align="left"></td>
<td align="left">39</td>
<td align="left">BLM, CITED2, SLC6A4, MRE11A, SND1, CDX1, MLH3, DDX17, HTR3A, LMNA, POLD1, CENPA, NOL3, UCHL1, NEIL2, KLF5, MATK, BRD7, TSGA10, RPS6KA6, HLTF, BCAT1, BCHE, CTBP1, E2F4, XPA, MSH3, LTC4S, CDC16, CHKA, CD3EAP, AIM2, METAP2, EP300, PPM1H, DDX5, RAD18, NOD2, CA8</td>
</tr>
<tr>
<td align="left">Embryoma</td>
<td align="left"></td>
<td align="left">34</td>
<td align="left">BLM, PCSK1, AVEN, HDAC11, HOXC9, RNASEL, RPRM, RNF2, HTR3A, FBXO4, STAG1, RCHY1, BCAS2, KLF5, UCHL1, WWP1, GAS1, ASS1, MTR, UHRF1, RECQL, NCL, XPO1, CCL23, CBS, MECP2, HDAC8, PALB2, RPL11, MAP3K4, HMG20B, DNMT3L, PCNT, KPNA2</td>
</tr>
<tr>
<td align="left" rowspan="5">7</td>
<td align="left">Leukemia</td>
<td align="left" rowspan="5">967</td>
<td align="left">77</td>
<td align="left">MPO, IFNG, CXCR5, IL11, SELE, RAG1, RAG2, SELL, ITGAX, ORM1, IL10, KLRC1, CD2, CD52, IL18, CD5, CSF1, CD7, CD8A, CSF3, CSF3R, CD19, P2RX7, CIITA, CALCA, ASAH1, CD86, TNFRSF8, CD33, IGHM, CCL21, B2M, PVR, IL21, TNF, LAIR1, CCL2, CD160, HLA-A, ULBP2, CCL3, ICAM1, LAMP1, HLA-B, CCL4, CCL5, CCR4, GNLY, KIR3DL1, CCL11, CTLA4, CCL18, GCNT1, CCL19, ITGA4, CHIT1, CCL22, IL1A, IL1B, ITGAL, ITGAM, LYZ, IL2, IL2RA, IL2RB, IL3RA, TTR, IL4, IFNA1, CD83, IL6, IL7, SPANXB1, BGN, PML, PDCD1LG2, FAIM3</td>
</tr>
<tr>
<td align="left">Rheumatoid arthritis</td>
<td align="left">70</td>
<td align="left">SELE, OSM, DEFA1, TPSAB1, ADORA3, IL15, IL16, TNFRSF9, IL17A, MAL, MGAT5, CD5, CSF1, KIR2DL1, TIA1, CD14, HPSE, HLA-C, FCGR3A, SELP, ITGA4, HRH4, CD80, CD86, TNFRSF8, ACP5, ICAM1, ICAM3, IL21, MITF, CCL18, IL1A, LAMP3, CXCL13, CD274, MDK, CCRL2, ICOS, CCL3, IRF3, CCR2, CCL5, LTA, CCL3L1, P2RX7, CCL11, CCL13, TLR2, C5AR1, HAMP, GCNT1, CXCL16, CHI3L1, LTB4R2, TNFRSF17, IL1B, IL2, XCL1, CX3CL1, ITGB2, CCL20, IL4, CD276, CD83, CXCL12, IL7, TNF, PML, PDCD1LG2, LGALS9</td>
</tr>
<tr>
<td align="left">Prostate cancer</td>
<td align="left">59</td>
<td align="left">ARG2, IFNB1, A2M, IL10RA, MAGEA1, MAL, MAGEA4, S100A9, CXCL10, IFNG, IL15, IL16, IL18, TES, MGAT5, CSF1, CALCA, CALCR, HLA-A, ASAH1, MPO, SEC62, AGER, IL10, AZGP1, ITGA5, ICAM1, TLR1, TLR3, MCAM, CCL2, CD55, STEAP2, B2M, CCR1, CCR2, CCL5, CCR9, TNF, CTLA4, CRP, ITGA2, GCNT1, CXCL16, CHI3L1, TLR6, CHIT1, S100A8, OSM, LCT, IL1RN, IL2, CSMD1, IL4, IL6, ACPP, PML, PTMA, RING1</td>
</tr>
<tr>
<td align="left">Diabetes mellitus</td>
<td align="left">57</td>
<td align="left">MPO, IFNG, DEFA1, SELE, EPO, IL13, SELL, IL15, SELP, CD4, ITGA2B, CSF3, HLA-A, LCAT, HP, CD86, AGER, GLP1R, ICAM1, TLR3, IL21, P2RX7, CMA1, MDK, MCAM, CD55, HRH4, CCL2, CASQ1, CCR2, CCL5, LTA, ALAD, GNAI2, TNF, CTLA4, GGT2, ITGA2, GCNT1, KIR2DL2, IL1A, ITGAM, HPSE, ITGB2, TTR, IL4, IFNA1, MEF2C, PCK1, CXCL12, CD163, LGALS3, BGLAP, CRP, MC3R, TNFRSF4, APOC1</td>
</tr>
<tr>
<td align="left">Cancer</td>
<td align="left">55</td>
<td align="left">MAGEA3, CCNT1, EPOR, IL13RA2, AMPH, SERPINB4, CEACAM5, KITLG, GALNT3, FCER2, ANPEP, MS4A1, SPN, PDZK1IP1, NCR2, CD99, AFP, EPO, CD34, THY1, CAPG, CYP27A1, VTCN1, TIA1, C1QBP, CEACAM6, CXCL14, ST3GAL6, EBAG9, HPSE2, CCR3, ST3GAL4, PAX5, ATOH1, STIL, BCL6, CASC5, MDK, PBX2, CTSE, MUC2, SLAMF1, ST18, IL3, HPSE, MUC6, HNRNPF, CXCL12, LGALS1, LGALS3, SLC3A2, CD200, CEACAM1, TPD52, FGFBP1</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab9">
<label>Table 9</label>
<caption>
<p>Gene–disease associations for genes identified in the gene–year and gene–country analyses</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="2">Gene</th>
<th align="left" colspan="2">Disease associations for gene</th>
<th align="left" colspan="2">Genes sharing the most diseases with this gene</th>
<th align="left" colspan="2">Country associations for gene</th>
</tr>
<tr>
<th align="left">Disease name</th>
<th align="left">Score</th>
<th align="left">Gene name</th>
<th align="left"># of shared diseases</th>
<th align="left">Country name</th>
<th align="left"># of abstracts</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="10">ERBB2</td>
<td align="left">Breast neoplasms</td>
<td align="left">0.414</td>
<td align="left">EGFR</td>
<td char="." align="left">15</td>
<td align="left">United States</td>
<td char="." align="char">4271</td>
</tr>
<tr>
<td align="left">Mammary neoplasms, experimental</td>
<td align="left">0.4</td>
<td align="left">PTGS2</td>
<td char="." align="left">13</td>
<td align="left">Italy</td>
<td char="." align="char">808</td>
</tr>
<tr>
<td align="left">Neoplasm metastasis</td>
<td align="left">0.396</td>
<td align="left">SOD2</td>
<td char="." align="left">12</td>
<td align="left">Japan</td>
<td char="." align="char">806</td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">0.363</td>
<td align="left">TP53</td>
<td char="." align="left">11</td>
<td align="left">China</td>
<td char="." align="char">799</td>
</tr>
<tr>
<td align="left">Ovarian neoplasms</td>
<td align="left">0.331</td>
<td align="left">STAT3</td>
<td char="." align="left">10</td>
<td align="left">United Kingdom</td>
<td char="." align="char">674</td>
</tr>
<tr>
<td align="left">Prostatic neoplasms</td>
<td align="left">0.329</td>
<td align="left">CCND1</td>
<td char="." align="left">10</td>
<td align="left">Germany</td>
<td char="." align="char">620</td>
</tr>
<tr>
<td align="left">Lung neoplasms</td>
<td align="left">0.329</td>
<td align="left">ESR1</td>
<td char="." align="left">10</td>
<td align="left">France</td>
<td char="." align="char">486</td>
</tr>
<tr>
<td align="left">Stomach neoplasms</td>
<td align="left">0.321</td>
<td align="left">KRAS</td>
<td char="." align="left">9</td>
<td align="left">Canada</td>
<td char="." align="char">433</td>
</tr>
<tr>
<td align="left">Cholangiocarcinoma</td>
<td align="left">0.308</td>
<td align="left">TNF</td>
<td char="." align="left">9</td>
<td align="left">South Korea</td>
<td char="." align="char">347</td>
</tr>
<tr>
<td align="left">Glioma</td>
<td align="left">0.306</td>
<td align="left">TNFSF10</td>
<td char="." align="left">9</td>
<td align="left">Spain</td>
<td char="." align="char">329</td>
</tr>
<tr>
<td align="left" rowspan="10">ESR1</td>
<td align="left">Breast neoplasms</td>
<td align="left">0.423</td>
<td align="left">SOD2</td>
<td char="." align="left">14</td>
<td align="left">United States</td>
<td char="." align="char">5429</td>
</tr>
<tr>
<td align="left">Alzheimer disease</td>
<td align="left">0.358</td>
<td align="left">EGFR</td>
<td char="." align="left">13</td>
<td align="left">United Kingdom</td>
<td char="." align="char">1249</td>
</tr>
<tr>
<td align="left">Neoplasm metastasis</td>
<td align="left">0.345</td>
<td align="left">PTGS2</td>
<td char="." align="left">12</td>
<td align="left">Japan</td>
<td char="." align="char">918</td>
</tr>
<tr>
<td align="left">Carcinoma</td>
<td align="left">0.344</td>
<td align="left">TNF</td>
<td char="." align="left">11</td>
<td align="left">China</td>
<td char="." align="char">764</td>
</tr>
<tr>
<td align="left">Coronary artery disease</td>
<td align="left">0.342</td>
<td align="left">CDH1</td>
<td char="." align="left">10</td>
<td align="left">Italy</td>
<td char="." align="char">727</td>
</tr>
<tr>
<td align="left">Migraine disorders</td>
<td align="left">0.333</td>
<td align="left">ACE</td>
<td align="left">10</td>
<td align="left">France</td>
<td char="." align="char">569</td>
</tr>
<tr>
<td align="left">Obesity</td>
<td align="left">0.327</td>
<td align="left">ERBB2</td>
<td char="." align="left">10</td>
<td align="left">Germany</td>
<td char="." align="char">517</td>
</tr>
<tr>
<td align="left">Leiomyoma</td>
<td align="left">0.327</td>
<td align="left">PTEN</td>
<td char="." align="left">9</td>
<td align="left">Canada</td>
<td char="." align="char">515</td>
</tr>
<tr>
<td align="left">Myocardial infarction</td>
<td align="left">0.323</td>
<td align="left">STAT3</td>
<td char="." align="left">9</td>
<td align="left">South Korea</td>
<td char="." align="char">338</td>
</tr>
<tr>
<td align="left">Infertility, male</td>
<td align="left">0.321</td>
<td align="left">TP53</td>
<td char="." align="left">9</td>
<td align="left">Sweden</td>
<td char="." align="char">299</td>
</tr>
<tr>
<td align="left" rowspan="10">PGR</td>
<td align="left">Breast neoplasms</td>
<td align="left">0.38</td>
<td align="left">EGFR</td>
<td char="." align="left">7</td>
<td align="left">United States</td>
<td char="." align="char">1887</td>
</tr>
<tr>
<td align="left">Endometriosis</td>
<td align="left">0.346</td>
<td align="left">ESR1</td>
<td align="left">6</td>
<td align="left">Japan</td>
<td char="." align="char">456</td>
</tr>
<tr>
<td align="left">Carcinoma</td>
<td align="left">0.32</td>
<td align="left">ESR2</td>
<td align="left">6</td>
<td align="left">Italy</td>
<td char="." align="char">404</td>
</tr>
<tr>
<td align="left">Meningioma</td>
<td align="left">0.307</td>
<td align="left">STAT3</td>
<td align="left">5</td>
<td align="left">China</td>
<td char="." align="char">385</td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">0.304</td>
<td align="left">EFEMP1</td>
<td align="left">5</td>
<td align="left">United Kingdom</td>
<td char="." align="char">311</td>
</tr>
<tr>
<td align="left">Mammary neoplasms, animal</td>
<td align="left">0.3</td>
<td align="left">CDH1</td>
<td align="left">5</td>
<td align="left">France</td>
<td char="." align="char">294</td>
</tr>
<tr>
<td align="left">Mammary neoplasms, experimental</td>
<td align="left">0.3</td>
<td align="left">PHB</td>
<td align="left">5</td>
<td align="left">Germany</td>
<td char="." align="char">245</td>
</tr>
<tr>
<td align="left">Mesothelioma</td>
<td align="left">0.3</td>
<td align="left">PDGFA</td>
<td align="left">5</td>
<td align="left">Canada</td>
<td char="." align="char">188</td>
</tr>
<tr>
<td align="left">Recurrence</td>
<td align="left">0.3</td>
<td align="left">STAT5A</td>
<td align="left">5</td>
<td align="left">South Korea</td>
<td char="." align="char">169</td>
</tr>
<tr>
<td align="left">Malignant neoplasm breast</td>
<td align="left">0.126</td>
<td align="left">ENO1</td>
<td align="left">5</td>
<td align="left">Sweden</td>
<td char="." align="char">131</td>
</tr>
<tr>
<td align="left" rowspan="10">EGF</td>
<td align="left">Hypomagnesemia 4, renal</td>
<td align="left">0.6</td>
<td align="left">SOD2</td>
<td align="left">11</td>
<td align="left">United States</td>
<td char="." align="char">2199</td>
</tr>
<tr>
<td align="left">Wounds and injuries</td>
<td align="left">0.4</td>
<td align="left">IL6</td>
<td align="left">9</td>
<td align="left">United Kingdom</td>
<td char="." align="char">408</td>
</tr>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">0.325</td>
<td align="left">MMP9</td>
<td align="left">9</td>
<td align="left">Japan</td>
<td char="." align="char">394</td>
</tr>
<tr>
<td align="left">Prostatic neoplasms</td>
<td align="left">0.322</td>
<td align="left">PTGS2</td>
<td align="left">9</td>
<td align="left">China</td>
<td char="." align="char">378</td>
</tr>
<tr>
<td align="left">Carcinoma, hepatocellular</td>
<td align="left">0.317</td>
<td align="left">TNF</td>
<td align="left">9</td>
<td align="left">Italy</td>
<td char="." align="char">298</td>
</tr>
<tr>
<td align="left">Neoplasm metastasis</td>
<td align="left">0.315</td>
<td align="left">PTEN</td>
<td align="left">8</td>
<td align="left">Germany</td>
<td char="." align="char">239</td>
</tr>
<tr>
<td align="left">Glioblastoma</td>
<td align="left">0.311</td>
<td align="left">EGFR</td>
<td align="left">8</td>
<td align="left">South Korea</td>
<td char="." align="char">211</td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">0.307</td>
<td align="left">IGF1</td>
<td align="left">8</td>
<td align="left">Canada</td>
<td char="." align="char">205</td>
</tr>
<tr>
<td align="left">Kidney diseases</td>
<td align="left">0.301</td>
<td align="left">IL8</td>
<td align="left">8</td>
<td align="left">France</td>
<td char="." align="char">173</td>
</tr>
<tr>
<td align="left">Stomach ulcer</td>
<td align="left">0.3</td>
<td align="left">TGFB1</td>
<td align="left">7</td>
<td align="left">Spain</td>
<td char="." align="char">112</td>
</tr>
<tr>
<td align="left" rowspan="10">BRCA1</td>
<td align="left">Breast-ovarian cancer, familial, susceptibility to, 1</td>
<td align="left">0.7</td>
<td align="left">CDH1</td>
<td align="left">7</td>
<td align="left">United States</td>
<td char="." align="char">1845</td>
</tr>
<tr>
<td align="left">Malignant neoplasm breast</td>
<td align="left">0.54</td>
<td align="left">CCND1</td>
<td align="left">7</td>
<td align="left">United Kingdom</td>
<td char="." align="char">395</td>
</tr>
<tr>
<td align="left">Malignant neoplasm of ovary</td>
<td align="left">0.44</td>
<td align="left">SOD2</td>
<td align="left">7</td>
<td align="left">Canada</td>
<td char="." align="char">304</td>
</tr>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">0.419</td>
<td align="left">BRCA2</td>
<td align="left">6</td>
<td align="left">France</td>
<td char="." align="char">222</td>
</tr>
<tr>
<td align="left">Mammary neoplasms, experimental</td>
<td align="left">0.4</td>
<td align="left">HRAS</td>
<td align="left">6</td>
<td align="left">The Netherlands</td>
<td char="." align="char">218</td>
</tr>
<tr>
<td align="left">Ovarian neoplasms</td>
<td align="left">0.381</td>
<td align="left">STAT3</td>
<td align="left">6</td>
<td align="left">Italy</td>
<td char="." align="char">197</td>
</tr>
<tr>
<td align="left">Neoplasms</td>
<td align="left">0.375</td>
<td align="left">EGFR</td>
<td align="left">6</td>
<td align="left">China</td>
<td char="." align="char">182</td>
</tr>
<tr>
<td align="left">Carcinoma</td>
<td align="left">0.366</td>
<td align="left">ERBB2</td>
<td align="left">6</td>
<td align="left">Spain</td>
<td char="." align="char">143</td>
</tr>
<tr>
<td align="left">Hereditary breast and ovarian cancer syndrome</td>
<td align="left">0.359</td>
<td align="left">ESR1</td>
<td align="left">6</td>
<td align="left">Germany</td>
<td char="." align="char">140</td>
</tr>
<tr>
<td align="left">Prostatic neoplasms</td>
<td align="left">0.318</td>
<td align="left">AKT1</td>
<td align="left">5</td>
<td align="left">Japan</td>
<td char="." align="char">124</td>
</tr>
<tr>
<td align="left" rowspan="10">BRCA2</td>
<td align="left">Fanconi anemia, complementation group D1</td>
<td align="left">0.7</td>
<td align="left">BRCA1</td>
<td align="left">6</td>
<td align="left">United States</td>
<td char="." align="char">885</td>
</tr>
<tr>
<td align="left">Malignant neoplasm breast</td>
<td align="left">0.54</td>
<td align="left">CTNNB1</td>
<td align="left">6</td>
<td align="left">United Kingdom</td>
<td char="." align="char">256</td>
</tr>
<tr>
<td align="left">Ovarian neoplasms</td>
<td align="left">0.464</td>
<td align="left">ERBB2</td>
<td align="left">6</td>
<td align="left">Canada</td>
<td char="." align="char">203</td>
</tr>
<tr>
<td align="left">Prostatic neoplasms</td>
<td align="left">0.409</td>
<td align="left">PTEN</td>
<td align="left">5</td>
<td align="left">Italy</td>
<td char="." align="char">119</td>
</tr>
<tr>
<td align="left">Medulloblastoma</td>
<td align="left">0.401</td>
<td align="left">SOD2</td>
<td align="left">5</td>
<td align="left">The Netherlands</td>
<td char="." align="char">115</td>
</tr>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">0.392</td>
<td align="left">TNF</td>
<td align="left">5</td>
<td align="left">Germany</td>
<td char="." align="char">102</td>
</tr>
<tr>
<td align="left">Hereditary breast and ovarian cancer syndrome</td>
<td align="left">0.334</td>
<td align="left">TNFSF10</td>
<td align="left">5</td>
<td align="left">France</td>
<td char="." align="char">100</td>
</tr>
<tr>
<td align="left">Fanconi anemia</td>
<td align="left">0.326</td>
<td align="left">AKT1</td>
<td align="left">4</td>
<td align="left">Spain</td>
<td char="." align="char">93</td>
</tr>
<tr>
<td align="left">Pancreatic neoplasms</td>
<td align="left">0.309</td>
<td align="left">BRIP1</td>
<td align="left">4</td>
<td align="left">Australia</td>
<td char="." align="char">73</td>
</tr>
<tr>
<td align="left">Wilms tumor</td>
<td align="left">0.3</td>
<td align="left">CDH1</td>
<td align="left">4</td>
<td align="left">Israel</td>
<td char="." align="char">70</td>
</tr>
<tr>
<td align="left" rowspan="10">CDKN2A</td>
<td align="left">Melanoma-pancreatic cancer syndrome</td>
<td align="left">0.6</td>
<td align="left">TP53</td>
<td align="left">15</td>
<td align="left">United States</td>
<td char="." align="char">1809</td>
</tr>
<tr>
<td align="left">Melanoma, cutaneous malignant, susceptibility to, 2</td>
<td align="left">0.6</td>
<td align="left">SOD2</td>
<td align="left">12</td>
<td align="left">China</td>
<td char="." align="char">431</td>
</tr>
<tr>
<td align="left">Lung neoplasms</td>
<td align="left">0.442</td>
<td align="left">KRAS</td>
<td align="left">9</td>
<td align="left">Japan</td>
<td char="." align="char">340</td>
</tr>
<tr>
<td align="left">Stomach neoplasms</td>
<td align="left">0.411</td>
<td align="left">PTGS2</td>
<td align="left">9</td>
<td align="left">United Kingdom</td>
<td char="." align="char">325</td>
</tr>
<tr>
<td align="left">Esophageal neoplasms</td>
<td align="left">0.41</td>
<td align="left">ABCB1</td>
<td align="left">7</td>
<td align="left">Italy</td>
<td char="." align="char">297</td>
</tr>
<tr>
<td align="left">Neoplasms</td>
<td align="left">0.391</td>
<td align="left">CSF3</td>
<td align="left">7</td>
<td align="left">France</td>
<td char="." align="char">224</td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">0.358</td>
<td align="left">EGFR</td>
<td align="left">7</td>
<td align="left">Germany</td>
<td char="." align="char">218</td>
</tr>
<tr>
<td align="left">Glioma</td>
<td align="left">0.341</td>
<td align="left">ESR1</td>
<td align="left">6</td>
<td align="left">South Korea</td>
<td char="." align="char">202</td>
</tr>
<tr>
<td align="left">Precursor cell lymphoblastic leukemia-lymphoma</td>
<td align="left">0.338</td>
<td align="left">MET</td>
<td align="left">6</td>
<td align="left">Canada</td>
<td char="." align="char">186</td>
</tr>
<tr>
<td align="left">Carcinoma, non-small-cell lung</td>
<td align="left">0.332</td>
<td align="left">ERBB2</td>
<td align="left">6</td>
<td align="left">Spain</td>
<td char="." align="char">136</td>
</tr>
<tr>
<td align="left" rowspan="10">ALPPL2</td>
<td align="left">Abortion, spontaneous</td>
<td align="left">0.3</td>
<td align="left">CEACAM1</td>
<td align="left">1</td>
<td align="left">United States</td>
<td char="." align="char">104</td>
</tr>
<tr>
<td align="left">Parkinson disease</td>
<td align="left">0.003</td>
<td align="left">HSD17B1</td>
<td align="left">1</td>
<td align="left">Germany</td>
<td char="." align="char">32</td>
</tr>
<tr>
<td align="left">Stroke</td>
<td align="left">0.003</td>
<td align="left">IFI35</td>
<td align="left">1</td>
<td align="left">United Kingdom</td>
<td char="." align="char">30</td>
</tr>
<tr>
<td align="left">Carcinoma in situ</td>
<td align="left">0.001</td>
<td align="left">IFI44</td>
<td align="left">1</td>
<td align="left">Italy</td>
<td char="." align="char">28</td>
</tr>
<tr>
<td align="left">Seminoma</td>
<td align="left">0.001</td>
<td align="left">IFI6</td>
<td align="left">1</td>
<td align="left">France</td>
<td char="." align="char">18</td>
</tr>
<tr>
<td align="left">Retinal diseases</td>
<td align="left">&lt;0.001</td>
<td align="left">IFNA10</td>
<td align="left">1</td>
<td align="left">Japan</td>
<td char="." align="char">16</td>
</tr>
<tr>
<td align="left">Embryonal neoplasm</td>
<td align="left">&lt;0.001</td>
<td align="left">IGFBP1</td>
<td align="left">1</td>
<td align="left">Canada</td>
<td char="." align="char">13</td>
</tr>
<tr>
<td align="left">Carcinoma, embryonal</td>
<td align="left">&lt;0.001</td>
<td align="left">IGFBP6</td>
<td align="left">1</td>
<td align="left">Greece</td>
<td char="." align="char">11</td>
</tr>
<tr>
<td align="left"></td>
<td align="left"></td>
<td align="left">IL11</td>
<td align="left">1</td>
<td align="left">The Netherlands</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left"></td>
<td align="left"></td>
<td align="left">IL12B</td>
<td align="left">1</td>
<td align="left">China</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left" rowspan="10">CD99</td>
<td align="left">Chondrosarcoma, mesenchymal</td>
<td align="left">0.3</td>
<td align="left">PDGFRA</td>
<td align="left">1</td>
<td align="left">United States</td>
<td char="." align="char">342</td>
</tr>
<tr>
<td align="left">Neuroectodermal tumors, primitive, peripheral</td>
<td align="left">0.012</td>
<td align="left">BCL2</td>
<td align="left">1</td>
<td align="left">Japan</td>
<td char="." align="char">77</td>
</tr>
<tr>
<td align="left">Sarcoma, Ewing</td>
<td align="left">0.01</td>
<td align="left">IL1A</td>
<td align="left">1</td>
<td align="left">Germany</td>
<td char="." align="char">74</td>
</tr>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">0.005</td>
<td align="left">MKI67</td>
<td align="left">1</td>
<td align="left">Italy</td>
<td char="." align="char">63</td>
</tr>
<tr>
<td align="left">Carcinoma</td>
<td align="left">0.005</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">China</td>
<td char="." align="char">56</td>
</tr>
<tr>
<td align="left">Neuroectodermal tumors, primitive</td>
<td align="left">0.004</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">United Kingdom</td>
<td char="." align="char">52</td>
</tr>
<tr>
<td align="left">Osteosarcoma</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Canada</td>
<td char="." align="char">37</td>
</tr>
<tr>
<td align="left">Neoplasms</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">France</td>
<td char="." align="char">26</td>
</tr>
<tr>
<td align="left">Lymphoma</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Australia</td>
<td char="." align="char">24</td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">The Netherlands</td>
<td char="." align="char">22</td>
</tr>
<tr>
<td align="left" rowspan="10">CHI3L1</td>
<td align="left">Schizophrenia</td>
<td align="left">0.319</td>
<td align="left">TNF</td>
<td align="left">3</td>
<td align="left">United States</td>
<td char="." align="char">90</td>
</tr>
<tr>
<td align="left">Glioblastoma</td>
<td align="left">0.311</td>
<td align="left">MET</td>
<td align="left">3</td>
<td align="left">Japan</td>
<td char="." align="char">17</td>
</tr>
<tr>
<td align="left">Glioma</td>
<td align="left">0.31</td>
<td align="left">MGMT</td>
<td align="left">2</td>
<td align="left">United Kingdom</td>
<td char="." align="char">17</td>
</tr>
<tr>
<td align="left">Neoplasm invasiveness</td>
<td align="left">0.303</td>
<td align="left">TGM2</td>
<td align="left">2</td>
<td align="left">Italy</td>
<td char="." align="char">15</td>
</tr>
<tr>
<td align="left">Osteoarthritis</td>
<td align="left">0.301</td>
<td align="left">ACO1</td>
<td align="left">2</td>
<td align="left">France</td>
<td char="." align="char">12</td>
</tr>
<tr>
<td align="left">Asthma-related traits, susceptibility to, 7</td>
<td align="left">0.3</td>
<td align="left">MMP9</td>
<td align="left">2</td>
<td align="left">Denmark</td>
<td char="." align="char">10</td>
</tr>
<tr>
<td align="left">Hypertension</td>
<td align="left">0.103</td>
<td align="left">GDNF</td>
<td align="left">2</td>
<td align="left">Germany</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left">Asthma</td>
<td align="left">0.017</td>
<td align="left">FTL</td>
<td align="left">2</td>
<td align="left">Australia</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left">Arthritis, rheumatoid</td>
<td align="left">0.009</td>
<td align="left">ENO1</td>
<td align="left">2</td>
<td align="left">Finland</td>
<td char="." align="char">5</td>
</tr>
<tr>
<td align="left">Neoplasm malignant</td>
<td align="left">0.005</td>
<td align="left">EGF</td>
<td align="left">2</td>
<td align="left">India</td>
<td char="." align="char">5</td>
</tr>
<tr>
<td align="left" rowspan="10">SOD1</td>
<td align="left">Amyotrophic lateral sclerosis 1</td>
<td align="left">0.66</td>
<td align="left">TNF</td>
<td align="left">19</td>
<td align="left">United States</td>
<td char="." align="char">265</td>
</tr>
<tr>
<td align="left">Amyotrophic lateral sclerosis</td>
<td align="left">0.551</td>
<td align="left">SOD2</td>
<td align="left">17</td>
<td align="left">Italy</td>
<td char="." align="char">42</td>
</tr>
<tr>
<td align="left">Hypertension</td>
<td align="left">0.402</td>
<td align="left">IL6</td>
<td align="left">15</td>
<td align="left">Japan</td>
<td char="." align="char">40</td>
</tr>
<tr>
<td align="left">Deficiency diseases</td>
<td align="left">0.4</td>
<td align="left">PTGS2</td>
<td align="left">14</td>
<td align="left">India</td>
<td char="." align="char">39</td>
</tr>
<tr>
<td align="left">Motor neuron disease</td>
<td align="left">0.341</td>
<td align="left">NOS2</td>
<td align="left">13</td>
<td align="left">China</td>
<td char="." align="char">31</td>
</tr>
<tr>
<td align="left">Down syndrome</td>
<td align="left">0.323</td>
<td align="left">CAT</td>
<td align="left">13</td>
<td align="left">United Kingdom</td>
<td char="." align="char">28</td>
</tr>
<tr>
<td align="left">Atherosclerosis</td>
<td align="left">0.31</td>
<td align="left">AGT</td>
<td align="left">11</td>
<td align="left">Germany</td>
<td char="." align="char">26</td>
</tr>
<tr>
<td align="left">Diabetes mellitus, type 2</td>
<td align="left">0.31</td>
<td align="left">IL1B</td>
<td align="left">10</td>
<td align="left">The Netherlands</td>
<td char="." align="char">25</td>
</tr>
<tr>
<td align="left">Ischemia</td>
<td align="left">0.309</td>
<td align="left">IFNG</td>
<td align="left">10</td>
<td align="left">Turkey</td>
<td char="." align="char">20</td>
</tr>
<tr>
<td align="left">Parkinson disease</td>
<td align="left">0.309</td>
<td align="left">ALB</td>
<td align="left">10</td>
<td align="left">Canada</td>
<td char="." align="char">18</td>
</tr>
<tr>
<td align="left" rowspan="10">AMN</td>
<td align="left">Imerslund-Gräsbeck syndrome</td>
<td align="left">0.601</td>
<td align="left">TNF</td>
<td align="left">3</td>
<td align="left">United States</td>
<td char="." align="char">138</td>
</tr>
<tr>
<td align="left">Acute kidney injury</td>
<td align="left">0.3</td>
<td align="left">KNG1</td>
<td align="left">3</td>
<td align="left">United Kingdom</td>
<td char="." align="char">32</td>
</tr>
<tr>
<td align="left">Neurogenic inflammation</td>
<td align="left">0.3</td>
<td align="left">TAC1</td>
<td align="left">3</td>
<td align="left">Germany</td>
<td char="." align="char">22</td>
</tr>
<tr>
<td align="left">Edema</td>
<td align="left">0.3</td>
<td align="left">IL6</td>
<td align="left">2</td>
<td align="left">Japan</td>
<td char="." align="char">18</td>
</tr>
<tr>
<td align="left">Extravasation of diagnostic and therapeutic materials</td>
<td align="left">0.3</td>
<td align="left">POMC</td>
<td align="left">2</td>
<td align="left">China</td>
<td char="." align="char">16</td>
</tr>
<tr>
<td align="left">Anemia, megaloblastic</td>
<td align="left">0.003</td>
<td align="left">CALCA</td>
<td align="left">2</td>
<td align="left">Canada</td>
<td char="." align="char">12</td>
</tr>
<tr>
<td align="left">Adrenoleukodystrophy</td>
<td align="left">0.003</td>
<td align="left">PTGS2</td>
<td align="left">2</td>
<td align="left">France</td>
<td char="." align="char">10</td>
</tr>
<tr>
<td align="left">Nervous system malformations</td>
<td align="left">0.003</td>
<td align="left">INS</td>
<td align="left">2</td>
<td align="left">Italy</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left">Malabsorption syndromes</td>
<td align="left">0.001</td>
<td align="left">KLK1</td>
<td align="left">1</td>
<td align="left">The Netherlands</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left">Adrenomyeloneuropathy</td>
<td align="left">&lt;0.001</td>
<td align="left">LCN2</td>
<td align="left">1</td>
<td align="left">Australia</td>
<td char="." align="char">7</td>
</tr>
<tr>
<td align="left" rowspan="10">CD40LG</td>
<td align="left">Hyper-IgM immunodeficiency syndrome, type 1</td>
<td align="left">0.629</td>
<td align="left">CCL2</td>
<td align="left">4</td>
<td align="left">United States</td>
<td char="." align="char">66</td>
</tr>
<tr>
<td align="left">Coronary artery disease</td>
<td align="left">0.306</td>
<td align="left">IL1B</td>
<td align="left">3</td>
<td align="left">Germany</td>
<td char="." align="char">15</td>
</tr>
<tr>
<td align="left">Pneumonia</td>
<td align="left">0.3</td>
<td align="left">IL6</td>
<td align="left">3</td>
<td align="left">United Kingdom</td>
<td char="." align="char">12</td>
</tr>
<tr>
<td align="left">Amyotrophic lateral sclerosis</td>
<td align="left">0.3</td>
<td align="left">TNF</td>
<td align="left">3</td>
<td align="left">Japan</td>
<td char="." align="char">11</td>
</tr>
<tr>
<td align="left">Hypersensitivity</td>
<td align="left">0.3</td>
<td align="left">IL8</td>
<td align="left">3</td>
<td align="left">Italy</td>
<td char="." align="char">10</td>
</tr>
<tr>
<td align="left">Necrosis</td>
<td align="left">0.3</td>
<td align="left">IFNG</td>
<td align="left">3</td>
<td align="left">Argentina</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left">Hypertension, pulmonary</td>
<td align="left">0.3</td>
<td align="left">IL5RA</td>
<td align="left">2</td>
<td align="left">China</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left">Diabetes mellitus, type 1</td>
<td align="left">0.101</td>
<td align="left">HMOX1</td>
<td align="left">2</td>
<td align="left">Australia</td>
<td char="." align="char">7</td>
</tr>
<tr>
<td align="left">Enterocolitis, necrotizing</td>
<td align="left">0.1</td>
<td align="left">IL13</td>
<td align="left">2</td>
<td align="left">Denmark</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left">Periodontal diseases</td>
<td align="left">0.1</td>
<td align="left">IL17A</td>
<td align="left">2</td>
<td align="left">The Netherlands</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left" rowspan="10">CD79A</td>
<td align="left">Agammaglobulinemia</td>
<td align="left">0.3</td>
<td align="left">BTK</td>
<td align="left">1</td>
<td align="left">United States</td>
<td char="." align="char">22</td>
</tr>
<tr>
<td align="left">Leukemia, lymphocytic, chronic, B-cell</td>
<td align="left">0.003</td>
<td align="left">CD19</td>
<td align="left">1</td>
<td align="left">France</td>
<td char="." align="char">8</td>
</tr>
<tr>
<td align="left">Lymphoma, non-Hodgkin</td>
<td align="left">0.003</td>
<td align="left">IGLL1</td>
<td align="left">1</td>
<td align="left">China</td>
<td char="." align="char">7</td>
</tr>
<tr>
<td align="left">Lymphoma, B-cell</td>
<td align="left">0.003</td>
<td align="left">LRRC8A</td>
<td align="left">1</td>
<td align="left">Japan</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left">Leukemia, myeloid, acute</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">India</td>
<td char="." align="char">4</td>
</tr>
<tr>
<td align="left">Leukemia</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Spain</td>
<td char="." align="char">4</td>
</tr>
<tr>
<td align="left">Multiple myeloma</td>
<td align="left">0.003</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Sweden</td>
<td char="." align="char">3</td>
</tr>
<tr>
<td align="left">Lymphoma</td>
<td align="left">&lt;0.001</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Belgium</td>
<td char="." align="char">3</td>
</tr>
<tr>
<td align="left">Takayasu arteritis</td>
<td align="left">&lt;0.001</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Finland</td>
<td char="." align="char">3</td>
</tr>
<tr>
<td align="left">Lymphoma, large B-cell, diffuse</td>
<td align="left">&lt;0.001</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">United Kingdom</td>
<td char="." align="char">3</td>
</tr>
<tr>
<td align="left" rowspan="10">PRL</td>
<td align="left">Prolactinoma</td>
<td align="left">0.415</td>
<td align="left">DRD2</td>
<td align="left">9</td>
<td align="left">United States</td>
<td char="." align="char">294</td>
</tr>
<tr>
<td align="left">Hyperprolactinemia</td>
<td align="left">0.412</td>
<td align="left">POMC</td>
<td align="left">8</td>
<td align="left">United Kingdom</td>
<td char="." align="char">59</td>
</tr>
<tr>
<td align="left">Adenoma</td>
<td align="left">0.33</td>
<td align="left">IL6</td>
<td align="left">6</td>
<td align="left">Italy</td>
<td char="." align="char">55</td>
</tr>
<tr>
<td align="left">Lupus erythematosus, systemic</td>
<td align="left">0.325</td>
<td align="left">CYP19A1</td>
<td align="left">6</td>
<td align="left">Canada</td>
<td char="." align="char">40</td>
</tr>
<tr>
<td align="left">Pituitary neoplasms</td>
<td align="left">0.311</td>
<td align="left">TNF</td>
<td align="left">6</td>
<td align="left">France</td>
<td char="." align="char">31</td>
</tr>
<tr>
<td align="left">Autistic disorder</td>
<td align="left">0.304</td>
<td align="left">ESR2</td>
<td align="left">5</td>
<td align="left">Australia</td>
<td char="." align="char">29</td>
</tr>
<tr>
<td align="left">Growth hormone-secreting pituitary adenoma</td>
<td align="left">0.302</td>
<td align="left">AGT</td>
<td align="left">5</td>
<td align="left">Japan</td>
<td char="." align="char">28</td>
</tr>
<tr>
<td align="left">Endometriosis</td>
<td align="left">0.301</td>
<td align="left">CNR1</td>
<td align="left">5</td>
<td align="left">China</td>
<td char="." align="char">23</td>
</tr>
<tr>
<td align="left">Hypopituitarism</td>
<td align="left">0.301</td>
<td align="left">CRH</td>
<td align="left">5</td>
<td align="left">Spain</td>
<td char="." align="char">20</td>
</tr>
<tr>
<td align="left">Amenorrhea</td>
<td align="left">0.301</td>
<td align="left">CYP17A1</td>
<td align="left">5</td>
<td align="left">India</td>
<td char="." align="char">19</td>
</tr>
<tr>
<td align="left" rowspan="10">AFP</td>
<td align="left">Carcinoma, hepatocellular</td>
<td align="left">0.398</td>
<td align="left">MMP9</td>
<td align="left">5</td>
<td align="left">United States</td>
<td char="." align="char">46</td>
</tr>
<tr>
<td align="left">Liver diseases</td>
<td align="left">0.303</td>
<td align="left">HMOX1</td>
<td align="left">4</td>
<td align="left">Japan</td>
<td char="." align="char">12</td>
</tr>
<tr>
<td align="left">Liver cirrhosis, experimental</td>
<td align="left">0.3</td>
<td align="left">ENO1</td>
<td align="left">3</td>
<td align="left">China</td>
<td char="." align="char">11</td>
</tr>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">0.3</td>
<td align="left">MMP2</td>
<td align="left">3</td>
<td align="left">Germany</td>
<td char="." align="char">8</td>
</tr>
<tr>
<td align="left">Mammary neoplasms, experimental</td>
<td align="left">0.3</td>
<td align="left">ESR1</td>
<td align="left">3</td>
<td align="left">Italy</td>
<td char="." align="char">7</td>
</tr>
<tr>
<td align="left">Liver neoplasms</td>
<td align="left">0.019</td>
<td align="left">HRAS</td>
<td align="left">3</td>
<td align="left">France</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left">Recurrent malignant neoplasm</td>
<td align="left">0.015</td>
<td align="left">NOS2</td>
<td align="left">3</td>
<td align="left">Canada</td>
<td char="." align="char">5</td>
</tr>
<tr>
<td align="left">Hepatitis B</td>
<td align="left">0.014</td>
<td align="left">IGF1</td>
<td align="left">3</td>
<td align="left">Ireland</td>
<td char="." align="char">4</td>
</tr>
<tr>
<td align="left">Neoplasm malignant</td>
<td align="left">0.012</td>
<td align="left">PTGS2</td>
<td align="left">3</td>
<td align="left">Turkey</td>
<td char="." align="char">3</td>
</tr>
<tr>
<td align="left">Down syndrome</td>
<td align="left">0.011</td>
<td align="left">TNFSF10</td>
<td align="left">3</td>
<td align="left">Singapore</td>
<td char="." align="char">3</td>
</tr>
<tr>
<td align="left" rowspan="10">POMC</td>
<td align="left">Obesity</td>
<td align="left">0.454</td>
<td align="left">TNF</td>
<td align="left">22</td>
<td align="left">United States</td>
<td char="." align="char">27</td>
</tr>
<tr>
<td align="left">Proopiomelanocortin deficiency</td>
<td align="left">0.4</td>
<td align="left">IL6</td>
<td align="left">17</td>
<td align="left">Italy</td>
<td char="." align="char">10</td>
</tr>
<tr>
<td align="left">Cushing syndrome</td>
<td align="left">0.331</td>
<td align="left">AGT</td>
<td align="left">15</td>
<td align="left">Japan</td>
<td char="." align="char">9</td>
</tr>
<tr>
<td align="left">Pituitary ACTH hypersecretion</td>
<td align="left">0.315</td>
<td align="left">IL1B</td>
<td align="left">15</td>
<td align="left">France</td>
<td char="." align="char">7</td>
</tr>
<tr>
<td align="left">Adrenal cortex diseases</td>
<td align="left">0.309</td>
<td align="left">PTGS2</td>
<td align="left">15</td>
<td align="left">United Kingdom</td>
<td char="." align="char">6</td>
</tr>
<tr>
<td align="left">ACTH syndrome, ectopic</td>
<td align="left">0.306</td>
<td align="left">SOD2</td>
<td align="left">14</td>
<td align="left">Spain</td>
<td char="." align="char">5</td>
</tr>
<tr>
<td align="left">Heart failure</td>
<td align="left">0.304</td>
<td align="left">ALB</td>
<td align="left">12</td>
<td align="left">The Netherlands</td>
<td char="." align="char">5</td>
</tr>
<tr>
<td align="left">Spasms, infantile</td>
<td align="left">0.303</td>
<td align="left">INS</td>
<td align="left">12</td>
<td align="left">Germany</td>
<td char="." align="char">4</td>
</tr>
<tr>
<td align="left">Hypertension</td>
<td align="left">0.303</td>
<td align="left">BDNF</td>
<td align="left">11</td>
<td align="left">Austria</td>
<td char="." align="char">4</td>
</tr>
<tr>
<td align="left">Osteoporosis</td>
<td align="left">0.303</td>
<td align="left">CRH</td>
<td align="left">11</td>
<td align="left">Poland</td>
<td char="." align="char">4</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab10">
<label>Table 10</label>
<caption>
<p>Genes related to diseases, based on gene-year and gene-country analysis</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">ERBB2</th>
<th align="left">PRL</th>
<th align="left">CD79A</th>
<th align="left">CD40LG</th>
<th align="left">AMN</th>
<th align="left">SOD1</th>
<th align="left">CHI3L1</th>
<th align="left">AFP</th>
<th align="left">CD99</th>
<th align="left">CDKN2A</th>
<th align="left">BRCA2</th>
<th align="left">BRCA1</th>
<th align="left">EGF</th>
<th align="left">PGR</th>
<th align="left">ESR1</th>
<th align="left">POMC</th>
<th align="left">ALPPL2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Breast neoplasms</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Adenocarcinoma</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Mammary neoplasms, experimental</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Carcinoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Prostatic neoplasms</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Malignant neoplasm breast</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Glioma</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hypertension</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neoplasms</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Ovarian neoplasms</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neoplasm metastasis</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Down syndrome</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Lymphoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Glioblastoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Carcinoma, hepatocellular</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hereditary breast and ovarian cancer syndrome</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Endometriosis</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Parkinson disease</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neoplasm malignant</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Obesity</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">XX</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Amyotrophic lateral sclerosis</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Coronary artery disease</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Lung neoplasms</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Stomach neoplasms</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Agammaglobulinemia</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Periodontal diseases</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Enterocolitis, necrotizing</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Diabetes mellitus, type 1</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hypertension, pulmonary</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Necrosis</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hypersensitivity</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Imerslund-Grasbeck syndrome</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hyper-IgM immunodeficiency syndrome, type 1</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Edema</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Adrenomyeloneuropathy</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Acute kidney injury</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Malabsorption syndromes</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Nervous system malformations</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Adrenoleukodystrophy</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Leukemia, lymphocytic, chronic, B-cell</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neurogenic inflammation</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Extravasation of diagnostic and therapeutic materials</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Pneumonia</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Anemia, megaloblastic</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Adenoma</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Lymphoma, B-cell</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Heart failure</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">ACTH syndrome, ectopic</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Adrenal cortex diseases</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Pituitary ACTH hypersecretion</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Cushing syndrome</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Proopiomelanocortin deficiency</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hepatitis B</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Recurrent malignant neoplasm</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Liver neoplasms</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Liver cirrhosis, experimental</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Liver diseases</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Lymphoma, non-Hodgkin</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Amenorrhea</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Growth hormone-secreting pituitary adenoma</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Autistic disorder</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Pituitary neoplasms</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Lupus erythematosus, systemic</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hyperprolactinemia</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Prolactinoma</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Lymphoma, large B-cell, diffuse</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Takayasu arteritis</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Multiple myeloma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Leukemia</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Leukemia, myeloid, acute</td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hypopituitarism</td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Ischemia</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Asthma-related traits, susceptibility to, 7</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Atherosclerosis</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Wilms tumor</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Pancreatic neoplasms</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Fanconi anemia</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Medulloblastoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Fanconi anemia, complementation group D1</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Malignant neoplasm of ovary</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Breast-ovarian cancer, familial, susceptibility to, 1</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Stomach ulcer</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Kidney diseases</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Wounds and injuries</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Hypomagnesemia 4, renal</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Recurrence</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Mesothelioma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Mammary neoplasms, animal</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Meningioma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Infertility, male</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Myocardial infarction</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Leiomyoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Migraine disorders</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Alzheimer disease</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Cholangiocarcinoma</td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Melanoma-pancreatic cancer syndrome</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Melanoma, cutaneous malignant, susceptibility to, 2</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Esophageal neoplasms</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Precursor cell lymphoblastic leukemia-lymphoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Motor neuron disease</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Deficiency diseases</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Amyotrophic lateral sclerosis 1</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Arthritis, rheumatoid</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Asthma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Spasms, infantile</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Osteoarthritis</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neoplasm invasiveness</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Schizophrenia</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Osteosarcoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Diabetes mellitus, type 2</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neuroectodermal tumors, primitive</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Neuroectodermal tumors, primitive, peripheral</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Chondrosarcoma, mesenchymal</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Carcinoma, embryonal</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Embryonal neoplasm</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Retinal diseases</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Seminoma</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Carcinoma in situ</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Stroke</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Abortion, spontaneous</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">Carcinoma, non-small-cell lung</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Sarcoma, Ewing</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">Osteoporosis</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">X</td>
<td align="left"></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab11">
<label>Table 11</label>
<caption>
<p>Top 10 genes mentioned by each country</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Country name</th>
<th align="left"># of abstracts</th>
<th align="left">Gene name</th>
<th align="left">Abstracts mentioning gene [% of country total]</th>
<th align="left">Country name</th>
<th align="left"># of abstracts</th>
<th align="left">Gene name</th>
<th align="left">Abstracts mentioning gene [% of country total]</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="10">United States</td>
<td align="left" rowspan="10">33,373</td>
<td align="left">ESR1</td>
<td align="left">5429 [16.27 %]</td>
<td align="left" rowspan="10">Germany</td>
<td align="left" rowspan="10">4148</td>
<td align="left">ERBB2</td>
<td align="left">620 [14.95 %]</td>
</tr>
<tr>
<td align="left">ERBB2</td>
<td align="left">4271 [12.8 %]</td>
<td align="left">ESR1</td>
<td align="left">517 [12.46 %]</td>
</tr>
<tr>
<td align="left">EGF</td>
<td align="left">2199 [6.59 %]</td>
<td align="left">PGR</td>
<td align="left">245 [5.91 %]</td>
</tr>
<tr>
<td align="left">PGR</td>
<td align="left">1887 [5.65 %]</td>
<td align="left">EGF</td>
<td align="left">239 [5.76 %]</td>
</tr>
<tr>
<td align="left">BRCA1</td>
<td align="left">1845 [5.53 %]</td>
<td align="left">CDKN2A</td>
<td align="left">218 [5.26 %]</td>
</tr>
<tr>
<td align="left">CDKN2A</td>
<td align="left">1809 [5.42 %]</td>
<td align="left">SLC20A2</td>
<td align="left">191 [4.6 %]</td>
</tr>
<tr>
<td align="left">SLC20A2</td>
<td align="left">1418 [4.25 %]</td>
<td align="left">BRCA1</td>
<td align="left">140 [3.38 %]</td>
</tr>
<tr>
<td align="left">TKT</td>
<td align="left">1297 [3.89 %]</td>
<td align="left">CYP19A1</td>
<td align="left">120 [2.89 %]</td>
</tr>
<tr>
<td align="left">ACAD9</td>
<td align="left">1143 [3.42 %]</td>
<td align="left">KRT75</td>
<td align="left">120 [2.89 %]</td>
</tr>
<tr>
<td align="left">CYP19A1</td>
<td align="left">1073 [3.22 %]</td>
<td align="left">TKT</td>
<td align="left">116 [2.8 %]</td>
</tr>
<tr>
<td align="left" rowspan="10">United Kingdom</td>
<td align="left" rowspan="10">6041</td>
<td align="left">ESR1</td>
<td align="left">1249 [20.68 %]</td>
<td align="left" rowspan="10">France</td>
<td align="left" rowspan="10">3642</td>
<td align="left">ESR1</td>
<td align="left">569 [15.62 %]</td>
</tr>
<tr>
<td align="left">ERBB2</td>
<td align="left">674 [11.16 %]</td>
<td align="left">ERBB2</td>
<td align="left">486 [13.34 %]</td>
</tr>
<tr>
<td align="left">CYP19A1</td>
<td align="left">425 [7.04 %]</td>
<td align="left">PGR</td>
<td align="left">294 [8.07 %]</td>
</tr>
<tr>
<td align="left">EGF</td>
<td align="left">408 [6.75 %]</td>
<td align="left">CDKN2A</td>
<td align="left">224 [6.15 %]</td>
</tr>
<tr>
<td align="left">BRCA1</td>
<td align="left">395 [6.54 %]</td>
<td align="left">BRCA1</td>
<td align="left">222 [6.1 %]</td>
</tr>
<tr>
<td align="left">CDKN2A</td>
<td align="left">325 [5.38 %]</td>
<td align="left">EGF</td>
<td align="left">173 [4.75 %]</td>
</tr>
<tr>
<td align="left">PGR</td>
<td align="left">311 [5.15 %]</td>
<td align="left">SLC20A2</td>
<td align="left">165 [4.53 %]</td>
</tr>
<tr>
<td align="left">BRCA2</td>
<td align="left">256 [4.24 %]</td>
<td align="left">TKT</td>
<td align="left">131 [3.6 %]</td>
</tr>
<tr>
<td align="left">SLC20A2</td>
<td align="left">227 [3.76 %]</td>
<td align="left">CYP19A1</td>
<td align="left">120 [3.29 %]</td>
</tr>
<tr>
<td align="left">INS</td>
<td align="left">188 [3.11 %]</td>
<td align="left">CTSD</td>
<td align="left">114 [3.13 %]</td>
</tr>
<tr>
<td align="left" rowspan="10">China</td>
<td align="left" rowspan="10">6553</td>
<td align="left">ERBB2</td>
<td align="left">799 [12.19 %]</td>
<td align="left" rowspan="10">Canada</td>
<td align="left" rowspan="10">3573</td>
<td align="left">ESR1</td>
<td align="left">515 [14.41 %]</td>
</tr>
<tr>
<td align="left">ESR1</td>
<td align="left">764 [11.66 %]</td>
<td align="left">ERBB2</td>
<td align="left">433 [12.12 %]</td>
</tr>
<tr>
<td align="left">CDKN2A</td>
<td align="left">431 [6.58 %]</td>
<td align="left">BRCA1</td>
<td align="left">304 [8.51 %]</td>
</tr>
<tr>
<td align="left">PGR</td>
<td align="left">385 [5.88 %]</td>
<td align="left">EGF</td>
<td align="left">205 [5.74 %]</td>
</tr>
<tr>
<td align="left">EGF</td>
<td align="left">378 [5.77 %]</td>
<td align="left">BRCA2</td>
<td align="left">203 [5.68 %]</td>
</tr>
<tr>
<td align="left">ACAD9</td>
<td align="left">336 [5.13 %]</td>
<td align="left">PGR</td>
<td align="left">188 [5.26 %]</td>
</tr>
<tr>
<td align="left">MYLIP</td>
<td align="left">327 [4.99 %]</td>
<td align="left">CDKN2A</td>
<td align="left">186 [5.21 %]</td>
</tr>
<tr>
<td align="left">BCL2</td>
<td align="left">312 [4.76 %]</td>
<td align="left">INS</td>
<td align="left">146 [4.09 %]</td>
</tr>
<tr>
<td align="left">ABCB1</td>
<td align="left">209 [3.19 %]</td>
<td align="left">TKT</td>
<td align="left">137 [3.83 %]</td>
</tr>
<tr>
<td align="left">CASP3</td>
<td align="left">203 [3.1 %]</td>
<td align="left">SLC20A2</td>
<td align="left">136 [3.81 %]</td>
</tr>
<tr>
<td align="left" rowspan="10">Japan</td>
<td align="left" rowspan="10">5299</td>
<td align="left">ESR1</td>
<td align="left">918 [17.32 %]</td>
<td align="left" rowspan="10">The Netherlands</td>
<td align="left" rowspan="10">1844</td>
<td align="left">ESR1</td>
<td align="left">267 [14.48 %]</td>
</tr>
<tr>
<td align="left">ERBB2</td>
<td align="left">806 [15.21 %]</td>
<td align="left">BRCA1</td>
<td align="left">218 [11.82 %]</td>
</tr>
<tr>
<td align="left">PGR</td>
<td align="left">456 [8.61 %]</td>
<td align="left">ERBB2</td>
<td align="left">181 [9.82 %]</td>
</tr>
<tr>
<td align="left">EGF</td>
<td align="left">394 [7.44 %]</td>
<td align="left">BRCA2</td>
<td align="left">115 [6.24 %]</td>
</tr>
<tr>
<td align="left">CDKN2A</td>
<td align="left">340 [6.42 %]</td>
<td align="left">PGR</td>
<td align="left">115 [6.24 %]</td>
</tr>
<tr>
<td align="left">CYP19A1</td>
<td align="left">210 [3.96 %]</td>
<td align="left">EGF</td>
<td align="left">97 [5.26 %]</td>
</tr>
<tr>
<td align="left">SLC20A2</td>
<td align="left">159 [3 %]</td>
<td align="left">CDKN2A</td>
<td align="left">90 [4.88 %]</td>
</tr>
<tr>
<td align="left">CEACAM3</td>
<td align="left">151 [2.85 %]</td>
<td align="left">SLC20A2</td>
<td align="left">82 [4.45 %]</td>
</tr>
<tr>
<td align="left">BCL2L14</td>
<td align="left">129 [2.43 %]</td>
<td align="left">ABCB1</td>
<td align="left">81 [4.39 %]</td>
</tr>
<tr>
<td align="left">ABCB1</td>
<td align="left">129 [2.43 %]</td>
<td align="left">BCL2L14</td>
<td align="left">69 [3.74 %]</td>
</tr>
<tr>
<td align="left" rowspan="10">Italy</td>
<td align="left" rowspan="10">4621</td>
<td align="left">ERBB2</td>
<td align="left">808 [17.49 %]</td>
<td align="left" rowspan="10">Australia</td>
<td align="left" rowspan="10">1715</td>
<td align="left">ESR1</td>
<td align="left">260 [15.16 %]</td>
</tr>
<tr>
<td align="left">ESR1</td>
<td align="left">727 [15.73 %]</td>
<td align="left">ERBB2</td>
<td align="left">166 [9.68 %]</td>
</tr>
<tr>
<td align="left">PGR</td>
<td align="left">404 [8.74 %]</td>
<td align="left">PGR</td>
<td align="left">123 [7.17 %]</td>
</tr>
<tr>
<td align="left">EGF</td>
<td align="left">298 [6.45 %]</td>
<td align="left">BRCA1</td>
<td align="left">120 [7 %]</td>
</tr>
<tr>
<td align="left">CDKN2A</td>
<td align="left">297 [6.43 %]</td>
<td align="left">EGF</td>
<td align="left">94 [5.48 %]</td>
</tr>
<tr>
<td align="left">SLC20A2</td>
<td align="left">238 [5.15 %]</td>
<td align="left">SLC20A2</td>
<td align="left">85 [4.96 %]</td>
</tr>
<tr>
<td align="left">BRCA1</td>
<td align="left">197 [4.26 %]</td>
<td align="left">BRCA2</td>
<td align="left">73 [4.26 %]</td>
</tr>
<tr>
<td align="left">INS</td>
<td align="left">171 [3.7 %]</td>
<td align="left">INS</td>
<td align="left">72 [4.2 %]</td>
</tr>
<tr>
<td align="left">TKT</td>
<td align="left">159 [3.44 %]</td>
<td align="left">ARL11</td>
<td align="left">71 [4.14 %]</td>
</tr>
<tr>
<td align="left">CYP19A1</td>
<td align="left">156 [3.38 %]</td>
<td align="left">CDKN2A</td>
<td align="left">68 [3.97 %]</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
</app>
</app-group>
<fn-group>
<fn id="Fn1">
<label>1</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://www.knime.org">http://www.knime.org</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn2">
<label>2</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://gephi.github.io">http://gephi.github.io</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn3">
<label>3</label>
<p>
<ext-link ext-link-type="uri" xlink:href="https://developers.google.com/maps/">https://developers.google.com/maps/</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn4">
<label>4</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://www.genemania.org">http://www.genemania.org</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn5">
<label>5</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://thebiogrid.org">http://thebiogrid.org</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn6">
<label>6</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://www.disgenet.org/web/DisGeNET/v2.1">http://www.disgenet.org/web/DisGeNET/v2.1</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn7">
<label>7</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://django.nubic.northwestern.edu/fundo">http://django.nubic.northwestern.edu/fundo</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn8">
<label>8</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://www.mathworks.com/matlabcentral/fileexchange/7486-clustering-toolbox">http://www.mathworks.com/matlabcentral/fileexchange/7486-clustering-toolbox</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn9">
<label>9</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://ghr.nlm.nih.gov/gene/CDH1">http://ghr.nlm.nih.gov/gene/CDH1</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
<fn id="Fn10">
<label>10</label>
<p>
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/gene/5602">http://www.ncbi.nlm.nih.gov/gene/5602</ext-link>
(last visited 24 Nov 2014).</p>
</fn>
</fn-group>
<ack>
<title>Authors’ contributions</title>
<p>GJ helped in developing the methodology, in running the tests and in analyzing the results. OA helped in developing the methodology, in crawling the data and in running the tests. AA helped in the design of the study, in drawing the figures and in the analysis of the results. SG participated in integrating the various processes to produce the integrated framework, and in the analysis. TO helped in crawling the data and in developing the methodology. DD helped in the analysis and validation of the results. RA participated in the development of the methodology and in the analysis of the results. GJ, OA, AA and RA drafted the manuscript. All authors read and approved the final manuscript.</p>
<sec id="FPar1">
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1.</label>
<mixed-citation publication-type="other">National Cancer Institute. Defining cancer. 2014.
<ext-link ext-link-type="uri" xlink:href="http://www.cancer.gov/cancertopics/cancerlibrary/what-is-cancer">http://www.cancer.gov/cancertopics/cancerlibrary/what-is-cancer</ext-link>. Accessed 21 Sept 2014.</mixed-citation>
</ref>
<ref id="CR2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van’t Veer</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Bernards</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Enabling personalized cancer medicine through analysis of gene-expression patterns</article-title>
<source>Nature</source>
<year>2008</year>
<volume>452</volume>
<issue>7187</issue>
<fpage>564</fpage>
<lpage>570</lpage>
<pub-id pub-id-type="doi">10.1038/nature06915</pub-id>
<pub-id pub-id-type="pmid">18385730</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mishra</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Verma</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Cancer biomarkers: are we ready for the prime time?</article-title>
<source>Cancers</source>
<year>2010</year>
<volume>2</volume>
<issue>1</issue>
<fpage>190</fpage>
<lpage>208</lpage>
<pub-id pub-id-type="doi">10.3390/cancers2010190</pub-id>
<pub-id pub-id-type="pmid">24281040</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4.</label>
<mixed-citation publication-type="other">Genetics Home Reference. How do genes direct the production of proteins? 2014.
<ext-link ext-link-type="uri" xlink:href="http://ghr.nlm.nih.gov/handbook/howgeneswork/makingprotein">http://ghr.nlm.nih.gov/handbook/howgeneswork/makingprotein</ext-link>. Accessed 22 Nov 2014.</mixed-citation>
</ref>
<ref id="CR5">
<label>5.</label>
<mixed-citation publication-type="other">National Cancer Institute. Genetic testing for hereditary cancer syndromes. 2013.
<ext-link ext-link-type="uri" xlink:href="http://www.cancer.gov/cancertopics/factsheet/Risk/genetic-testing">http://www.cancer.gov/cancertopics/factsheet/Risk/genetic-testing</ext-link>. Accessed 21 Sept 2014.</mixed-citation>
</ref>
<ref id="CR6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Özgür</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Erkan</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Radev</surname>
<given-names>DR</given-names>
</name>
</person-group>
<article-title>Identifying gene-disease associations using centrality on a literature mined gene-interaction network</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<issue>13</issue>
<fpage>i277</fpage>
<lpage>i285</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn182</pub-id>
<pub-id pub-id-type="pmid">18586725</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Navathe</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Civera</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dasigi</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ram</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ciliax</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Dingledine</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms</article-title>
<source>IEEE/ACM Trans Comput Biol Bioinform</source>
<year>2005</year>
<volume>2</volume>
<issue>1</issue>
<fpage>62</fpage>
<lpage>76</lpage>
<pub-id pub-id-type="doi">10.1109/TCBB.2005.14</pub-id>
<pub-id pub-id-type="pmid">17044165</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8.</label>
<mixed-citation publication-type="other">Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. Vol. 2. New York: Springer; 2005.</mixed-citation>
</ref>
<ref id="CR9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Otte</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Rousseau</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Social network analysis: a powerful strategy, also for the information sciences</article-title>
<source>J Inf Sci</source>
<year>2002</year>
<volume>28</volume>
<issue>6</issue>
<fpage>441</fpage>
<lpage>453</lpage>
<pub-id pub-id-type="doi">10.1177/016555150202800601</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akay</surname>
<given-names>MF</given-names>
</name>
</person-group>
<article-title>Support vector machines combined with feature selection for breast cancer diagnosis</article-title>
<source>Expert Syst Appl</source>
<year>2009</year>
<volume>36</volume>
<issue>2</issue>
<fpage>3240</fpage>
<lpage>3247</lpage>
<pub-id pub-id-type="doi">10.1016/j.eswa.2008.01.009</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Faro</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Giordano</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Spampinato</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Combining literature text mining with microarray data: advances for system biology modeling</article-title>
<source>Brief Bioinform</source>
<year>2011</year>
<volume>13</volume>
<issue>1</issue>
<fpage>61</fpage>
<lpage>82</lpage>
<pub-id pub-id-type="doi">10.1093/bib/bbr018</pub-id>
<pub-id pub-id-type="pmid">21677032</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Patumcharoenpol</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Meechai</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vongsangnak</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Biomedical text mining and its applications in cancer research</article-title>
<source>J Biomed Inform</source>
<year>2013</year>
<volume>46</volume>
<issue>2</issue>
<fpage>200</fpage>
<lpage>211</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2012.10.007</pub-id>
<pub-id pub-id-type="pmid">23159498</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rebholz-Schuhmann</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Oellrich</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hoehndorf</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Text-mining solutions for biomedical research: enabling integrative biology</article-title>
<source>Nat Rev Genet</source>
<year>2012</year>
<volume>13</volume>
<issue>12</issue>
<fpage>829</fpage>
<lpage>839</lpage>
<pub-id pub-id-type="doi">10.1038/nrg3337</pub-id>
<pub-id pub-id-type="pmid">23150036</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Saric</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Literature mining for the biologist: from information retrieval to biological discovery</article-title>
<source>Nat Rev Genet</source>
<year>2006</year>
<volume>7</volume>
<issue>2</issue>
<fpage>119</fpage>
<lpage>129</lpage>
<pub-id pub-id-type="doi">10.1038/nrg1768</pub-id>
<pub-id pub-id-type="pmid">16418747</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15.</label>
<mixed-citation publication-type="other">Groza T, Oellrich A, Collier N. Using silver and semi-gold standard corpora to compare open named entity recognisers. In: 2013 IEEE international conference on bioinformatics and biomedicine, IEEE BIBM 2013. 2013. p. 481–485.</mixed-citation>
</ref>
<ref id="CR16">
<label>16.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nunes</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Campos</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Matos</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>BeCAS: biomedical concept recognition services and visualization</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<issue>15</issue>
<fpage>1915</fpage>
<lpage>1916</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btt317</pub-id>
<pub-id pub-id-type="pmid">23736528</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17.</label>
<mixed-citation publication-type="other">Faro A, Giordano D, Spampinato C. Discovery and assessment of gene-disease associations by integrated analysis of scientific literature and microarray data. In: 2010 10th IEEE international conference on information technology and applications in biomedicine (ITAB). 2010. p. 1–5.</mixed-citation>
</ref>
<ref id="CR18">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stears</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Martinsky</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Schena</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Trends in microarray analysis</article-title>
<source>Nat Med</source>
<year>2003</year>
<volume>9</volume>
<issue>1</issue>
<fpage>140</fpage>
<lpage>145</lpage>
<pub-id pub-id-type="doi">10.1038/nm0103-140</pub-id>
<pub-id pub-id-type="pmid">12514728</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Warde-Farley</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Donaldson</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Comes</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Zuberi</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Badrawi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Chao</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Franz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grouios</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kazi</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Lopes</surname>
<given-names>CT</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function</article-title>
<source>Nucleic Acids Res</source>
<year>2010</year>
<volume>38</volume>
<issue>Suppl 2</issue>
<fpage>W214</fpage>
<lpage>W220</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkq537</pub-id>
<pub-id pub-id-type="pmid">20576703</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bauer-Mehren</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bundschus</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rautschka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mayer</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Sanz</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Furlong</surname>
<given-names>LI</given-names>
</name>
</person-group>
<article-title>Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases</article-title>
<source>PLoS ONE</source>
<year>2011</year>
<volume>6</volume>
<issue>6</issue>
<fpage>e20284</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0020284</pub-id>
<pub-id pub-id-type="pmid">21695124</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bauer-Mehren</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rautschka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sanz</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Furlong</surname>
<given-names>LI</given-names>
</name>
</person-group>
<article-title>DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene–disease networks</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>22</issue>
<fpage>2924</fpage>
<lpage>2926</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq538</pub-id>
<pub-id pub-id-type="pmid">20861032</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Osborne</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Flatow</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Holko</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Kibbe</surname>
<given-names>WA</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Danila</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>RL</given-names>
</name>
</person-group>
<article-title>Annotating the human genome with disease ontology</article-title>
<source>BMC Genomics</source>
<year>2009</year>
<volume>10</volume>
<issue>Suppl 1</issue>
<fpage>S6</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-10-S1-S6</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<label>23.</label>
<mixed-citation publication-type="other">Spampinato C, Giordano D, Kavasidis I, Milardo S. Biowizard: discovering and validating associations between biological entities by integrated analysis of scientific literature and experimental data. In: 2012 25th international symposium on computer-based medical systems (CBMS). 2012. p. 1–6.</mixed-citation>
</ref>
<ref id="CR24">
<label>24.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spampinato</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kavasidis</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Aldinucci</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Pino</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Giordano</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Faro</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Discovering biological knowledge by integrating high-throughput data and scientific literature on the cloud</article-title>
<source>Concurr Comput Pract Exp</source>
<year>2013</year>
<volume>26</volume>
<issue>10</issue>
<fpage>1771</fpage>
<lpage>1786</lpage>
<pub-id pub-id-type="doi">10.1002/cpe.3130</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Özgür</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Xiang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Radev</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Literature-based discovery of IFN-γ and vaccine-mediated gene interaction networks</article-title>
<source>J Biomed Biotechnol</source>
<year>2010</year>
<volume>2010</volume>
<fpage>426479</fpage>
<pub-id pub-id-type="doi">10.1155/2010/426479</pub-id>
<pub-id pub-id-type="pmid">20625487</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<collab>UniProt Consortium</collab>
</person-group>
<article-title>Activities at the universal protein resource (UniProt)</article-title>
<source>Nucleic Acids Res</source>
<year>2014</year>
<volume>42</volume>
<issue>D1</issue>
<fpage>D191</fpage>
<lpage>D198</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkt1140</pub-id>
<pub-id pub-id-type="pmid">24253303</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hudis</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>Trastuzumab – mechanism of action and use in clinical practice</article-title>
<source>New Engl J Med</source>
<year>2007</year>
<volume>357</volume>
<issue>1</issue>
<fpage>39</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="doi">10.1056/NEJMra043186</pub-id>
<pub-id pub-id-type="pmid">17611206</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vogel</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Cobleigh</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Tripathy</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Gutheil</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>LN</given-names>
</name>
<name>
<surname>Fehrenbacher</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Slamon</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Novotny</surname>
<given-names>WF</given-names>
</name>
<name>
<surname>Burchmore</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Efficacy and safety of trastuzumab as a single agent in first-line treatment of HER2-overexpressing metastatic breast cancer</article-title>
<source>J Clin Oncol</source>
<year>2002</year>
<volume>20</volume>
<issue>3</issue>
<fpage>719</fpage>
<lpage>726</lpage>
<pub-id pub-id-type="doi">10.1200/JCO.20.3.719</pub-id>
<pub-id pub-id-type="pmid">11821453</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lumachi</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Brunello</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Maruzzo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Basso</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Basso</surname>
<given-names>SMM</given-names>
</name>
</person-group>
<article-title>Treatment of estrogen receptor-positive breast cancer</article-title>
<source>Curr Med Chem</source>
<year>2013</year>
<volume>20</volume>
<issue>5</issue>
<fpage>596</fpage>
<lpage>604</lpage>
<pub-id pub-id-type="doi">10.2174/092986713804999303</pub-id>
<pub-id pub-id-type="pmid">23278394</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30.</label>
<mixed-citation publication-type="other">American Cancer Society. Hormone therapy for breast cancer. 2014.
<ext-link ext-link-type="uri" xlink:href="http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-treating-hormone-therapy">http://www.cancer.org/cancer/breastcancer/detailedguide/breast-cancer-treating-hormone-therapy</ext-link>. Accessed 21 Dec 2014.</mixed-citation>
</ref>
<ref id="CR31">
<label>31.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frasor</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>EC</given-names>
</name>
<name>
<surname>Komm</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>CY</given-names>
</name>
<name>
<surname>Vega</surname>
<given-names>VB</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>LD</given-names>
</name>
<name>
<surname>Smeds</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bergh</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Katzenellenbogen</surname>
<given-names>BS</given-names>
</name>
</person-group>
<article-title>Gene expression preferentially regulated by tamoxifen in breast cancer cells and correlations with clinical outcome</article-title>
<source>Cancer Res</source>
<year>2006</year>
<volume>66</volume>
<issue>14</issue>
<fpage>7334</fpage>
<lpage>7340</lpage>
<pub-id pub-id-type="doi">10.1158/0008-5472.CAN-05-4269</pub-id>
<pub-id pub-id-type="pmid">16849584</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<label>32.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Olsson</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Korhonen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Mercer</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Lindholm</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Mir is a novel ERM-like protein that interacts with myosin regulatory light chain and inhibits neurite outgrowth</article-title>
<source>J Biol Chem</source>
<year>1999</year>
<volume>274</volume>
<issue>51</issue>
<fpage>36288</fpage>
<lpage>36292</lpage>
<pub-id pub-id-type="doi">10.1074/jbc.274.51.36288</pub-id>
<pub-id pub-id-type="pmid">10593918</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Prudnikova</surname>
<given-names>TY</given-names>
</name>
<name>
<surname>Mostovich</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Domanitskaya</surname>
<given-names>NV</given-names>
</name>
<name>
<surname>Pavlova</surname>
<given-names>TV</given-names>
</name>
<name>
<surname>Kashuba</surname>
<given-names>VI</given-names>
</name>
<name>
<surname>Zabarovsky</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Grigorieva</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Antiproliferative effect of D-glucuronyl C5-epimerase in human breast cancer cells</article-title>
<source>Cancer Cell Int</source>
<year>2010</year>
<volume>10</volume>
<fpage>27</fpage>
<pub-id pub-id-type="doi">10.1186/1475-2867-10-27</pub-id>
<pub-id pub-id-type="pmid">20723247</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<label>34.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magnon</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Xue</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Gerber</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Freedland</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Frenette</surname>
<given-names>PS</given-names>
</name>
</person-group>
<article-title>Autonomic nerve development contributes to prostate cancer progression</article-title>
<source>Science</source>
<year>2013</year>
<volume>341</volume>
<issue>6142</issue>
<fpage>1236361</fpage>
<pub-id pub-id-type="doi">10.1126/science.1236361</pub-id>
<pub-id pub-id-type="pmid">23846904</pub-id>
</element-citation>
</ref>
<ref id="CR35">
<label>35.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>MS</given-names>
</name>
</person-group>
<article-title>A cluster validity index for fuzzy clustering</article-title>
<source>Pattern Recognit Lett</source>
<year>2005</year>
<volume>26</volume>
<issue>9</issue>
<fpage>1275</fpage>
<lpage>1291</lpage>
<pub-id pub-id-type="doi">10.1016/j.patrec.2004.11.022</pub-id>
</element-citation>
</ref>
<ref id="CR36">
<label>36.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goh</surname>
<given-names>KI</given-names>
</name>
<name>
<surname>Cusick</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Valle</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Childs</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Vidal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Barabási</surname>
<given-names>AL</given-names>
</name>
</person-group>
<article-title>The human disease network</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2007</year>
<volume>104</volume>
<issue>21</issue>
<fpage>8685</fpage>
<lpage>8690</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0701361104</pub-id>
<pub-id pub-id-type="pmid">17502601</pub-id>
</element-citation>
</ref>
<ref id="CR37">
<label>37.</label>
<mixed-citation publication-type="other">Castro MA, de Santiago I, Campbell TM, Vaughn C, Hickey TE, Ross E, Tilley WD, Markowetz F, Ponder BA, Meyer KB. Regulators of genetic risk of breast cancer identified by integrative network analysis. Nat Genet. 2015.</mixed-citation>
</ref>
<ref id="CR38">
<label>38.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dai</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>YC</given-names>
</name>
<name>
<surname>Tsai</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>WL</given-names>
</name>
</person-group>
<article-title>New challenges for biological text-mining in the next decade</article-title>
<source>J Comput Sci Technol</source>
<year>2010</year>
<volume>25</volume>
<issue>1</issue>
<fpage>169</fpage>
<lpage>179</lpage>
<pub-id pub-id-type="doi">10.1007/s11390-010-9313-5</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Canada/explor/ParkinsonCanadaV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000243 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000243 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Wicri/Canada
   |area=    ParkinsonCanadaV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4845430
   |texte=   Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:27112211" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a ParkinsonCanadaV1 

Wicri

This area was generated with Dilib version V0.6.29.
Data generation: Thu May 4 22:20:19 2017. Site generation: Fri Dec 23 23:17:26 2022