Exploration server for computer science research in Lorraine


Internal identifier: 000049 (Pmc/Corpus); previous: 0000489; next: 0000500

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Gene–disease relationship discovery based on model-driven data integration and database view definition</title>
<author>
<name sortKey="Yilmaz, S" sort="Yilmaz, S" uniqKey="Yilmaz S" first="S." last="Yilmaz">S. Yilmaz</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Laboratory for Human Genetics, Nancy Medical Faculty, rue du Morvan, 54500 Vandoeuvre-les-Nancy cedex</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jonveaux, P" sort="Jonveaux, P" uniqKey="Jonveaux P" first="P." last="Jonveaux">P. Jonveaux</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Laboratory for Human Genetics, Nancy Medical Faculty, rue du Morvan, 54500 Vandoeuvre-les-Nancy cedex</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bicep, C" sort="Bicep, C" uniqKey="Bicep C" first="C." last="Bicep">C. Bicep</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pierron, L" sort="Pierron, L" uniqKey="Pierron L" first="L." last="Pierron">L. Pierron</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Smail Tabbone, M" sort="Smail Tabbone, M" uniqKey="Smail Tabbone M" first="M." last="Smaïl-Tabbone">M. Smaïl-Tabbone</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Devignes, M D" sort="Devignes, M D" uniqKey="Devignes M" first="M. D." last="Devignes">M. D. Devignes</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19042916</idno>
<idno type="pmc">2639000</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2639000</idno>
<idno type="RBID">PMC:2639000</idno>
<idno type="doi">10.1093/bioinformatics/btn612</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000049</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000049</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Gene–disease relationship discovery based on model-driven data integration and database view definition</title>
<author>
<name sortKey="Yilmaz, S" sort="Yilmaz, S" uniqKey="Yilmaz S" first="S." last="Yilmaz">S. Yilmaz</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Laboratory for Human Genetics, Nancy Medical Faculty, rue du Morvan, 54500 Vandoeuvre-les-Nancy cedex</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jonveaux, P" sort="Jonveaux, P" uniqKey="Jonveaux P" first="P." last="Jonveaux">P. Jonveaux</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Laboratory for Human Genetics, Nancy Medical Faculty, rue du Morvan, 54500 Vandoeuvre-les-Nancy cedex</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bicep, C" sort="Bicep, C" uniqKey="Bicep C" first="C." last="Bicep">C. Bicep</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pierron, L" sort="Pierron, L" uniqKey="Pierron L" first="L." last="Pierron">L. Pierron</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Smail Tabbone, M" sort="Smail Tabbone, M" uniqKey="Smail Tabbone M" first="M." last="Smaïl-Tabbone">M. Smaïl-Tabbone</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Devignes, M D" sort="Devignes, M D" uniqKey="Devignes M" first="M. D." last="Devignes">M. D. Devignes</name>
<affiliation>
<nlm:aff id="AFF1">LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1460-2059</idno>
<imprint>
<date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>
<bold>Motivation:</bold>
Computational methods are widely used to discover gene–disease relationships hidden in vast masses of available genomic and post-genomic data. In most current methods, a similarity measure is calculated between gene annotations and known disease genes or disease descriptions. However, more explicit gene–disease relationships are required for better insights into the molecular bases of diseases, especially for complex multi-gene diseases.</p>
<p>
<bold>Results:</bold>
Explicit relationships between genes and diseases are formulated as candidate gene definitions that may include intermediary genes, e.g. orthologous or interacting genes. These definitions guide data modelling in our database approach for gene–disease relationship discovery and are expressed as views which ultimately lead to the retrieval of documented sets of candidate genes. A system called ACGR (Approach for Candidate Gene Retrieval) has been implemented and tested with three case studies including a rare orphan gene disease.</p>
<p>
<bold>Availability:</bold>
The ACGR sources are freely available at
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.loria.fr/projects/acgr/acgr-software/">http://bioinfo.loria.fr/projects/acgr/acgr-software/</ext-link>
. See especially the file ‘disease_description’ and the folders ‘Xcollect_scenarios’ and ‘ACGR_views’.</p>
<p>
<bold>Contact:</bold>
<email>devignes@loria.fr</email>
</p>
<p>
<bold>Supplementary information:</bold>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btn612/DC1">Supplementary data</ext-link>
are available at
<italic>Bioinformatics</italic>
online.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="EN">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-id journal-id-type="hwp">bioinfo</journal-id>
<journal-title>Bioinformatics</journal-title>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1460-2059</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19042916</article-id>
<article-id pub-id-type="pmc">2639000</article-id>
<article-id pub-id-type="doi">10.1093/bioinformatics/btn612</article-id>
<article-id pub-id-type="publisher-id">btn612</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Papers</subject>
<subj-group>
<subject>Genetics and Population Analysis</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Gene–disease relationship discovery based on model-driven data integration and database view definition</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Yilmaz</surname>
<given-names>S.</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jonveaux</surname>
<given-names>P.</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bicep</surname>
<given-names>C.</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pierron</surname>
<given-names>L.</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Smaïl-Tabbone</surname>
<given-names>M.</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Devignes</surname>
<given-names>M.D.</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="COR1">*</xref>
</contrib>
</contrib-group>
<aff id="AFF1">
<sup>1</sup>
Laboratory for Human Genetics, Nancy Medical Faculty, rue du Morvan, 54500 Vandoeuvre-les-Nancy cedex and
<sup>2</sup>
LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France</aff>
<author-notes>
<fn>
<p>Associate Editor: Alex Bateman</p>
</fn>
<corresp id="COR1">*To whom correspondence should be addressed.</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>15</day>
<month>1</month>
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>18</day>
<month>11</month>
<year>2008</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>18</day>
<month>11</month>
<year>2008</year>
</pub-date>
<volume>25</volume>
<issue>2</issue>
<fpage>230</fpage>
<lpage>236</lpage>
<history>
<date date-type="received">
<day>1</day>
<month>7</month>
<year>2008</year>
</date>
<date date-type="rev-recd">
<day>20</day>
<month>11</month>
<year>2008</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>11</month>
<year>2008</year>
</date>
</history>
<permissions>
<copyright-statement>© 2008 The Author(s)</copyright-statement>
<copyright-year>2008</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/">
<p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/">http://creativecommons.org/licenses/by-nc/2.0/uk/</ext-link>
) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</license>
</permissions>
<abstract>
<p>
<bold>Motivation:</bold>
Computational methods are widely used to discover gene–disease relationships hidden in vast masses of available genomic and post-genomic data. In most current methods, a similarity measure is calculated between gene annotations and known disease genes or disease descriptions. However, more explicit gene–disease relationships are required for better insights into the molecular bases of diseases, especially for complex multi-gene diseases.</p>
<p>
<bold>Results:</bold>
Explicit relationships between genes and diseases are formulated as candidate gene definitions that may include intermediary genes, e.g. orthologous or interacting genes. These definitions guide data modelling in our database approach for gene–disease relationship discovery and are expressed as views which ultimately lead to the retrieval of documented sets of candidate genes. A system called ACGR (Approach for Candidate Gene Retrieval) has been implemented and tested with three case studies including a rare orphan gene disease.</p>
<p>
<bold>Availability:</bold>
The ACGR sources are freely available at
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.loria.fr/projects/acgr/acgr-software/">http://bioinfo.loria.fr/projects/acgr/acgr-software/</ext-link>
. See especially the file ‘disease_description’ and the folders ‘Xcollect_scenarios’ and ‘ACGR_views’.</p>
<p>
<bold>Contact:</bold>
<email>devignes@loria.fr</email>
</p>
<p>
<bold>Supplementary information:</bold>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btn612/DC1">Supplementary data</ext-link>
are available at
<italic>Bioinformatics</italic>
online.</p>
</abstract>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="SEC1">
<title>1 INTRODUCTION</title>
<p>Understanding the molecular basis of a disease ultimately means correlating disease symptoms with altered gene function(s), thus highlighting gene–disease relationships. Identifying the genes responsible for human diseases is a first step towards this goal. More than 6100 disease phenotypes are described in the OMIM (Online Mendelian Inheritance in Man) database (DB). Among these phenotypes, more than 2400 have at least one known molecular basis (entries prefixed with #). Thus, about 3700 disease phenotypes described in the OMIM DB are not yet associated with any responsible gene. These disease phenotypes are particularly challenging since they include rare syndromes for which limited experimental data are available and complex multi-genic disorders involving various causative and susceptibility genes (Botstein and Risch,
<xref ref-type="bibr" rid="B6">2003</xref>
).</p>
<p>Integrative genomics approaches are becoming indispensable tools for discovering new gene–disease relationships. These approaches rely on efficient exploitation of functional genomics data sources (Giallourakis et al.,
<xref ref-type="bibr" rid="B12">2005</xref>
) and take advantage of numerous computer-based systems that have been developed in the last 5 years. These systems can be classified into three main groups. First, generalist systems predict disease genes based on their properties or interactions (Adie et al.,
<xref ref-type="bibr" rid="B1">2005</xref>
; Calvo et al.,
<xref ref-type="bibr" rid="B7">2007</xref>
; Lopez-Bigas and Ouzounis,
<xref ref-type="bibr" rid="B17">2004</xref>
; Lopez-Bigas et al.,
<xref ref-type="bibr" rid="B18">2006</xref>
; Oti et al.,
<xref ref-type="bibr" rid="B24">2006</xref>
; Tu et al.,
<xref ref-type="bibr" rid="B32">2006</xref>
; Xu and Li,
<xref ref-type="bibr" rid="B40">2006</xref>
). Consistent features are thus detected among approximately 1600 disease genes listed in the OMIM morbid map and used for these studies. Indeed, disease genes tend to be longer, are composed of more exons, show a higher degree of interspecies conservation, and are involved in more interactions than other genes. However, these approaches are unable to establish the correspondence between a given disease and a set of genes.</p>
<p>The second group of systems applies strategies relying on the hypothesis that similar diseases are most likely caused by similar genes. These strategies are often called prioritization methods since they aim to rank a given list of genes with respect to their probability of causing a disease (Adie et al.,
<xref ref-type="bibr" rid="B2">2006</xref>
; Aerts et al.,
<xref ref-type="bibr" rid="B3">2006</xref>
; Freudenberg and Propping,
<xref ref-type="bibr" rid="B10">2002</xref>
; George et al.,
<xref ref-type="bibr" rid="B11">2006</xref>
; Perez-Iratxeta et al.,
<xref ref-type="bibr" rid="B25">2002</xref>
,
<xref ref-type="bibr" rid="B26">2005</xref>
; Rossi et al.,
<xref ref-type="bibr" rid="B27">2006</xref>
; Turner et al.,
<xref ref-type="bibr" rid="B33">2003</xref>
). Additionally, alternative strategies based on the same similarity hypothesis aim to characterize user-defined groups of genes (Barillot et al.,
<xref ref-type="bibr" rid="B5">2004</xref>
; Chiang et al.,
<xref ref-type="bibr" rid="B8">2006</xref>
; Masseroli et al.,
<xref ref-type="bibr" rid="B21">2004</xref>
,
<xref ref-type="bibr" rid="B22">2005</xref>
; Sun et al.,
<xref ref-type="bibr" rid="B29">2006</xref>
). In order to find additional responsible genes, prioritization methods are often applied to a single disease whose associated chromosomal loci are known. A pool of statistical methods is then used to compute similarity measures dealing with various gene features. Such gene features are particularly well covered in the Endeavour system (Aerts et al.,
<xref ref-type="bibr" rid="B3">2006</xref>
), e.g. sequence similarity, domain composition, tissue expression, Gene Ontology (GO) annotation, interspecies conservation, protein–protein interactions, involved pathways and
<italic>cis</italic>
-regulatory elements. However, this type of prioritization strategy requires at least one well-known gene to be used as a reference candidate gene.</p>
<p>Finally, a third group of methods gathers integrated systems that help users to formulate complex multi-criteria queries to retrieve appropriate collections of relevant genes. For instance, the GeneSeeker system (van Driel et al.,
<xref ref-type="bibr" rid="B34">2005</xref>
) and the GeneSorter functionality proposed by UCSC Genome Browser (Kent et al.,
<xref ref-type="bibr" rid="B14">2005</xref>
) allow experts to test various hypotheses on criteria that can link genes to diseases. An example is found in Tiffin et al. (
<xref ref-type="bibr" rid="B31">2005</xref>
), who developed a strategy to identify genes expressed in tissue affected by a disease. Hence, candidate genes are selected if their corresponding annotations with respect to a controlled vocabulary (i.e. eVOC, which is used in Ensembl EST annotation) match the disease annotation. Relevant eVOC annotations for the studied diseases were derived from PubMed abstracts using text-mining techniques.</p>
<p>The Approach for Candidate Gene Retrieval (ACGR) presented in this article is inspired by this last group of methods. We propose four steps to guide the discovery of gene–disease relationships. First, several precise definitions of candidate genes are formulated. Next, these definitions are used to design a relational data model; a dedicated DB is then populated with relevant data extracted from various internet resources. Finally, to retrieve sets of candidate genes, DB views that express candidate gene definitions are created. Available experimental data can be included in the disease gene definitions and thus exploited together with public annotation data. The approach presented here is tested with three case studies, including a rare orphan gene syndrome.</p>
</sec>
<sec sec-type="methods" id="SEC2">
<title>2 SYSTEMS AND METHODS</title>
<sec id="SEC2.1">
<title>2.1 Explicit gene–disease relationships</title>
<p>The definition of a candidate gene provided by the Webster Medical Dictionary is ‘any gene thought likely to cause a disease’. This definition implies that a candidate gene is a gene which is somehow related to a disease. However, specific gene–disease relationships that exist between candidate genes and studied diseases can be articulated in more useful ways by considering information that is available in various public DBs as well as wet-lab datasets.</p>
<p>The most obvious relationship between candidate genes and disease, hereafter called ‘is_co-localized_with’ (denoted by
<sc>l</sc>
, expresses the inferred relationship between the localization of a candidate gene and a chromosomal region linked to a given disease. The principle embodied in this statement has guided positional cloning for a long time. The precision of disease localization on chromosomes is highly variable depending on available data. Thanks to recent techniques such as array-CGH (Shaw-Smith et al.,
<xref ref-type="bibr" rid="B28">2004</xref>
; Vermeesch et al.,
<xref ref-type="bibr" rid="B35">2007</xref>
; Vissers et al.,
<xref ref-type="bibr" rid="B37">2005</xref>
), available localization data can be refined using experimental data.</p>
<p>Another direct relationship is tissue or developmental co-expression of both genes and disease features. This relationship has been used in various prioritization methods (Tiffin et al.,
<xref ref-type="bibr" rid="B31">2005</xref>
). A variant of this relationship called ‘is_dysregulated_in’ (denoted by
<sc>d</sc>
) considers the dysregulation (over-expression or repression) of candidate genes in transcriptomic studies involving patient samples.</p>
<p>Functional annotation of genes is improving in most available DBs and can be connected to disease descriptions. Hence a relationship called ‘has_similar_functional_annotation_with’ (denoted by
<sc>f</sc>
) is defined on the basis of a similarity measure between functional annotations of a gene and a disease.</p>
<p>One key aspect of our approach is that the relationship between a candidate gene and a disease may also involve an intermediate gene which satisfies some relationship with the disease. Here, we explore two types of intermediate genes, namely orthologous and interacting genes. It is noteworthy that the co-localization relationship
<sc>l</sc>
only applies to the candidate gene itself, whereas both dysregulation
<sc>d</sc>
and functional similarity
<sc>f</sc>
relationships apply to intermediate genes as well. Complex definitions are then constructed in the form: ‘a candidate gene is a gene that is co-localized with the disease and is orthologous to a gene that has similar functional annotation with the disease’ and ‘a candidate gene is a gene that is co-localized with the disease and that interacts with a gene that is dysregulated in patients affected by the disease’. The former definition assumes the existence of two relationships, namely
<sc>l</sc>
and
<sc>f</sc>
, which connect the disease with the candidate gene and with one of its orthologs in a model organism, respectively. The latter definition assumes the existence of two relationships, namely
<sc>l</sc>
and
<sc>d</sc>
, which connect the disease with the candidate gene and with one of its interaction partners, respectively. Further complex definitions can be formulated similarly, such as ‘a candidate gene is a gene that is co-localized with the disease and that interacts with a gene which is in turn orthologous to a gene having similar functional annotation with the disease’. Retrieving sets of candidate genes which match such complex definitions from masses of biological data is the challenge taken up by the ACGR approach described in this article.</p>
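<p>As an illustration only, the complex definitions above can be read as intersections of gene sets. The Python sketch below uses invented gene identifiers, and the names `colocalized`, `dysregulated`, `similar`, `orthologs`, `interactions` and `via` are hypothetical; it is not the ACGR implementation, which expresses these definitions as DB views.</p>

```python
# Toy illustration of composing candidate gene definitions.
# All gene identifiers and data below are invented for the example.

# Genes satisfying each direct gene-disease relationship.
colocalized = {"hG1", "hG2", "hG3"}   # relationship L: human genes in the disease locus
dysregulated = {"hG7"}                # relationship D: dysregulated in patient samples
similar = {"mG1", "fG9"}              # relationship F: here, model-organism genes

# Intermediate-gene relations, stored as (gene, partner) pairs.
orthologs = {("hG1", "mG1"), ("hG4", "fG9")}
interactions = {("hG2", "hG7"), ("hG3", "hG8")}


def via(relation, targets):
    """Return the genes linked by `relation` to at least one gene in `targets`."""
    return {gene for gene, partner in relation if partner in targets}


# 'co-localized with the disease and orthologous to a gene
#  that has similar functional annotation with the disease'
l_and_ortholog_f = colocalized.intersection(via(orthologs, similar))

# 'co-localized with the disease and interacting with a gene
#  that is dysregulated in patients affected by the disease'
l_and_interactor_d = colocalized.intersection(via(interactions, dysregulated))

print(sorted(l_and_ortholog_f))    # prints ['hG1']
print(sorted(l_and_interactor_d))  # prints ['hG2']
```

<p>Longer chains, such as an interactor that is in turn orthologous to a functionally similar gene, would follow by nesting calls, e.g. `via(interactions, via(orthologs, similar))`.</p>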
</sec>
<sec id="SEC2.2">
<title>2.2 Relevance of functional gene–disease relationships</title>
<p>In order to assess the relevance of discovered gene–disease relationships, we introduce a measure quantifying the functional similarity relationship
<sc>f</sc>
between a gene and a disease. However, to date, no common vocabulary is available to describe functional features of both diseases and genes, hence impeding any straightforward comparison of disease and gene functional annotations. Current prioritization methods quantify the functional similarity between test genes and training genes based on their GO annotations (Khatri and Draghici,
<xref ref-type="bibr" rid="B15">2005</xref>
). Ideally, the functional features of the disease should be described with the GO vocabulary so that the similarity between gene and disease can be obtained by calculating the similarity between their GO annotations. In practice, such disease annotation is performed by an expert on the disease.</p>
<p>This procedure for assessing the relevance of gene–disease relationships presents three main advantages. First, an initial set of training genes is no longer required. Second, available knowledge about the disease is included in the disease description. Finally, the rich GO annotations that are available for genes from model organisms will be propagated to human genes thanks to candidate gene definitions involving intermediate orthologous genes.</p>
</sec>
<sec id="SEC2.3">
<title>2.3 Overall presentation of the ACGR approach</title>
<p>The following five steps conceptually describe the proposed
<italic>in silico</italic>
methodology for candidate gene retrieval. (i) Our system takes as input a functional description of a disease, established by an expert using the GO vocabulary (see
<xref ref-type="sec" rid="SEC3.2">Section 3.2</xref>
), as well as available experimental datasets. The system then collects data from various public DBs. (ii) It first retrieves genes sharing GO annotations with the input disease from either human or model organisms. (iii) Next, relevant annotations of these genes are added, including cytogenetic localization, functional annotation, interacting genes and human orthologs of genes from model organisms. (iv) All retrieved genes are then assigned similarity values that are calculated on the basis of their annotation similarity with the input disease. (v) Finally, sets of candidate genes along with relevant annotation data are built that correspond to various candidate gene definitions.</p>
<p>Our system's architecture is centred on a DB which is controlled by a DataBase Management System (DBMS). There are three main features of a DBMS that make it attractive to use: centralized data management, data independence and data integration. This contrasts with conventional data processing systems in which each application program has direct access to the data it manipulates. In a DBMS, all data are integrated thereby reducing redundancies and inconsistencies and making data management more efficient. Finally, the existence of a domain data model ensures global data coherence.</p>
<p>The most commonly used conceptual framework for a DBMS is the three-level architecture suggested by the ANSI/SPARC committee (ANSI/X3/SPARC,
<xref ref-type="bibr" rid="B4">1975</xref>
). The three levels are considered as three different views on the data: (i) the external level or individual user view; (ii) the conceptual level or community user view; and (iii) the internal level or storage view. This three-level DB architecture allows a clear separation of the information meaning (conceptual view) from the physical data structure layer. A DB system that can separate these modelling levels is likely to be flexible and adaptable. The external level is a restricted view on the data, and the same DB may provide a number of different views for different categories of users or needs. In our approach, the candidate gene definitions proposed in
<xref ref-type="sec" rid="SEC2.1">Section 2.1</xref>
constitute external views on data collected about genes and diseases. The conceptual level determines the data model of the domain of interest, and includes all the information that will be represented in the DB. Finally, the physical model will be replaced here with the so-called ‘logical model’ (Teorey et al.,
<xref ref-type="bibr" rid="B30">2006</xref>
) because the latter is independent of any particular commercial DBMS.</p>
</sec>
</sec>
<sec id="SEC3">
<title>3 ALGORITHM</title>
<sec id="SEC3.1">
<title>3.1 DB design</title>
<p>The detailed definitions and relationships presented in
<xref ref-type="sec" rid="SEC2.1">Section 2.1</xref>
lead to a specification of the various types of data relevant for the retrieval of candidate genes. The resulting conceptual data model is presented in
<xref ref-type="fig" rid="F1">Figure 1</xref>
in a common entity–relationship (ER) format.
<fig id="F1" position="float">
<label>Fig. 1.</label>
<caption>
<p>Conceptual data model for the ACGR DB. Entity types are represented as boxes and relationship types as ellipses. Participation of an entity in a relationship is quantified as the minimal and maximal number of times each occurrence of the entity can participate in the relationship. Note that Cytoband in the Gene entity is an abbreviation for ‘cytogenetic band’.</p>
</caption>
<graphic xlink:href="btn612f1"></graphic>
</fig>
</p>
<p>Queries corresponding to any candidate gene definition (
<xref ref-type="sec" rid="SEC2.1">Section 2.1</xref>
) can be addressed to a DB constructed according to the model shown in
<xref ref-type="fig" rid="F1">Figure 1</xref>
. For example, the definition of a candidate gene as ‘a gene that is co-localized (
<sc>l</sc>
gene–disease relationship) with a disease and that is orthologous to a gene that has similar functional annotation with that disease (
<sc>f</sc>
gene–disease relationship)’ can be represented using the ‘Gene’, ‘Disease’, ‘GO_term’ and ‘Ranking_Tool’ entities that are linked by the ‘Is_Orthologous_To’ and ‘Is_Ranked As’ relationships. The ‘Has-Value_in’ relationship expresses the
<sc>d</sc>
gene–disease relationship as a ratio between experimental values measured, for a given gene and a given experiment, in samples from diseased versus healthy patients. The relational logical data model presented in
<xref ref-type="table" rid="T1">Table 1</xref>
is derived from this conceptual model.
<table-wrap id="T1" position="float">
<label>Table 1.</label>
<caption>
<p>Relational logical data model for ACGR DB</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Table name</th>
<th rowspan="1" colspan="1">Attribute set</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Gene</td>
<td rowspan="1" colspan="1">
<bold>Gene_ID</bold>
, Symbol, Organism, Complete_name, Chromosome, Cytoband, OMIM_ID, Source_ID</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GO_Term</td>
<td rowspan="1" colspan="1">
<bold>GO_ID</bold>
, Term, GO_section, Definition</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gene_GO_Term</td>
<td rowspan="1" colspan="1">
<bold>Gene_ID, GO_ID</bold>
, Source_ID</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Orthology</td>
<td rowspan="1" colspan="1">
<bold>Gene_ID1, Gene_ID2</bold>
, Source_ID</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Interaction</td>
<td rowspan="1" colspan="1">
<bold>Gene_ID1, Gene_ID2</bold>
, Source_ID, Interaction_Type</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Disease</td>
<td rowspan="1" colspan="1">
<bold>Disease_ID</bold>
, Synopsis, OMIM_ID</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Disease_GO_Term</td>
<td rowspan="1" colspan="1">
<bold>Disease_ID, GO_ID, Author_ID</bold>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ranking_Tool</td>
<td rowspan="1" colspan="1">
<bold>Tool_ID</bold>
, Description, URL</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gene_Disease_Similarity</td>
<td rowspan="1" colspan="1">
<bold>Gene_ID, Disease_ID, Author_ID, Tool_ID</bold>
, Similarity</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Experiment</td>
<td rowspan="1" colspan="1">
<bold>Exp_ID</bold>
, Type, Date, Platform, Analysis_procedure, Disease_ID</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gene_In_Experiment</td>
<td rowspan="1" colspan="1">
<bold>Gene_ID, Exp_ID</bold>
, Ratio</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>It consists of a set of abbreviated table schemas. Each table contains a set of attributes including a primary key (in bold face) and one or more foreign keys (in italics).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
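<p>As an illustration, part of the logical model in Table 1 can be sketched as SQL table definitions. The sketch below uses SQLite through Python rather than the MySQL installation described in Section 4; the column types and the foreign-key declarations are assumptions, since Table 1 lists attribute names and primary keys only.</p>

```python
import sqlite3

# Sketch of a subset of the Table 1 schema. Column types and foreign keys
# are assumptions: Table 1 gives attribute names and primary keys only.
SCHEMA = """
CREATE TABLE Gene (
    Gene_ID    TEXT PRIMARY KEY,
    Symbol     TEXT,
    Organism   TEXT,
    Chromosome TEXT,
    Cytoband   TEXT
);
CREATE TABLE Disease (
    Disease_ID TEXT PRIMARY KEY,
    Synopsis   TEXT,
    OMIM_ID    TEXT
);
CREATE TABLE Orthology (
    Gene_ID1  TEXT REFERENCES Gene(Gene_ID),
    Gene_ID2  TEXT REFERENCES Gene(Gene_ID),
    Source_ID TEXT,
    PRIMARY KEY (Gene_ID1, Gene_ID2)
);
CREATE TABLE Gene_Disease_Similarity (
    Gene_ID    TEXT REFERENCES Gene(Gene_ID),
    Disease_ID TEXT REFERENCES Disease(Disease_ID),
    Author_ID  TEXT,
    Tool_ID    TEXT,
    Similarity REAL,
    PRIMARY KEY (Gene_ID, Disease_ID, Author_ID, Tool_ID)
);
"""


def create_schema(conn):
    """Create the tables and ask SQLite to enforce foreign keys."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def insert_duplicate_fails(conn):
    """Return True if the composite primary key rejects a repeated tuple."""
    try:
        conn.execute(
            "INSERT INTO Gene_Disease_Similarity VALUES ('G1', 'D1', 'A1', 'T1', 10)")
    except sqlite3.IntegrityError:
        return True
    return False


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Identifiers G1, D1, A1 and T1 are hypothetical placeholders.
conn.execute("INSERT INTO Disease VALUES ('D1', 'CHARGE syndrome', '214800')")
conn.execute("INSERT INTO Gene VALUES ('G1', 'CHD7', 'human', '8', '8q12.2')")
conn.execute("INSERT INTO Gene_Disease_Similarity VALUES ('G1', 'D1', 'A1', 'T1', 4)")
```

<p>The four-attribute key of Gene_Disease_Similarity means that a gene may carry several similarity values for the same disease, one per disease description (Author_ID) and per ranking tool (Tool_ID).</p>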
</sec>
<sec id="SEC3.2">
<title>3.2 Populating the DB</title>
<p>On the basis of the relational data model, it is possible to specify the initialization steps of ACGR DB. Entering a disease description consists of inserting one row of data, hereafter called a tuple, into the ‘Disease’ table and several tuples into the ‘GO_Term’ and ‘Disease_GO_Term’ tables. To this end, an expert on the studied disease has to carefully (i) extract from her knowledge and from OMIM the phenotypes which characterize the disease, (ii) associate keywords with these phenotypes and (iii) retrieve the most relevant GO terms corresponding to these keywords. The ‘Author_ID’ attribute is useful to distinguish different descriptions of the same disease. When available, experimental data are entered by inserting one tuple into the ‘Experiment’ table for each performed experiment, and several tuples into the ‘Gene’ and ‘Gene_In_Experiment’ tables, representing all signature genes and their dysregulation ratios. Finally, the system retrieves from public DBs all human, mouse and fly genes that are annotated by at least one GO term associated with the studied disease. Only gene identifiers are inserted into the ‘Gene’ table at the initialization stage.</p>
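<p>The entry of a disease description can be sketched as follows. This is a minimal illustration in Python with SQLite, not the authors' implementation: the function name, the author identifiers and the GO identifiers are hypothetical placeholders, and only the term names are taken from the GOLTZ description in Table 2.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Disease (Disease_ID TEXT PRIMARY KEY, Synopsis TEXT, OMIM_ID TEXT);
CREATE TABLE GO_Term (GO_ID TEXT PRIMARY KEY, Term TEXT, GO_section TEXT,
                      Definition TEXT);
CREATE TABLE Disease_GO_Term (
    Disease_ID TEXT, GO_ID TEXT, Author_ID TEXT,
    PRIMARY KEY (Disease_ID, GO_ID, Author_ID)
);
""")


def enter_disease_description(conn, disease_id, synopsis, omim_id, author_id,
                              go_terms):
    """Insert one Disease tuple plus the GO terms chosen by one expert."""
    with conn:  # single transaction: all tuples are stored, or none
        conn.execute("INSERT OR IGNORE INTO Disease VALUES (?, ?, ?)",
                     (disease_id, synopsis, omim_id))
        for go_id, term in go_terms:
            conn.execute("INSERT OR IGNORE INTO GO_Term VALUES (?, ?, ?, NULL)",
                         (go_id, term, "Biological Process"))
            conn.execute("INSERT INTO Disease_GO_Term VALUES (?, ?, ?)",
                         (disease_id, go_id, author_id))


# GO identifiers below are placeholders, not real accession numbers.
goltz_terms = [
    ("GO:placeholder1", "Skin development"),
    ("GO:placeholder2", "Embryonic digit morphogenesis"),
    ("GO:placeholder3", "Embryonic skeletal morphogenesis"),
]
enter_disease_description(conn, "GOLTZ", "Goltz syndrome", "305600",
                          "expert1", goltz_terms)
# A second expert may describe the same disease with a different term set.
enter_disease_description(conn, "GOLTZ", "Goltz syndrome", "305600",
                          "expert2", goltz_terms[:2])
```

<p>Because Author_ID is part of the key of Disease_GO_Term, the two descriptions coexist without duplicating the Disease or GO_Term tuples.</p>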
<p>The data collection process consists of first retrieving identifiers of human orthologs for mouse and fly genes and then retrieving all required annotations for all gene identifiers present in the ‘Gene’ table. In particular, interacting genes are retrieved and inserted into the ‘Interaction’ table. Identifiers for interacting genes which are not present in the ‘Gene’ table are then added and undergo their own data collection process. At this stage, however, the interaction partners of these newly added genes are not retrieved, to prevent an explosion of relationships.</p>
<p>The specification of data wrappers implies selecting appropriate DBs (see
<xref ref-type="sec" rid="SEC4">Section 4</xref>
) and mapping the relevant fields onto the ACGR relational data model. Specific wrappers have been designed to plug in external ranking tools for calculating functional similarity values between genes and diseases. Such wrappers will insert tuples into the ‘Gene_Disease_Similarity’ table, i.e. one tuple per gene and per ranking tool.</p>
</sec>
<sec id="SEC3.3">
<title>3.3 Building sets of candidate genes</title>
<p>In order to express the candidate gene definitions, views are defined in Structured Query Language (SQL) at the logical level of our conceptual framework. A view associates an SQL query with a view name, leading to the creation of a virtual table. We have selected four basic definitions leading to the four views described below. The corresponding SQL queries can be found in the
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btn612/DC1">Supplementary Material</ext-link>
. For the sake of readability, the datasets produced upon view execution are called ‘Datasets’.</p>
<p>
<italic>Dataset1: genes ranked according to their functional similarity with disease description</italic>
. This first view retrieves the gene symbol, species, cytogenetic localization and similarity of all ACGR DB genes, sorted by decreasing similarity value. Human, mouse and fly genes are thus collated according to their similarity with the disease description. Mouse genes are often ranked better than their human orthologs because of the richer annotation in the model organism. The higher a gene is ranked in Dataset1, the stronger its functional relationship with the disease.</p>
<p>
<italic>Dataset2: human orthologs of model organism genes listed in Dataset1</italic>
. This second view displays all features of Dataset1 for genes retrieved from model organisms (here, mouse and fly) together with the gene symbol, cytogenetic localization and similarity of their human orthologs. A good ranking of a mouse gene can pull its human ortholog to the top of Dataset2 even when that ortholog was at the bottom of Dataset1 because of poor GO annotation in human. This behaviour is observed, for example, for the
<italic>CHD7</italic>
gene in CHARGE syndrome (see below).</p>
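<p>The first two views can be sketched as follows. This is an illustrative reconstruction in Python with SQLite, not the authors' SQL (which is given in the Supplementary Material). Several points are assumptions: the similarity table is reduced to one value per gene, Orthology is taken to store the model-organism gene in Gene_ID1 and the human gene in Gene_ID2, and ties are broken on the gene symbol. The sample values are those of Table 4; the integer gene identifiers are placeholders.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gene (Gene_ID INTEGER PRIMARY KEY, Symbol TEXT, Organism TEXT,
                   Chromosome TEXT, Cytoband TEXT);
CREATE TABLE Orthology (Gene_ID1 INTEGER, Gene_ID2 INTEGER,
                        PRIMARY KEY (Gene_ID1, Gene_ID2));
CREATE TABLE Gene_Disease_Similarity (Gene_ID INTEGER PRIMARY KEY,
                                      Similarity REAL);

-- Dataset1: every gene with its similarity to the disease description.
CREATE VIEW Dataset1 AS
SELECT g.Gene_ID, g.Symbol, g.Organism, g.Chromosome, g.Cytoband,
       s.Similarity AS Sim
FROM Gene g JOIN Gene_Disease_Similarity s ON s.Gene_ID = g.Gene_ID;

-- Dataset2: model-organism genes of Dataset1 with their human orthologs.
CREATE VIEW Dataset2 AS
SELECT m.Symbol, m.Organism, m.Cytoband, m.Sim,
       h.Symbol AS Orthol_Symbol, h.Chromosome AS Orthol_Chromosome,
       h.Cytoband AS Orthol_Cytoband, h.Sim AS Orthol_Sim
FROM Dataset1 m
JOIN Orthology o ON o.Gene_ID1 = m.Gene_ID
JOIN Dataset1 h ON h.Gene_ID = o.Gene_ID2
WHERE m.Organism != 'human' AND h.Organism = 'human';
""")

# Sample tuples taken from Table 4 (CHARGE case study).
conn.executemany("INSERT INTO Gene VALUES (?, ?, ?, ?, ?)", [
    (1, "Tmie", "mouse", "9", "9 64.0 cM"),
    (2, "Chd7", "mouse", "4", "4 1.0 cM"),
    (3, "Gjb6", "mouse", "14", "14 22.5 cM"),
    (4, "TMIE", "human", "3", "3p21"),
    (5, "CHD7", "human", "8", "8q12.2"),
    (6, "GJB6", "human", "13", "13q12"),
])
conn.executemany("INSERT INTO Gene_Disease_Similarity VALUES (?, ?)",
                 [(1, 62), (2, 48), (3, 48), (4, 62), (5, 4), (6, 45)])
conn.executemany("INSERT INTO Orthology VALUES (?, ?)",
                 [(1, 4), (2, 5), (3, 6)])


def ranked_dataset2(conn, chromosome=None):
    """Rows of Dataset2 by decreasing similarity of the model-organism gene,
    optionally restricted to human orthologs on one chromosome."""
    sql = ("SELECT Symbol, Sim, Orthol_Symbol, Orthol_Cytoband, Orthol_Sim "
           "FROM Dataset2")
    params = ()
    if chromosome is not None:
        sql += " WHERE Orthol_Chromosome = ?"
        params = (chromosome,)
    sql += " ORDER BY Sim DESC, Symbol"
    return conn.execute(sql, params).fetchall()
```

<p>In this sketch the ordering is applied when the view is queried rather than inside the view definition, because SQL does not guarantee that an ordering stored in a view is preserved by outer queries.</p>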
<p>
<italic>Dataset3: genes interacting with the genes listed in Dataset1.</italic>
For each gene in Dataset1, the symbol, cytogenetic localization and similarity of the genes reported as interacting with it (mostly via the gene products but other types of interactions are not excluded) are displayed. The source of information concerning these interactions is also displayed. Only intra-species interactions are listed here. Genes that display proper cytogenetic localization but poor similarity values may prove to be good disease candidates because of interactions with well-ranked genes mapped elsewhere in the genome.</p>
<p>
<italic>Dataset4: human orthologs of model organism genes listed in Dataset3.</italic>
Dataset4 is intended to display candidate genes which are human orthologs of model organism genes that interact with well-ranked genes.</p>
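<p>Dataset3 and Dataset4 can be sketched in the same way. Again this is an illustrative reconstruction under stated assumptions, not the authors' SQL: the similarity value is folded into the Gene table for brevity, each interaction is assumed to be stored once with the Dataset1 gene in Gene_ID1, and Orthology is assumed to hold the model-organism gene in Gene_ID1. The sample tuples come from Table 5; the integer identifiers are placeholders.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gene (Gene_ID INTEGER PRIMARY KEY, Symbol TEXT, Organism TEXT,
                   Chromosome TEXT, Cytoband TEXT, Sim REAL);
CREATE TABLE Interaction (Gene_ID1 INTEGER, Gene_ID2 INTEGER, Source_ID TEXT,
                          PRIMARY KEY (Gene_ID1, Gene_ID2));
CREATE TABLE Orthology (Gene_ID1 INTEGER, Gene_ID2 INTEGER,
                        PRIMARY KEY (Gene_ID1, Gene_ID2));

-- Dataset3: genes interacting with each gene, within the same species.
CREATE VIEW Dataset3 AS
SELECT g.Symbol, g.Organism, g.Sim,
       i.Gene_ID AS Interac_ID, i.Symbol AS Interac_Symbol,
       x.Source_ID AS Source, i.Sim AS Interac_Sim
FROM Gene g
JOIN Interaction x ON x.Gene_ID1 = g.Gene_ID
JOIN Gene i ON i.Gene_ID = x.Gene_ID2
WHERE i.Organism = g.Organism;

-- Dataset4: human orthologs of the interacting model-organism genes.
CREATE VIEW Dataset4 AS
SELECT d.Symbol, d.Sim, d.Interac_Symbol, d.Source,
       h.Symbol AS Orthol_Symbol, h.Chromosome AS Orthol_Chromosome,
       h.Cytoband AS Orthol_Cytoband, h.Sim AS Orthol_Sim
FROM Dataset3 d
JOIN Orthology o ON o.Gene_ID1 = d.Interac_ID
JOIN Gene h ON h.Gene_ID = o.Gene_ID2
WHERE d.Organism != 'human' AND h.Organism = 'human';
""")

# Sample tuples taken from Table 5 (GOLTZ case study).
conn.executemany("INSERT INTO Gene VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Gna12", "mouse", "5", "5 82.0 cM", 35),
    (2, "Ppp5c", "mouse", "7", "7 4.0 cM", 4),
    (3, "PPP5C", "human", "19", "19q13.3", 4),
    (4, "Wnt7a", "mouse", "6", "6 39.5 cM", 31),
    (5, "Porcn", "mouse", "X", "X 2.15 cM", 5),
    (6, "PORCN", "human", "X", "Xp11.23", 7),
    (7, "Lgals3", "mouse", "14", "14 C1", 28),
    (8, "Sufu", "mouse", "19", "19 47.0 cM", 17),
    (9, "SUFU", "human", "10", "10q24.32", 19),
])
conn.executemany("INSERT INTO Interaction VALUES (?, ?, ?)",
                 [(1, 2, "BIND"), (4, 5, "BIND"), (7, 8, "BIND")])
conn.executemany("INSERT INTO Orthology VALUES (?, ?)",
                 [(2, 3), (5, 6), (8, 9)])


def candidates(conn, chromosome=None):
    """Dataset4 rows ranked by the similarity of the well-ranked partner."""
    sql = ("SELECT Symbol, Interac_Symbol, Orthol_Symbol, Orthol_Cytoband, "
           "Orthol_Sim FROM Dataset4")
    params = ()
    if chromosome is not None:
        sql += " WHERE Orthol_Chromosome = ?"
        params = (chromosome,)
    sql += " ORDER BY Sim DESC"
    return conn.execute(sql, params).fetchall()
```

<p>Restricting the result to chromosome X leaves the human ortholog of the gene interacting with Wnt7a, even though the similarity of that ortholog itself is low.</p>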
<p>When experimental data are available, they can be included in each of the views described above, thereby producing four supplementary views, Dataset1Exp to Dataset4Exp. An example of this is presented below in the case study on AICARDI syndrome.</p>
<p>Further queries on the basic ACGR views can then provide customized lists of candidate genes. Indeed, creating sets of annotated candidate genes as SQL views allows biologists to benefit from the numerous advantages of this powerful approach. First, writing new queries is simplified. Second, the views are automatically refreshed whenever the DB is updated. Finally, defining views contributes to the integrity and security of the DB because end-users may be given tuned privileges on views rather than on the underlying data tables.</p>
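<p>The first two advantages can be illustrated with a short sketch, here in Python with SQLite and a deliberately reduced Gene table (an assumption made for brevity). The function name and the similarity threshold are hypothetical; the gene symbols and similarity values are those of Table 6.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gene (Gene_ID INTEGER PRIMARY KEY, Symbol TEXT,
                   Chromosome TEXT, Sim REAL);
CREATE VIEW Dataset1 AS SELECT Symbol, Chromosome, Sim FROM Gene;
""")


def x_linked_candidates(conn, min_sim):
    """Customized list: X-linked genes at or above a similarity threshold."""
    rows = conn.execute(
        "SELECT Symbol FROM Dataset1 "
        "WHERE Chromosome = 'X' AND Sim >= ? ORDER BY Sim DESC", (min_sim,))
    return [symbol for (symbol,) in rows]


conn.executemany(
    "INSERT INTO Gene (Symbol, Chromosome, Sim) VALUES (?, ?, ?)",
    [("PLXNA3", "X", 56), ("ARX", "X", 40), ("DCX", "X", 26)])
before = x_linked_candidates(conn, 30)

# Updating the base table is enough: the view itself needs no maintenance.
conn.execute(
    "INSERT INTO Gene (Symbol, Chromosome, Sim) VALUES ('SOX3', 'X', 34)")
after = x_linked_candidates(conn, 30)
```

<p>The query is written once against the view and returns the newly inserted gene on its next execution. Access control on views, the third advantage, depends on the DBMS and is not shown here because SQLite has no user privileges.</p>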
</sec>
</sec>
<sec id="SEC4">
<title>4 IMPLEMENTATION</title>
<p>The technical implementation choices described in this work are not mandatory since other techniques are conceivable depending on the target deployment environment. For example, here wrappers for retrieving and integrating data from various data sources have been implemented as scenarios of the Xcollect software (Devignes et al.,
<xref ref-type="bibr" rid="B9">2005</xref>
). Xcollect scenarios are configured to formulate queries automatically, send them to a remote web resource, parse the returned document and store the desired data in an XML document. Capturing the date of last DB update is included in each scenario to help track data quality. The specific Xcollect scenarios used here are available in the
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btn612/DC1">Supplementary Material</ext-link>
.</p>
<p>In this work, data sources were selected according to their updating frequencies, annotation quality and coverage. Thus, GO terms corresponding to keywords describing the disease were retrieved from AMIGO DB; genes annotated with the selected GO terms, together with all their annotations, were retrieved from Entrez-Gene at NCBI. Symbols of orthologous genes were retrieved from Entrez-HomoloGene.</p>
<p>The storage of the collected data in the ACGR DB was performed with the help of XSL transformations designed to convert each Xcollect session document into appropriate SQL commands. Besides Xcollect wrappers, we developed a wrapper to invoke the GO-Family program available in the GOToolBox (Martin et al.,
<xref ref-type="bibr" rid="B20">2004</xref>
). The program was modified slightly because a list of GO terms rather than a list of reference gene symbols is required as well as the list of genes to be ranked. Briefly, the program fetches all GO terms annotating a candidate gene as well as their parent terms. It also fetches all parents of the disease-specific GO terms. Then it calculates a similarity percentage taking into account identical and non-identical terms between the set of GO terms associated with the candidate gene and the set of disease-specific GO terms.</p>
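<p>The general idea of such a ranking can be sketched as follows. The exact formula used by GO-Family is not given here, so the sketch substitutes a simple Jaccard index over ancestor-closed term sets; this is an assumption made for illustration, not the GO-Family computation. The function names and the toy hierarchy are hypothetical.</p>

```python
def with_ancestors(terms, parents):
    """Return the given GO terms together with all their ancestor terms."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def similarity_percentage(gene_terms, disease_terms, parents):
    """Shared terms over all terms (Jaccard index), as a percentage.

    Assumption: a stand-in for the GO-Family score, whose formula is
    not specified in the text.
    """
    gene_set = with_ancestors(gene_terms, parents)
    disease_set = with_ancestors(disease_terms, parents)
    union = gene_set.union(disease_set)
    if not union:
        return 0.0
    return 100.0 * len(gene_set.intersection(disease_set)) / len(union)


# Toy hierarchy (child to parents); the structure is illustrative only
# and does not reproduce the real GO graph.
PARENTS = {
    "ear development": ["organ development"],
    "heart morphogenesis": ["organ development"],
    "organ development": ["biological process"],
}

same = similarity_percentage(["ear development"], ["ear development"], PARENTS)
partial = similarity_percentage(["ear development"], ["heart morphogenesis"],
                                PARENTS)
```

<p>Including parent terms lets two annotations that differ at the leaf level still score above zero when they share ancestors, which is the behaviour described above for identical and non-identical terms.</p>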
<p>The EasyPHP package was used for data management and user interface development. This package includes a web server (Apache), a DBMS (MySQL) and a scripting language (PHP). The corresponding programs along with a user guide are available at
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.loria.fr/projects/acgr/acgr-software/">http://bioinfo.loria.fr/projects/acgr/acgr-software/</ext-link>
.</p>
</sec>
<sec sec-type="results|discussion" id="SEC5">
<title>5 RESULTS AND DISCUSSION</title>
<sec id="SEC5.1">
<title>5.1 Three case studies</title>
<p>The ACGR approach was initially motivated by the need to analyse results obtained for AICARDI syndrome (OMIM %304050) which is currently being investigated experimentally (Yilmaz et al.,
<xref ref-type="bibr" rid="B42">2007</xref>
). To date, no responsible gene is known for this disease. Two other rare syndromes, CHARGE (OMIM #214800) and GOLTZ (OMIM #305600), were selected from the literature. The genes responsible for these two syndromes have recently been reported (Grzeschik et al.,
<xref ref-type="bibr" rid="B13">2007</xref>
; Vissers et al.,
<xref ref-type="bibr" rid="B36">2004</xref>
; Wang et al.,
<xref ref-type="bibr" rid="B39">2007b</xref>
), but this information is not included in the annotations collected in the ACGR DB. It is therefore relevant to test the ACGR approach on these recently elucidated diseases.</p>
</sec>
<sec id="SEC5.2">
<title>5.2 Populating the DB</title>
<p>
<xref ref-type="table" rid="T2">Table 2</xref>
shows for the three case studies the correspondence between disease phenotypes and Biological Process GO terms. Phenotypes were selected from OMIM notices regarding diagnoses. Keywords (data not shown, see
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btn612/DC1">Supplementary Material</ext-link>
) were chosen to characterize each phenotype. For a given keyword, GO terms were selected at the relevant level of the GO hierarchy. A GO term is included when all its children are relevant. In the case of AICARDI syndrome, a third phenotype (infantile spasms) is frequently observed but does not correspond to any specific GO term. According to the clinicians, this phenotype is covered by the ‘Forebrain development’ GO term.
<table-wrap id="T2" position="float">
<label>Table 2.</label>
<caption>
<p>List of GO terms defined by the clinicians on August 31, 2007</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Syndrome</th>
<th rowspan="1" colspan="1">Phenotype</th>
<th rowspan="1" colspan="1">GO term (Biological Process hierarchy)</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Coloboma</td>
<td rowspan="1" colspan="1">Camera-type eye morphogenesis [47]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Choanal atresia</td>
<td rowspan="1" colspan="1">Nose development [2]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CHARGE</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Embryonic cranial skeleton morphogenesis [16]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Ear abnormality</td>
<td rowspan="1" colspan="1">Ear development [155]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Deafness</td>
<td rowspan="1" colspan="1">Sensory perception of sound [203]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Heart anomaly</td>
<td rowspan="1" colspan="1">Heart morphogenesis [69]</td>
</tr>
<tr>
<td colspan="3" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Skin defects</td>
<td rowspan="1" colspan="1">Skin development [22]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GOLTZ</td>
<td rowspan="1" colspan="1">Digital anomalies</td>
<td rowspan="1" colspan="1">Embryonic digit morphogenesis [28]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Skeletal defects</td>
<td rowspan="1" colspan="1">Embryonic skeletal morphogenesis [25]</td>
</tr>
<tr>
<td colspan="3" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Corpus callosum agenesis</td>
<td rowspan="1" colspan="1">Forebrain development [191]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Corpus callosum development [0]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">AICARDI</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Corpus callosum morphogenesis [0]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Neuron migration [139]</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Neural plate development [117]</td>
</tr>
<tr>
<td colspan="3" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Chorioretinal lacunae</td>
<td rowspan="1" colspan="1">Camera-type eye morphogenesis [47]</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The number of genes annotated by a GO term is indicated in brackets.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>Experimental data were inserted into the DB for the AICARDI syndrome as explained in
<xref ref-type="sec" rid="SEC3.2">Section 3.2</xref>
. These data concern 300 genes which ANOVA analysis of several transcriptomic experiments found to be dysregulated (Yilmaz,
<xref ref-type="bibr" rid="B41">2007</xref>
). For these genes, the Ratio attribute was set to 1, whereas it was set to 0 for all other genes.</p>
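<p>A view combining similarity and experimental data can be sketched as follows. This is an assumption-laden illustration in Python with SQLite: similarity is folded into the Gene table, a single experiment is assumed, and genes absent from Gene_In_Experiment are given a ratio of 0 by the view rather than by stored tuples. The experiment identifier is a placeholder; gene symbols and values come from Tables 6 and 7.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gene (Gene_ID INTEGER PRIMARY KEY, Symbol TEXT, Cytoband TEXT,
                   Sim REAL);
CREATE TABLE Gene_In_Experiment (Gene_ID INTEGER, Exp_ID TEXT, Ratio REAL,
                                 PRIMARY KEY (Gene_ID, Exp_ID));

-- Genes without experimental data are reported with a ratio of 0.
CREATE VIEW Dataset1Exp AS
SELECT g.Symbol, g.Cytoband, g.Sim, COALESCE(e.Ratio, 0) AS Ratio
FROM Gene g LEFT JOIN Gene_In_Experiment e ON e.Gene_ID = g.Gene_ID;
""")

conn.executemany("INSERT INTO Gene VALUES (?, ?, ?, ?)", [
    (1, "DLX5", "7q22", 50),
    (2, "PLXNA3", "Xq28", 56),
    (3, "MAGED1", "Xp11.23", 3),
])
# 'exp1' is a placeholder experiment identifier.
conn.execute("INSERT INTO Gene_In_Experiment VALUES (1, 'exp1', 1)")


def dysregulated(conn):
    """Symbols of dysregulated genes, by decreasing similarity."""
    rows = conn.execute(
        "SELECT Symbol FROM Dataset1Exp WHERE Ratio = 1 ORDER BY Sim DESC")
    return [symbol for (symbol,) in rows]


def ratios(conn):
    """Map each gene symbol to the ratio reported by the view."""
    return dict(conn.execute("SELECT Symbol, Ratio FROM Dataset1Exp"))
```

<p>With several experiments per gene, the left join would return one row per gene and experiment, so a real view would need to select or aggregate over Exp_ID.</p>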
<p>
<xref ref-type="table" rid="T3">Table 3</xref>
summarizes the contents of the ACGR DB for the three case studies. The
<italic>#GO</italic>
column displays the number of GO terms specific to the disease. The
<italic>#fly</italic>
,
<italic>#mouse</italic>
and
<italic>#human</italic>
columns show the number of genes annotated by at least one of these GO terms for each organism. The ‘#dysregulated’ column indicates the number of experimentally determined human dysregulated genes stored in the DB. The last column gives the total number of genes after the inclusion of other orthologous and interacting genes.
<table-wrap id="T3" position="float">
<label>Table 3.</label>
<caption>
<p>Numbers of genes stored in the ACGR DB for the three case studies</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Disease</th>
<th rowspan="1" colspan="1">#GO</th>
<th rowspan="1" colspan="1">#fly</th>
<th rowspan="1" colspan="1">#mouse</th>
<th rowspan="1" colspan="1">#human</th>
<th rowspan="1" colspan="1">#dysregulated</th>
<th rowspan="1" colspan="1">#genes</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">CHARGE</td>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">29</td>
<td rowspan="1" colspan="1">172</td>
<td rowspan="1" colspan="1">223</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1410</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GOLTZ</td>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">55</td>
<td rowspan="1" colspan="1">272</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1583</td>
</tr>
<tr>
<td rowspan="1" colspan="1">AICARDI</td>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">182</td>
<td rowspan="1" colspan="1">166</td>
<td rowspan="1" colspan="1">300</td>
<td rowspan="1" colspan="1">2218</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec id="SEC5.3">
<title>5.3 Building sets of annotated candidate genes</title>
<p>Dataset1 to Dataset4 were constructed for each case study as described in
<xref ref-type="sec" rid="SEC3.3">Section 3.3</xref>
to enable queries reflecting expert hypotheses about candidate genes to be formulated. The complete tables are available as
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/btn612/DC1">Supplementary Material</ext-link>
.
<xref ref-type="table" rid="T4">Table 4</xref>
displays the first three tuples from CHARGE Dataset2. The human
<italic>CHD7</italic>
gene that is responsible for this disease (Vissers et al.,
<xref ref-type="bibr" rid="B36">2004</xref>
) appears in second position as the ortholog of the mouse Chd7 gene, which has a high similarity to the disease description (48%). It is worth noting that the low similarity of the human
<italic>CHD7</italic>
gene annotation to CHARGE GO terms (4%) relegates it to the bottom of Dataset1. Selecting human genes from chromosome 8 in CHARGE Dataset2 yields the
<italic>CHD7</italic>
gene as the first-ranked candidate gene.
<table-wrap id="T4" position="float">
<label>Table 4.</label>
<caption>
<p>The three top-ranked tuples from CHARGE Dataset2</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Symbol</th>
<th rowspan="1" colspan="1">Organism</th>
<th rowspan="1" colspan="1">Cyto-band</th>
<th rowspan="1" colspan="1">Sim</th>
<th rowspan="1" colspan="1">Orthol_Symbol</th>
<th rowspan="1" colspan="1">Orthol_Cytoband</th>
<th rowspan="1" colspan="1">Orthol_Sim</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Tmie</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">9 64.0 cM</td>
<td rowspan="1" colspan="1">62</td>
<td rowspan="1" colspan="1">TMIE</td>
<td rowspan="1" colspan="1">3p21</td>
<td rowspan="1" colspan="1">62</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Chd7</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">4 1.0 cM</td>
<td rowspan="1" colspan="1">48</td>
<td rowspan="1" colspan="1">CHD7</td>
<td rowspan="1" colspan="1">8q12.2</td>
<td rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gjb6</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">14 22.5 cM</td>
<td rowspan="1" colspan="1">48</td>
<td rowspan="1" colspan="1">GJB6</td>
<td rowspan="1" colspan="1">13q12</td>
<td rowspan="1" colspan="1">45</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The column
<italic>Sim</italic>
refers to the values taken by the
<italic>Similarity</italic>
attribute of the
<italic>Gene_Disease_Similarity</italic>
table. These values are expressed as percentages owing to the particular ranking tool that was used. The
<italic>Orthol_Symbol, Orthol_Cytoband</italic>
and
<italic>Orthol_Sim</italic>
columns display values for the human orthologs of the considered mouse genes.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>The CHARGE case study shows that the ACGR approach would have been able to designate the
<italic>CHD7</italic>
gene as the best candidate gene in the group of nine genes identified by the authors at 8q12, thus prioritizing its sequencing. It is worth noting that although the association of
<italic>CHD7</italic>
with CHARGE syndrome was established 3 years ago, the GO annotation of this gene does not reflect this association.</p>
<p>
<xref ref-type="table" rid="T5">Table 5</xref>
shows the first six tuples from GOLTZ Dataset4. Despite its low similarity to the disease description (7%), the responsible human
<italic>PORCN</italic>
gene appears in fifth position in GOLTZ Dataset4, which contains 51 tuples, and as the first candidate gene located on chromosome X. This is because the mouse Porcn gene is reported as interacting with the mouse Wnt7a gene, which has good similarity to the disease description. Hence, the ACGR approach could have pointed to the
<italic>PORCN</italic>
gene even before the localization refinement of the disease provided by the CGH array experiment (Grzeschik et al.,
<xref ref-type="bibr" rid="B13">2007</xref>
; Wang et al.,
<xref ref-type="bibr" rid="B39">2007b</xref>
).
<table-wrap id="T5" position="float">
<label>Table 5.</label>
<caption>
<p>The six top-ranked tuples from GOLTZ Dataset4</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Symbol</th>
<th rowspan="1" colspan="1">Organism</th>
<th rowspan="1" colspan="1">Cytoband</th>
<th rowspan="1" colspan="1">Sim</th>
<th rowspan="1" colspan="1">Interac_Symbol</th>
<th rowspan="1" colspan="1">Source</th>
<th rowspan="1" colspan="1">Interac_Cytoband</th>
<th rowspan="1" colspan="1">Interac_Sim</th>
<th rowspan="1" colspan="1">Orthol_Symbol</th>
<th rowspan="1" colspan="1">Orthol_Cytoband</th>
<th rowspan="1" colspan="1">Orthol_Sim</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Gna12</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">5 82.0 cM</td>
<td rowspan="1" colspan="1">35</td>
<td rowspan="1" colspan="1">Ppp5c</td>
<td rowspan="1" colspan="1">BIND</td>
<td rowspan="1" colspan="1">7 4.0 cM</td>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">
<italic>PPP5C</italic>
</td>
<td rowspan="1" colspan="1">19q13.3</td>
<td rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Col5a2</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">1 C1</td>
<td rowspan="1" colspan="1">32</td>
<td rowspan="1" colspan="1">Smad2</td>
<td rowspan="1" colspan="1">BIND</td>
<td rowspan="1" colspan="1">18 48.0 cM</td>
<td rowspan="1" colspan="1">15</td>
<td rowspan="1" colspan="1">
<italic>SMAD2</italic>
</td>
<td rowspan="1" colspan="1">18q21.1</td>
<td rowspan="1" colspan="1">17</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Col5a2</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">1 C1</td>
<td rowspan="1" colspan="1">32</td>
<td rowspan="1" colspan="1">Smad7</td>
<td rowspan="1" colspan="1">BIND</td>
<td rowspan="1" colspan="1">18 unknown</td>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">
<italic>SMAD7</italic>
</td>
<td rowspan="1" colspan="1">18q21.1</td>
<td rowspan="1" colspan="1">5</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Col5a2</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">1 C1</td>
<td rowspan="1" colspan="1">32</td>
<td rowspan="1" colspan="1">Smad3</td>
<td rowspan="1" colspan="1">BIND</td>
<td rowspan="1" colspan="1">9 unknown</td>
<td rowspan="1" colspan="1">15</td>
<td rowspan="1" colspan="1">
<italic>SMAD3</italic>
</td>
<td rowspan="1" colspan="1">15q22.33</td>
<td rowspan="1" colspan="1">16</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Wnt7a</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">6 39.5 cM</td>
<td rowspan="1" colspan="1">31</td>
<td rowspan="1" colspan="1">Porcn</td>
<td rowspan="1" colspan="1">BIND</td>
<td rowspan="1" colspan="1">X 2.15 cM</td>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">
<italic>
<bold>PORCN</bold>
</italic>
</td>
<td rowspan="1" colspan="1">
<bold>Xp11.23</bold>
</td>
<td rowspan="1" colspan="1">7</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Lgals3</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">14 C1</td>
<td rowspan="1" colspan="1">28</td>
<td rowspan="1" colspan="1">Sufu</td>
<td rowspan="1" colspan="1">BIND</td>
<td rowspan="1" colspan="1">19 47.0 cM</td>
<td rowspan="1" colspan="1">17</td>
<td rowspan="1" colspan="1">
<italic>SUFU</italic>
</td>
<td rowspan="1" colspan="1">10q24.32</td>
<td rowspan="1" colspan="1">19</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Columns
<italic>Interac_Symbol</italic>
to
<italic>Interac_Sim</italic>
refer to the genes interacting with the considered mouse genes. Columns
<italic>Orthol_Symbol</italic>
to
<italic>Orthol_Sim</italic>
are described in
<xref ref-type="table" rid="T4">Table 4</xref>
.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>In the case of AICARDI syndrome, Dataset1Exp to Dataset4Exp were produced including transcriptomic data. A first query on Dataset1Exp retrieved 71 genes located on human chromosome X.
<xref ref-type="table" rid="T6">Table 6</xref>
displays the first four genes of this list. The best-ranked
<italic>PLXNA3</italic>
gene seems to be an interesting candidate. Its annotation is rather similar to the AICARDI GO terms (56%). However, to date, it has not been associated with any human disease. The following
<italic>ARX</italic>
and
<italic>SOX3</italic>
genes are both responsible for diseases involving mental retardation, namely MRX54 (OMIM #300419) and MRGH (OMIM #300123). The next
<italic>DCX</italic>
gene is a good internal control since it is responsible for X-linked lissencephaly (LISX, OMIM #300067), a disease which, like AICARDI syndrome, involves agenesis of the corpus callosum and multiple heterotopia.
<table-wrap id="T6" position="float">
<label>Table 6.</label>
<caption>
<p>The four top-ranked human genes localized on chromosome X from AICARDI Dataset1Exp</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Symbol</th>
<th rowspan="1" colspan="1">Organism</th>
<th rowspan="1" colspan="1">Sim</th>
<th rowspan="1" colspan="1">Cytoband</th>
<th rowspan="1" colspan="1">Ratio</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">PLXNA3</td>
<td rowspan="1" colspan="1">Human</td>
<td rowspan="1" colspan="1">56</td>
<td rowspan="1" colspan="1">Xq28</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">ARX</td>
<td rowspan="1" colspan="1">Human</td>
<td rowspan="1" colspan="1">40</td>
<td rowspan="1" colspan="1">Xp21</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">SOX3</td>
<td rowspan="1" colspan="1">Human</td>
<td rowspan="1" colspan="1">34</td>
<td rowspan="1" colspan="1">Xq27.1</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">DCX</td>
<td rowspan="1" colspan="1">Human</td>
<td rowspan="1" colspan="1">26</td>
<td rowspan="1" colspan="1">Xq22.3-q23</td>
<td rowspan="1" colspan="1">0</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>Further queries were applied to AICARDI Dataset3Exp to explore possible interactions between dysregulated genes and candidate genes.
<xref ref-type="table" rid="T7">Table 7</xref>
shows four candidate genes (‘Interac_Symbol’ column) from Dataset3Exp, located on chromosome X and interacting with the four best-ranked dysregulated genes (‘Symbol’ column). The
<italic>MAGED1</italic>
gene interacts with the
<italic>DLX5</italic>
gene, which is dysregulated in our transcriptomic experiments and whose GO annotation displays 50% similarity with the AICARDI-specific GO terms. The interaction between these two gene products is based on
<italic>in vivo</italic>
experiments (Masuda et al.,
<xref ref-type="bibr" rid="B23">2001</xref>
).
<table-wrap id="T7" position="float">
<label>Table 7.</label>
<caption>
<p>The four top-ranked human tuples from AICARDI Dataset3Exp</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Symbol</th>
<th rowspan="1" colspan="1">Cytoband</th>
<th rowspan="1" colspan="1">Sim</th>
<th rowspan="1" colspan="1">Ratio</th>
<th rowspan="1" colspan="1">Interac_Symbol</th>
<th rowspan="1" colspan="1">Interac_Cytoband</th>
<th rowspan="1" colspan="1">Interac_Sim</th>
<th rowspan="1" colspan="1">Interac_Ratio</th>
<th rowspan="1" colspan="1">Source</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<italic>DLX5</italic>
</td>
<td rowspan="1" colspan="1">7q22</td>
<td rowspan="1" colspan="1">50</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">
<italic>MAGED1</italic>
</td>
<td rowspan="1" colspan="1">Xp11.23</td>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">HPRD</td>
</tr>
<tr>
<td rowspan="1" colspan="1">UBE3A</td>
<td rowspan="1" colspan="1">15q11-q13</td>
<td rowspan="1" colspan="1">22</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">
<italic>UBQLN2</italic>
</td>
<td rowspan="1" colspan="1">Xp11.23-p11.1</td>
<td rowspan="1" colspan="1">8</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">HPRD</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CXCL10</td>
<td rowspan="1" colspan="1">4q21</td>
<td rowspan="1" colspan="1">21</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">
<italic>CXCR3</italic>
</td>
<td rowspan="1" colspan="1">Xq13</td>
<td rowspan="1" colspan="1">10</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">HPRD</td>
</tr>
<tr>
<td rowspan="1" colspan="1">IGF1</td>
<td rowspan="1" colspan="1">12q22-q23</td>
<td rowspan="1" colspan="1">21</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">
<italic>IGSF1</italic>
</td>
<td rowspan="1" colspan="1">Xq25</td>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">BIND</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The columns
<italic>Symbol</italic>
to
<italic>Ratio</italic>
refer to dysregulated genes, and the columns
<italic>Interac_Symbol</italic>
to
<italic>Interac_Ratio</italic>
refer to the interacting candidate genes. The
<italic>Source</italic>
column indicates the database where the interaction is documented.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
</sec>
<sec id="SEC5.4">
<title>5.4 Discussion</title>
<p>Overall, the ACGR approach has received enthusiastic feedback from experimentalists. Indeed, the approach yielded very satisfactory results in the CHARGE and GOLTZ case studies: in both cases, the gene responsible for the disease is ranked first when chromosomal localization is taken into account. Thus, the ACGR approach would have been useful at the time of the discovery of these responsible genes to avoid unnecessary sequencing. In the case of AICARDI syndrome, the ACGR approach provided several meaningful and promising candidate genes that are currently being analysed further. For instance, the
<italic>MAGED1</italic>
gene displays several features associated with disease genes (Tu et al.,
<xref ref-type="bibr" rid="B32">2006</xref>
). It is a 99.3 kb long gene due to a large intron (91 kb) separating the first exon from the 12 other exons that are grouped over the remaining 8 kb. Interestingly, two of the retrieved candidate genes (
<italic>MAGED1</italic>
and
<italic>UBQLN2</italic>
) are located in the same cytogenetic band (Xp11.23), which is known to be correlated with several neuro-psychiatric disorders. It should be noted that for this disease, the small number of recruited patients hampers the application of purely experimental protocols. In addition to the presented case studies, ongoing investigations indicate that the approach presented here may facilitate future endeavours to identify susceptibility genes for complex diseases.</p>
<p>The robustness and flexibility of our approach make it possible to explore various alternative strategies, including varying the ranking procedure and the selection of primary data sources. For example, data about interaction networks could be retrieved from the protein complexes curated by Lage et al. (
<xref ref-type="bibr" rid="B16">2007</xref>
). The GO-Family algorithm used for gene ranking in this study could be replaced by any other similarity measurement between GO terms (Lord et al.,
<xref ref-type="bibr" rid="B19">2003</xref>
; Wang et al.,
<xref ref-type="bibr" rid="B38">2007a</xref>
; Zhang et al.,
<xref ref-type="bibr" rid="B43">2006</xref>
). The similarity between eVOC terms annotating both gene expression and affected tissues could be used to assess ‘is_co-expressed’ relationships (Tiffin et al.,
<xref ref-type="bibr" rid="B31">2005</xref>
), for example.</p>
<p>A possible limitation of the current work is the small number of case studies analysed. Because an expert on each studied disease has to be involved in the first step of the approach, automated large-scale evaluation is difficult. Moreover, it should be stressed that success in retrieving the gene responsible for a disease at a good rank depends strongly on both the user's expertise and the quality of the available data.</p>
<p>Nevertheless, the results presented here clearly demonstrate the explicit querying capabilities of the ACGR system and the originality of the approach in explaining why a given gene is related to a disease.</p>
</sec>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>[Supplementary Data]</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="btn612_index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="msword" xlink:href="btn612_bioinf-2008-1029-File001.doc"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="x-zip-compressed" xlink:href="btn612_bioinf-2008-1029-File002.tar"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="x-zip-compressed" xlink:href="btn612_bioinf-2008-1029-File004.tar"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>We thank Sylvain Lambermont for his contribution at an early stage of the work, Dr Leheup for help in selecting disease-specific GO terms, and Amine Rouhane-Hacène and Dave Ritchie for careful reading of the manuscript. S.Y. was supported by the AAL (Amis d'Anne-Lorène) association and Région Lorraine.</p>
<p>
<italic>Funding</italic>
: Contrat de Plan Etat-Région Lorraine (PRST Intelligence Logicielle).</p>
<p>
<italic>Conflict of Interest</italic>
: none declared.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adie</surname>
<given-names>EA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Speeding disease gene discovery by sequence based candidate prioritization</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>55</fpage>
<pub-id pub-id-type="pmid">15766383</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adie</surname>
<given-names>EA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>SUSPECTS: enabling fast and effective prioritization of positional candidates</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>773</fpage>
<lpage>774</lpage>
<pub-id pub-id-type="pmid">16423925</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aerts</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene prioritization through genomic data fusion</article-title>
<source>Nat. Biotechnol.</source>
<year>2006</year>
<volume>24</volume>
<fpage>537</fpage>
<lpage>544</lpage>
<pub-id pub-id-type="pmid">16680138</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="book">
<collab>ANSI/X3/SPARC</collab>
<source>Study Group on Data Base Management Systems, Interim Report, FDT 7 No. 2.</source>
<year>1975</year>
<publisher-loc>New York</publisher-loc>
<publisher-name>ACM</publisher-name>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barillot</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>New strategy for the representation and the integration of biomolecular knowledge at a cellular scale</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>3581</fpage>
<lpage>3589</lpage>
<pub-id pub-id-type="pmid">15240831</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Botstein</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Risch</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease</article-title>
<source>Nat. Genet.</source>
<year>2003</year>
<volume>33</volume>
<issue>Suppl.</issue>
<fpage>228</fpage>
<lpage>236</lpage>
<pub-id pub-id-type="pmid">12610532</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Calvo</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A partially supervised classification approach to dominant and recessive human disease gene prediction</article-title>
<source>Comput. Methods Programs Biomed.</source>
<year>2007</year>
<volume>85</volume>
<fpage>229</fpage>
<lpage>237</lpage>
<pub-id pub-id-type="pmid">17258838</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chiang</surname>
<given-names>JH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GeneLibrarian: an effective gene-information summarization and visualization system</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<fpage>392</fpage>
<pub-id pub-id-type="pmid">16939640</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Devignes</surname>
<given-names>MD</given-names>
</name>
<etal></etal>
</person-group>
<article-title>User-designed web services to support heterogeneous biological data retrieval. NETTAB workshop on Workflows management: new abilities for the biological information overflow</article-title>
<year>2005</year>
<comment>available at
<ext-link ext-link-type="uri" xlink:href="http://www.nettab.org/2005/progr.html">http://www.nettab.org/2005/progr.html</ext-link>
(last accessed December 8, 2008)</comment>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freudenberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Propping</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>A similarity-based method for genome-wide prediction of disease-relevant human genes</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<issue>Suppl. 2</issue>
<fpage>S110</fpage>
<lpage>S115</lpage>
<pub-id pub-id-type="pmid">12385992</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>George</surname>
<given-names>RA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Analysis of protein sequence and interaction data for candidate disease gene prediction</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<fpage>e130</fpage>
<pub-id pub-id-type="pmid">17020920</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giallourakis</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Disease gene discovery through integrative genomics</article-title>
<source>Annu. Rev. Genomics Hum. Genet.</source>
<year>2005</year>
<volume>6</volume>
<fpage>381</fpage>
<lpage>406</lpage>
<pub-id pub-id-type="pmid">16124867</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grzeschik</surname>
<given-names>KH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Deficiency of PORCN, a regulator of Wnt signaling, is associated with focal dermal hypoplasia</article-title>
<source>Nat. Genet.</source>
<year>2007</year>
<volume>39</volume>
<fpage>833</fpage>
<lpage>835</lpage>
<pub-id pub-id-type="pmid">17546031</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kent</surname>
<given-names>WJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Exploring relationships and mining data with the UCSC Gene Sorter</article-title>
<source>Genome Res.</source>
<year>2005</year>
<volume>15</volume>
<fpage>737</fpage>
<lpage>741</lpage>
<pub-id pub-id-type="pmid">15867434</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khatri</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Draghici</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Ontological analysis of gene expression data: current tools, limitations, and open problems</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3587</fpage>
<lpage>3595</lpage>
<pub-id pub-id-type="pmid">15994189</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lage</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A human phenome-interactome network of protein complexes implicated in genetic disorders</article-title>
<source>Nat. Biotechnol.</source>
<year>2007</year>
<volume>25</volume>
<fpage>309</fpage>
<lpage>316</lpage>
<pub-id pub-id-type="pmid">17344885</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lopez-Bigas</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ouzounis</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>Genome-wide identification of genes likely to be involved in human genetic disease</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>3108</fpage>
<lpage>3114</lpage>
<pub-id pub-id-type="pmid">15181176</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lopez-Bigas</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Highly consistent patterns for inherited human diseases at the molecular level</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>269</fpage>
<lpage>277</lpage>
<pub-id pub-id-type="pmid">16287936</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lord</surname>
<given-names>PW</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>1275</fpage>
<lpage>1283</lpage>
<pub-id pub-id-type="pmid">12835272</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GOToolBox: functional analysis of gene datasets based on Gene Ontology</article-title>
<source>Genome Biol.</source>
<year>2004</year>
<volume>5</volume>
<fpage>R101</fpage>
<pub-id pub-id-type="pmid">15575967</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Masseroli</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>W293</fpage>
<lpage>W300</lpage>
<pub-id pub-id-type="pmid">15215397</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Masseroli</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GFINDer: genetic disease and phenotype location statistical analysis and mining of dynamically annotated gene lists</article-title>
<source>Nucleic Acids Res.</source>
<year>2005</year>
<volume>33</volume>
<fpage>W717</fpage>
<lpage>W723</lpage>
<pub-id pub-id-type="pmid">15980570</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Masuda</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Dlxin-1, a novel protein that binds Dlx5 and regulates its transcriptional function</article-title>
<source>J. Biol. Chem.</source>
<year>2001</year>
<volume>276</volume>
<fpage>5331</fpage>
<lpage>5338</lpage>
<pub-id pub-id-type="pmid">11084035</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oti</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Predicting disease genes using protein–protein interactions</article-title>
<source>J. Med. Genet.</source>
<year>2006</year>
<volume>43</volume>
<fpage>691</fpage>
<lpage>698</lpage>
<pub-id pub-id-type="pmid">16611749</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perez-Iratxeta</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Association of genes to genetically inherited diseases using data mining</article-title>
<source>Nat. Genet.</source>
<year>2002</year>
<volume>31</volume>
<fpage>316</fpage>
<lpage>319</lpage>
<pub-id pub-id-type="pmid">12006977</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perez-Iratxeta</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>G2D: a tool for mining genes associated with disease</article-title>
<source>BMC Genetics</source>
<year>2005</year>
<volume>6</volume>
<fpage>45</fpage>
<pub-id pub-id-type="pmid">16115313</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rossi</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TOM: a web-based integrated approach for identification of candidate disease genes</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<fpage>W285</fpage>
<lpage>W292</lpage>
<pub-id pub-id-type="pmid">16845011</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shaw-Smith</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Microarray based comparative genomic hybridisation (array-CGH) detects submicroscopic chromosomal deletions and duplications in patients with learning disability/mental retardation and dysmorphic features</article-title>
<source>J. Med. Genet.</source>
<year>2004</year>
<volume>41</volume>
<fpage>241</fpage>
<lpage>248</lpage>
<pub-id pub-id-type="pmid">15060094</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GOFFA: Gene Ontology for functional analysis – a FDA Gene Ontology tool for analysis of genomic and proteomic data</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<issue>Suppl. 2</issue>
<fpage>S23</fpage>
<pub-id pub-id-type="pmid">17118145</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Teorey</surname>
<given-names>TJ</given-names>
</name>
<etal></etal>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Cerra</surname>
<given-names>DD</given-names>
</name>
</person-group>
<source>Database Modeling and Design: Logical Design.</source>
<year>2006</year>
<publisher-loc>San Francisco</publisher-loc>
<publisher-name>Morgan Kaufmann Publishers</publisher-name>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tiffin</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Integration of text- and data-mining using ontologies successfully selects disease gene candidates</article-title>
<source>Nucleic Acids Res.</source>
<year>2005</year>
<volume>33</volume>
<fpage>1544</fpage>
<lpage>1552</lpage>
<pub-id pub-id-type="pmid">15767279</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tu</surname>
<given-names>Z</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Further understanding human disease genes by comparing with housekeeping genes and other genes</article-title>
<source>BMC Genomics</source>
<year>2006</year>
<volume>7</volume>
<fpage>31</fpage>
<pub-id pub-id-type="pmid">16504025</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turner</surname>
<given-names>FS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>POCUS: mining genomic sequence annotation to predict disease genes</article-title>
<source>Genome Biol.</source>
<year>2003</year>
<volume>4</volume>
<fpage>R75</fpage>
<pub-id pub-id-type="pmid">14611661</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Driel</surname>
<given-names>MA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases</article-title>
<source>Nucleic Acids Res.</source>
<year>2005</year>
<volume>33</volume>
<fpage>W758</fpage>
<lpage>W761</lpage>
<pub-id pub-id-type="pmid">15980578</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vermeesch</surname>
<given-names>JR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Guidelines for molecular karyotyping in constitutional genetic diagnosis</article-title>
<source>Eur. J. Hum. Genet.</source>
<year>2007</year>
<volume>15</volume>
<fpage>1105</fpage>
<lpage>1114</lpage>
<pub-id pub-id-type="pmid">17637806</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vissers</surname>
<given-names>LE</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Mutations in a new member of the chromodomain gene family cause CHARGE syndrome</article-title>
<source>Nat. Genet.</source>
<year>2004</year>
<volume>36</volume>
<fpage>955</fpage>
<lpage>957</lpage>
<pub-id pub-id-type="pmid">15300250</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vissers</surname>
<given-names>LE</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Identification of disease genes by whole genome CGH arrays</article-title>
<source>Hum. Mol. Genet.</source>
<year>2005</year>
<volume>14</volume>
<fpage>R215</fpage>
<lpage>R223</lpage>
<pub-id pub-id-type="pmid">16244320</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>JZ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A new method to measure the semantic similarity of GO terms</article-title>
<source>Bioinformatics</source>
<year>2007a</year>
<volume>23</volume>
<fpage>1274</fpage>
<lpage>1281</lpage>
<pub-id pub-id-type="pmid">17344234</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Mutations in X-linked PORCN, a putative regulator of Wnt signaling, cause focal dermal hypoplasia</article-title>
<source>Nat. Genet.</source>
<year>2007b</year>
<volume>39</volume>
<fpage>836</fpage>
<lpage>838</lpage>
<pub-id pub-id-type="pmid">17546030</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Discovering disease-genes by topological features in human protein–protein interaction network</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>2800</fpage>
<lpage>2805</lpage>
<pub-id pub-id-type="pmid">16954137</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Yilmaz</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Searching Candidate Genes for AICARDI Syndrome: Combining Experimental Approach and Bioinformatics</article-title>
<source>PhD thesis.</source>
<year>2007</year>
<publisher-loc>Nancy 1</publisher-loc>
<publisher-name>Université Henri Poincaré</publisher-name>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yilmaz</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Screening of subtle copy number changes in Aicardi syndrome patients with a high resolution X chromosome array-CGH</article-title>
<source>Eur. J. Med. Genet.</source>
<year>2007</year>
<volume>50</volume>
<fpage>386</fpage>
<lpage>391</lpage>
<pub-id pub-id-type="pmid">17625997</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene functional similarity search tool (GFSST)</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<fpage>135</fpage>
<pub-id pub-id-type="pmid">16536867</pub-id>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000049  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000049  | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}
