Exploration server on open access in Belgium

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations

Internal identifier: 000400 (Pmc/Corpus)

Auteurs : Steven Van Vooren ; Bernard Thienpont ; Björn Menten ; Frank Speleman ; Bart De Moor ; Joris Vermeesch ; Yves Moreau

RBID : PMC:1885641

Abstract

Biomedical literature provides a rich but unstructured source of associations between chromosomal regions and biomedical concepts. By mining MEDLINE abstracts, we annotate the human genome at the level of cytogenetic bands. Our method creates a set of chromosomal aberration maps that associate cytogenetic bands with biomedical concepts from a variety of controlled vocabularies, including disease, dysmorphology, anatomy, development and Gene Ontology branches. The association between a band (e.g. 4p16.3) and a concept (e.g. microcephaly) is assessed by the statistical overrepresentation of this concept in the abstracts relating to this band. Our method is validated using existing genome annotation resources and known chromosomal aberration maps, and is further illustrated through a case study on heart disease. Our chromosomal aberration maps provide diagnostic support to clinical geneticists, help cytogeneticists interpret and report cytogenetic findings and support researchers interested in human gene function. The method is available as a web application, aBandApart, at http://www.esat.kuleuven.be/abandapart/.
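The abstract states that a band-concept association is scored by the statistical overrepresentation of the concept among the abstracts about that band. One common way to score such overrepresentation is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; the abstract does not say which statistic aBandApart actually uses. The function names `hypergeom_upper_tail` and `band_concept_pvalue` and the toy corpus, in which each abstract is reduced to a set of band labels and a set of concept labels, are hypothetical illustrations.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    Assumed interpretation: N abstracts in the corpus, K mentioning the
    concept, n mentioning the band, k mentioning both.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("expected 0 <= K <= N and 0 <= n <= N")
    # Sum the probability of observing i co-mentions for every i >= k.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), min(n, K) + 1))
    return tail / comb(N, n)


def band_concept_pvalue(abstracts, band: str, concept: str) -> float:
    """Hypothetical overrepresentation p-value of `concept` among abstracts citing `band`."""
    N = len(abstracts)
    n = sum(1 for a in abstracts if band in a["bands"])
    K = sum(1 for a in abstracts if concept in a["concepts"])
    k = sum(1 for a in abstracts
            if band in a["bands"] and concept in a["concepts"])
    return hypergeom_upper_tail(k, n, K, N)


# Hypothetical toy corpus: each abstract reduced to its band and concept annotations.
toy = [
    {"bands": {"4p16.3"}, "concepts": {"microcephaly"}},
    {"bands": {"4p16.3"}, "concepts": {"microcephaly", "seizures"}},
    {"bands": {"4p16.3", "5p15.2"}, "concepts": {"microcephaly"}},
] + [{"bands": {"22q11.2"}, "concepts": {"heart defect"}}] * 7

if __name__ == "__main__":
    p = band_concept_pvalue(toy, "4p16.3", "microcephaly")
    print(f"{p:.4g}")  # prints 0.008333 (= 1/120)
```

A full map would test very many band-concept pairs, so some multiple-testing correction (for example Benjamini-Hochberg) would be needed in practice; the abstract does not state which correction, if any, the authors apply.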


DOI: 10.1093/nar/gkm054
PubMed: 17403693
PubMed Central: 1885641

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations</title>
<author>
<name sortKey="Van Vooren, Steven" sort="Van Vooren, Steven" uniqKey="Van Vooren S" first="Steven" last="Van Vooren">Steven Van Vooren</name>
<affiliation>
<nlm:aff id="AFF1">Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Thienpont, Bernard" sort="Thienpont, Bernard" uniqKey="Thienpont B" first="Bernard" last="Thienpont">Bernard Thienpont</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Center for Human Genetics, Leuven University Hospital, Herestraat 49, B-3000 Leuven, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Menten, Bjorn" sort="Menten, Bjorn" uniqKey="Menten B" first="Björn" last="Menten">Björn Menten</name>
<affiliation>
<nlm:aff id="AFF1">Center for Medical Genetics, Ghent University Hospital, MRB 2nd floor, De Pintelaan 185, B-9000 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Speleman, Frank" sort="Speleman, Frank" uniqKey="Speleman F" first="Frank" last="Speleman">Frank Speleman</name>
<affiliation>
<nlm:aff id="AFF1">Center for Medical Genetics, Ghent University Hospital, MRB 2nd floor, De Pintelaan 185, B-9000 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="De Moor, Bart" sort="De Moor, Bart" uniqKey="De Moor B" first="Bart" last="De Moor">Bart De Moor</name>
<affiliation>
<nlm:aff id="AFF1">Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vermeesch, Joris" sort="Vermeesch, Joris" uniqKey="Vermeesch J" first="Joris" last="Vermeesch">Joris Vermeesch</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Center for Human Genetics, Leuven University Hospital, Herestraat 49, B-3000 Leuven, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Moreau, Yves" sort="Moreau, Yves" uniqKey="Moreau Y" first="Yves" last="Moreau">Yves Moreau</name>
<affiliation>
<nlm:aff id="AFF1">Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17403693</idno>
<idno type="pmc">1885641</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1885641</idno>
<idno type="RBID">PMC:1885641</idno>
<idno type="doi">10.1093/nar/gkm054</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000400</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations</title>
<author>
<name sortKey="Van Vooren, Steven" sort="Van Vooren, Steven" uniqKey="Van Vooren S" first="Steven" last="Van Vooren">Steven Van Vooren</name>
<affiliation>
<nlm:aff id="AFF1">Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Thienpont, Bernard" sort="Thienpont, Bernard" uniqKey="Thienpont B" first="Bernard" last="Thienpont">Bernard Thienpont</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Center for Human Genetics, Leuven University Hospital, Herestraat 49, B-3000 Leuven, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Menten, Bjorn" sort="Menten, Bjorn" uniqKey="Menten B" first="Björn" last="Menten">Björn Menten</name>
<affiliation>
<nlm:aff id="AFF1">Center for Medical Genetics, Ghent University Hospital, MRB 2nd floor, De Pintelaan 185, B-9000 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Speleman, Frank" sort="Speleman, Frank" uniqKey="Speleman F" first="Frank" last="Speleman">Frank Speleman</name>
<affiliation>
<nlm:aff id="AFF1">Center for Medical Genetics, Ghent University Hospital, MRB 2nd floor, De Pintelaan 185, B-9000 Ghent, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="De Moor, Bart" sort="De Moor, Bart" uniqKey="De Moor B" first="Bart" last="De Moor">Bart De Moor</name>
<affiliation>
<nlm:aff id="AFF1">Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vermeesch, Joris" sort="Vermeesch, Joris" uniqKey="Vermeesch J" first="Joris" last="Vermeesch">Joris Vermeesch</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Center for Human Genetics, Leuven University Hospital, Herestraat 49, B-3000 Leuven, Belgium</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Moreau, Yves" sort="Moreau, Yves" uniqKey="Moreau Y" first="Yves" last="Moreau">Yves Moreau</name>
<affiliation>
<nlm:aff id="AFF1">Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Biomedical literature provides a rich but unstructured source of associations between chromosomal regions and biomedical concepts. By mining MEDLINE abstracts, we annotate the human genome at the level of cytogenetic bands. Our method creates a set of chromosomal aberration maps that associate cytogenetic bands with biomedical concepts from a variety of controlled vocabularies, including disease, dysmorphology, anatomy, development and Gene Ontology branches. The association between a band (e.g. 4p16.3) and a concept (e.g. microcephaly) is assessed by the statistical overrepresentation of this concept in the abstracts relating to this band. Our method is validated using existing genome annotation resources and known chromosomal aberration maps, and is further illustrated through a case study on heart disease. Our chromosomal aberration maps provide diagnostic support to clinical geneticists, help cytogeneticists interpret and report cytogenetic findings and support researchers interested in human gene function. The method is available as a web application, aBandApart, at
<ext-link ext-link-type="uri" xlink:href="http://www.esat.kuleuven.be/abandapart/">http://www.esat.kuleuven.be/abandapart/</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Brewer, C" uniqKey="Brewer C">C Brewer</name>
</author>
<author>
<name sortKey="Holloway, S" uniqKey="Holloway S">S Holloway</name>
</author>
<author>
<name sortKey="Zawalnyski, P" uniqKey="Zawalnyski P">P Zawalnyski</name>
</author>
<author>
<name sortKey="Schinzel, A" uniqKey="Schinzel A">A Schinzel</name>
</author>
<author>
<name sortKey="Fitzpatrick, D" uniqKey="Fitzpatrick D">D FitzPatrick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brewer, C" uniqKey="Brewer C">C Brewer</name>
</author>
<author>
<name sortKey="Holloway, S" uniqKey="Holloway S">S Holloway</name>
</author>
<author>
<name sortKey="Zawalnyski, P" uniqKey="Zawalnyski P">P Zawalnyski</name>
</author>
<author>
<name sortKey="Schinzel, A" uniqKey="Schinzel A">A Schinzel</name>
</author>
<author>
<name sortKey="Fitzpatrick, D" uniqKey="Fitzpatrick D">D FitzPatrick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schinzel, A" uniqKey="Schinzel A">A Schinzel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perez Iratxeta, C" uniqKey="Perez Iratxeta C">C Perez-Iratxeta</name>
</author>
<author>
<name sortKey="Wjst, M" uniqKey="Wjst M">M Wjst</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author>
<name sortKey="Andrade, Ma" uniqKey="Andrade M">MA Andrade</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmann, R" uniqKey="Hoffmann R">R Hoffmann</name>
</author>
<author>
<name sortKey="Dopazo, J" uniqKey="Dopazo J">J Dopazo</name>
</author>
<author>
<name sortKey="Cigudosa, Jc" uniqKey="Cigudosa J">JC Cigudosa</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Korbel, Jo" uniqKey="Korbel J">JO Korbel</name>
</author>
<author>
<name sortKey="Doerks, T" uniqKey="Doerks T">T Doerks</name>
</author>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Perez Iratxeta, C" uniqKey="Perez Iratxeta C">C Perez-Iratxeta</name>
</author>
<author>
<name sortKey="Kaczanowski, S" uniqKey="Kaczanowski S">S Kaczanowski</name>
</author>
<author>
<name sortKey="Hooper, Sd" uniqKey="Hooper S">SD Hooper</name>
</author>
<author>
<name sortKey="Andrade, Ma" uniqKey="Andrade M">MA Andrade</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tiffin, N" uniqKey="Tiffin N">N Tiffin</name>
</author>
<author>
<name sortKey="Kelso, Jf" uniqKey="Kelso J">JF Kelso</name>
</author>
<author>
<name sortKey="Powell, Ar" uniqKey="Powell A">AR Powell</name>
</author>
<author>
<name sortKey="Pan, H" uniqKey="Pan H">H Pan</name>
</author>
<author>
<name sortKey="Bajic, Vb" uniqKey="Bajic V">VB Bajic</name>
</author>
<author>
<name sortKey="Hide, Wa" uniqKey="Hide W">WA Hide</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmann, R" uniqKey="Hoffmann R">R Hoffmann</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Driel, Ma" uniqKey="Van Driel M">MA van Driel</name>
</author>
<author>
<name sortKey="Bruggeman, J" uniqKey="Bruggeman J">J Bruggeman</name>
</author>
<author>
<name sortKey="Vriend, G" uniqKey="Vriend G">G Vriend</name>
</author>
<author>
<name sortKey="Brunner, Hg" uniqKey="Brunner H">HG Brunner</name>
</author>
<author>
<name sortKey="Leunissen, Jam" uniqKey="Leunissen J">JAM Leunissen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Driel, Ma" uniqKey="Van Driel M">MA van Driel</name>
</author>
<author>
<name sortKey="Cuelenaere, K" uniqKey="Cuelenaere K">K Cuelenaere</name>
</author>
<author>
<name sortKey="Kemmeren, Ppcw" uniqKey="Kemmeren P">PPCW Kemmeren</name>
</author>
<author>
<name sortKey="Leunissen, Jam" uniqKey="Leunissen J">JAM Leunissen</name>
</author>
<author>
<name sortKey="Brunner, Hg" uniqKey="Brunner H">HG Brunner</name>
</author>
<author>
<name sortKey="Vriend, G" uniqKey="Vriend G">G Vriend</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Masseroli, M" uniqKey="Masseroli M">M Masseroli</name>
</author>
<author>
<name sortKey="Galati, O" uniqKey="Galati O">O Galati</name>
</author>
<author>
<name sortKey="Pinciroli, F" uniqKey="Pinciroli F">F Pinciroli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hatcher, E" uniqKey="Hatcher E">E Hatcher</name>
</author>
<author>
<name sortKey="Gospodnetic, O" uniqKey="Gospodnetic O">O Gospodnetić</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shaffer, Lg" uniqKey="Shaffer L">LG Shaffer</name>
</author>
<author>
<name sortKey="Tommerup, N" uniqKey="Tommerup N">N Tommerup</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Levan, G" uniqKey="Levan G">G Levan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aerts, S" uniqKey="Aerts S">S Aerts</name>
</author>
<author>
<name sortKey="Lambrechts, D" uniqKey="Lambrechts D">D Lambrechts</name>
</author>
<author>
<name sortKey="Maity, S" uniqKey="Maity S">S Maity</name>
</author>
<author>
<name sortKey="Van Loo, P" uniqKey="Van Loo P">P Van Loo</name>
</author>
<author>
<name sortKey="Coessens, B" uniqKey="Coessens B">B Coessens</name>
</author>
<author>
<name sortKey="De Smet, F" uniqKey="De Smet F">F De Smet</name>
</author>
<author>
<name sortKey="Tranchevent, L C" uniqKey="Tranchevent L">L.-C Tranchevent</name>
</author>
<author>
<name sortKey="De Moor, B" uniqKey="De Moor B">B De Moor</name>
</author>
<author>
<name sortKey="Marynen, P" uniqKey="Marynen P">P Marynen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Glenisson, P" uniqKey="Glenisson P">P Glenisson</name>
</author>
<author>
<name sortKey="Coessens, B" uniqKey="Coessens B">B Coessens</name>
</author>
<author>
<name sortKey="Van Vooren, S" uniqKey="Van Vooren S">S Van Vooren</name>
</author>
<author>
<name sortKey="Mathys, J" uniqKey="Mathys J">J Mathys</name>
</author>
<author>
<name sortKey="Moreau, Y" uniqKey="Moreau Y">Y Moreau</name>
</author>
<author>
<name sortKey="De Moor, B" uniqKey="De Moor B">B De Moor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mohnish, S" uniqKey="Mohnish S">S Mohnish</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hunter, A" uniqKey="Hunter A">A Hunter</name>
</author>
<author>
<name sortKey="Kaufman, Mh" uniqKey="Kaufman M">MH Kaufman</name>
</author>
<author>
<name sortKey="Mckay, A" uniqKey="Mckay A">A McKay</name>
</author>
<author>
<name sortKey="Baldock, R" uniqKey="Baldock R">R Baldock</name>
</author>
<author>
<name sortKey="Simmen, Mw" uniqKey="Simmen M">MW Simmen</name>
</author>
<author>
<name sortKey="Bard, Jbl" uniqKey="Bard J">JBL Bard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khatri, P" uniqKey="Khatri P">P Khatri</name>
</author>
<author>
<name sortKey="Draghici, S" uniqKey="Draghici S">S Draghici</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Falcon, S" uniqKey="Falcon S">S Falcon</name>
</author>
<author>
<name sortKey="Gentleman, R" uniqKey="Gentleman R">R Gentleman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Al Shahrour, F" uniqKey="Al Shahrour F">F Al-Shahrour</name>
</author>
<author>
<name sortKey="Diaz Uriarte, R" uniqKey="Diaz Uriarte R">R Diaz-Uriarte</name>
</author>
<author>
<name sortKey="Dopazo, J" uniqKey="Dopazo J">J Dopazo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, Jsm" uniqKey="Lee J">JSM Lee</name>
</author>
<author>
<name sortKey="Katari, G" uniqKey="Katari G">G Katari</name>
</author>
<author>
<name sortKey="Sachidanandam, R" uniqKey="Sachidanandam R">R Sachidanandam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zeeberg, Br" uniqKey="Zeeberg B">BR Zeeberg</name>
</author>
<author>
<name sortKey="Feng, W" uniqKey="Feng W">W Feng</name>
</author>
<author>
<name sortKey="Wang, G" uniqKey="Wang G">G Wang</name>
</author>
<author>
<name sortKey="Wang, Md" uniqKey="Wang M">MD Wang</name>
</author>
<author>
<name sortKey="Fojo, At" uniqKey="Fojo A">AT Fojo</name>
</author>
<author>
<name sortKey="Sunshine, M" uniqKey="Sunshine M">M Sunshine</name>
</author>
<author>
<name sortKey="Narasimhan, S" uniqKey="Narasimhan S">S Narasimhan</name>
</author>
<author>
<name sortKey="Kane, Dw" uniqKey="Kane D">DW Kane</name>
</author>
<author>
<name sortKey="Reinhold, Wc" uniqKey="Reinhold W">WC Reinhold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martin, D" uniqKey="Martin D">D Martin</name>
</author>
<author>
<name sortKey="Brun, C" uniqKey="Brun C">C Brun</name>
</author>
<author>
<name sortKey="Remy, E" uniqKey="Remy E">E Remy</name>
</author>
<author>
<name sortKey="Mouren, P" uniqKey="Mouren P">P Mouren</name>
</author>
<author>
<name sortKey="Thieffry, D" uniqKey="Thieffry D">D Thieffry</name>
</author>
<author>
<name sortKey="Jacq, B" uniqKey="Jacq B">B Jacq</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Castillo Davis, Ci" uniqKey="Castillo Davis C">CI Castillo-Davis</name>
</author>
<author>
<name sortKey="Hartl, Dl" uniqKey="Hartl D">DL Hartl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, B" uniqKey="Zhang B">B Zhang</name>
</author>
<author>
<name sortKey="Schmoyer, D" uniqKey="Schmoyer D">D Schmoyer</name>
</author>
<author>
<name sortKey="Kirov, S" uniqKey="Kirov S">S Kirov</name>
</author>
<author>
<name sortKey="Snoddy, J" uniqKey="Snoddy J">J Snoddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Young, A" uniqKey="Young A">A Young</name>
</author>
<author>
<name sortKey="Whitehouse, N" uniqKey="Whitehouse N">N Whitehouse</name>
</author>
<author>
<name sortKey="Cho, J" uniqKey="Cho J">J Cho</name>
</author>
<author>
<name sortKey="Shaw, C" uniqKey="Shaw C">C Shaw</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wrobel, G" uniqKey="Wrobel G">G Wrobel</name>
</author>
<author>
<name sortKey="Chalmel, F" uniqKey="Chalmel F">F Chalmel</name>
</author>
<author>
<name sortKey="Primig, M" uniqKey="Primig M">M Primig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Doerge, Rw" uniqKey="Doerge R">RW Doerge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barriot, R" uniqKey="Barriot R">R Barriot</name>
</author>
<author>
<name sortKey="Poix, J" uniqKey="Poix J">J Poix</name>
</author>
<author>
<name sortKey="Groppi, A" uniqKey="Groppi A">A Groppi</name>
</author>
<author>
<name sortKey="Goffard, N" uniqKey="Goffard N">N Goffard</name>
</author>
<author>
<name sortKey="Sherman, D" uniqKey="Sherman D">D Sherman</name>
</author>
<author>
<name sortKey="Dutour, I" uniqKey="Dutour I">I Dutour</name>
</author>
<author>
<name sortKey="De Daruvar, A" uniqKey="De Daruvar A">A de Daruvar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yakut, T" uniqKey="Yakut T">T Yakut</name>
</author>
<author>
<name sortKey="Kilic, Ss" uniqKey="Kilic S">SS Kilic</name>
</author>
<author>
<name sortKey="Cil, E" uniqKey="Cil E">E Cil</name>
</author>
<author>
<name sortKey="Yapici, E" uniqKey="Yapici E">E Yapici</name>
</author>
<author>
<name sortKey="Egeli, U" uniqKey="Egeli U">U Egeli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krantz, Id" uniqKey="Krantz I">ID Krantz</name>
</author>
<author>
<name sortKey="Smith, R" uniqKey="Smith R">R Smith</name>
</author>
<author>
<name sortKey="Colliton, Rp" uniqKey="Colliton R">RP Colliton</name>
</author>
<author>
<name sortKey="Tinkel, H" uniqKey="Tinkel H">H Tinkel</name>
</author>
<author>
<name sortKey="Zackai, Eh" uniqKey="Zackai E">EH Zackai</name>
</author>
<author>
<name sortKey="Piccoli, Da" uniqKey="Piccoli D">DA Piccoli</name>
</author>
<author>
<name sortKey="Goldmuntz, E" uniqKey="Goldmuntz E">E Goldmuntz</name>
</author>
<author>
<name sortKey="Spinner, Nb" uniqKey="Spinner N">NB Spinner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kosaki, R" uniqKey="Kosaki R">R Kosaki</name>
</author>
<author>
<name sortKey="Kosaki, K" uniqKey="Kosaki K">K Kosaki</name>
</author>
<author>
<name sortKey="Matsushima, K" uniqKey="Matsushima K">K Matsushima</name>
</author>
<author>
<name sortKey="Mitsui, N" uniqKey="Mitsui N">N Mitsui</name>
</author>
<author>
<name sortKey="Matsumoto, N" uniqKey="Matsumoto N">N Matsumoto</name>
</author>
<author>
<name sortKey="Ohashi, H" uniqKey="Ohashi H">H Ohashi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robinson, Wp" uniqKey="Robinson W">WP Robinson</name>
</author>
<author>
<name sortKey="Waslynka, J" uniqKey="Waslynka J">J Waslynka</name>
</author>
<author>
<name sortKey="Bernasconi, F" uniqKey="Bernasconi F">F Bernasconi</name>
</author>
<author>
<name sortKey="Wang, M" uniqKey="Wang M">M Wang</name>
</author>
<author>
<name sortKey="Clark, S" uniqKey="Clark S">S Clark</name>
</author>
<author>
<name sortKey="Kotzot, D" uniqKey="Kotzot D">D Kotzot</name>
</author>
<author>
<name sortKey="Schinzel, A" uniqKey="Schinzel A">A Schinzel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wren, Jd" uniqKey="Wren J">JD Wren</name>
</author>
<author>
<name sortKey="Hildebrand, Wh" uniqKey="Hildebrand W">WH Hildebrand</name>
</author>
<author>
<name sortKey="Chandrasekaran, S" uniqKey="Chandrasekaran S">S Chandrasekaran</name>
</author>
<author>
<name sortKey="Melcher, U" uniqKey="Melcher U">U Melcher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shah, Pk" uniqKey="Shah P">PK Shah</name>
</author>
<author>
<name sortKey="Perez Iratxeta, C" uniqKey="Perez Iratxeta C">C Perez-Iratxeta</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author>
<name sortKey="Andrade, Ma" uniqKey="Andrade M">MA Andrade</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="pmc">nar</journal-id>
<journal-id journal-id-type="publisher-id">Nucleic Acids Research</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17403693</article-id>
<article-id pub-id-type="pmc">1885641</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkm054</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computational Biology</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Van Vooren</surname>
<given-names>Steven</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="COR1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Thienpont</surname>
<given-names>Bernard</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Menten</surname>
<given-names>Björn</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Speleman</surname>
<given-names>Frank</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>De Moor</surname>
<given-names>Bart</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vermeesch</surname>
<given-names>Joris</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Moreau</surname>
<given-names>Yves</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="AFF1">
<sup>1</sup>
Department of Electrotechnical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium,
<sup>2</sup>
Center for Human Genetics, Leuven University Hospital, Herestraat 49, B-3000 Leuven, Belgium and
<sup>3</sup>
Center for Medical Genetics, Ghent University Hospital, MRB 2nd floor, De Pintelaan 185, B-9000 Ghent, Belgium</aff>
<author-notes>
<corresp id="COR1">*To whom correspondence should be addressed
<phone>+3216328654</phone>
<fax>+3216321970</fax>
<email>steven.vanvooren@esat.kuleuven.be</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>4</month>
<year>2007</year>
</pub-date>
<pub-date pub-type="epub">
<day>1</day>
<month>4</month>
<year>2007</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>1</day>
<month>4</month>
<year>2007</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>35</volume>
<issue>8</issue>
<fpage>2533</fpage>
<lpage>2543</lpage>
<history>
<date date-type="received">
<day>3</day>
<month>10</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>27</day>
<month>10</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>18</day>
<month>1</month>
<year>2007</year>
</date>
</history>
<permissions>
<copyright-statement>© 2007 The Author(s)</copyright-statement>
<copyright-year>2007</copyright-year>
<license license-type="open-access">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/">http://creativecommons.org/licenses/by-nc/2.0/uk/</ext-link>
) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Biomedical literature provides a rich but unstructured source of associations between chromosomal regions and biomedical concepts. By mining MEDLINE abstracts, we annotate the human genome at the level of cytogenetic bands. Our method creates a set of chromosomal aberration maps that associate cytogenetic bands with biomedical concepts from a variety of controlled vocabularies, including disease, dysmorphology, anatomy, development and Gene Ontology branches. The association between a band (e.g. 4p16.3) and a concept (e.g. microcephaly) is assessed by the statistical overrepresentation of this concept in the abstracts relating to this band. Our method is validated using existing genome annotation resources and known chromosomal aberration maps, and is further illustrated through a case study on heart disease. Our chromosomal aberration maps provide diagnostic support to clinical geneticists, help cytogeneticists interpret and report cytogenetic findings and support researchers interested in human gene function. The method is available as a web application, aBandApart, at
<ext-link ext-link-type="uri" xlink:href="http://www.esat.kuleuven.be/abandapart/">http://www.esat.kuleuven.be/abandapart/</ext-link>
.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>INTRODUCTION</title>
<p>Forward genetics, i.e. the identification of gene mutations that underlie a phenotype of interest in a particular individual, is a key strategy for characterizing gene function. In humans, for whom mutagenesis screens are impossible, genomic information from patients with developmental disorders can serve as the basis for disease gene discovery. Positional cloning strategies, such as cytogenetic studies and linkage and association studies, can then identify the chromosomal region in which the disease gene is located.</p>
<p>To speed up the process of gene discovery, some attempts have been made to associate genomic rearrangements (such as subchromosomal deletions and duplications) with congenital malformations based on clinical and cytogenetic information from patients. Brewer
<italic>et al.</italic>
analyzed detailed clinical and cytogenetic information associated with a large number of autosomal deletions (1) and duplications (2) to construct a chromosome map showing associations between congenital malformations and chromosomal regions. Notably, these maps have not been updated since their publication.</p>
<p>Research groups with an interest in the etiology of, for example, congenital malformations often lack a patient pool large enough for informative association studies. Several public and private databases are being constructed to support such efforts by aggregating case reports and encouraging the exchange of patient information to complement private patient pools. Examples are the Catalogue of Unbalanced Chromosome Aberrations in Man (3), the Human Cytogenetics Database and ECARUCA (
<monospace>
<ext-link ext-link-type="uri" xlink:href="www.ecaruca.net">www.ecaruca.net</ext-link>
</monospace>
), DECIPHER (
<monospace>
<ext-link ext-link-type="uri" xlink:href="decipher.sanger.ac.uk">decipher.sanger.ac.uk</ext-link>
</monospace>
), the Chromosome Anomaly Collection (
<monospace>
<ext-link ext-link-type="uri" xlink:href="www.som.soton.ac.uk/research/geneticsdiv/">www.som.soton.ac.uk/research/geneticsdiv/</ext-link>
</monospace>
), the Mitelman Database of Chromosome Aberrations in Cancer (
<monospace>
<ext-link ext-link-type="uri" xlink:href="cgap.nci.nih.gov/Chromosomes/Mitelman">cgap.nci.nih.gov/Chromosomes/Mitelman</ext-link>
</monospace>
), the Mendelian Cytogenetics Network DataBase (
<monospace>
<ext-link ext-link-type="uri" xlink:href="www.mcndb.org">www.mcndb.org</ext-link>
</monospace>
), Orphanet (
<monospace>
<ext-link ext-link-type="uri" xlink:href="www.orpha.net">www.orpha.net</ext-link>
</monospace>
), etc. These efforts differ in setup, but all aim to aggregate chromosomal aberration information and chart phenotypes and case reports. Some catalogs are available only in print or for a licence fee, and other databases require registration. Others are open and publicly searchable but offer no specific means for data mining.</p>
<p>The public corpus of biomedical literature is a powerful alternative source of patient reports and cytogenetic findings for association studies. This corpus can be seen as a
<italic>de facto</italic>
genotype–phenotype association database. Moreover, it is not limited to case reports listing congenital malformations. Apart from disease-related concepts, it is a rich source of information on anatomy and development, on systems and tissues, and on molecular functions and biological processes.</p>
<p>We have developed a method to automatically create chromosomal aberration maps from MEDLINE abstracts that mention (ranges of) cytogenetic bands. Through the use of multiple structured vocabularies, association with a band is not limited to a disease or syndrome but also covers dysmorphology, human development and cell biology. Rather than merely providing a catalog of genotype–phenotype associations, the online application built on this method forms a bridge to the most relevant and current literature for further analysis by the researcher. It thereby facilitates studies of the etiology of disease and the identification of disease genes. This resource is freely accessible and will stay up to date through regular automatic updates.</p>
<sec>
<title>Related literature-mining methods</title>
<p>A number of tools and methods are currently available for mining the literature for associations between diseases and genomic locations, although none has a scope identical to that of our method.</p>
<p>G2D (4) is a method for the prioritization of genes according to their relation to inherited disease. It allows a user to enter an OMIM disease identifier and a genomic region of interest. Through sequence and biomedical database analysis, G2D then identifies genes potentially associated with the disease.</p>
<p>HCAD (5) (Human Chromosome Aberration Database) is a web-based text-mining tool supporting analysis of human breakpoint data by mining the scientific literature to generate information on all human breakpoints.</p>
<p>Korbel
<italic>et al</italic>
. mine MEDLINE to identify clusters of gene-phenotype associations based on information on prokaryotic genomes (6). The results are not available through a web interface.</p>
<p>Tiffin
<italic>et al</italic>
. use an anatomical ontology to integrate text mining of biomedical literature and data mining of available human gene expression data (7). Their method prioritizes candidate genes according to their expression in disease-affected tissues.</p>
<p>iHOP (8) (information Hyperlinked Over Proteins) uses genes and proteins from multiple organisms as hyperlinks between sentences and abstracts to access and navigate PubMed.</p>
<p>MimMiner (9) is restricted to mining the OMIM database and ranks related phenotypes for a given phenotype or OMIM identifier. GeneSeeker (10) is a related tool that aims at the identification of genes underlying human genetic disorders by combining data on cytogenetic locus, phenotypes and expression patterns, to generate a list of candidate genes.</p>
<p>GFINDer (11) mines text data present in OMIM to annotate genes with gene ontology concepts and statistically selects relevant annotation categories. Phenotype descriptions are normalized to handle synonymy and are hierarchically structured.</p>
<p>Our method relates to these approaches but differs in several respects. First, instead of extracting MEDLINE references linked to OMIM entries, or mining only the text present in OMIM, we mine MEDLINE abstracts directly for cytogenetic bands and biomedical concepts. While curated databases offer high-quality annotations and hence reduce noise, the use of abstracts allows mining to be more comprehensive and up to date. Second, gene prioritization tools like G2D build an internal representation of the disease or phenotype under study through the intermediate association of MeSH and GO terms. This allows genes to be related to phenotypes by means of chemicals, molecules, etc. In our method, this internal association process is made explicit through the choice of controlled vocabularies, which allow the user to elucidate overrepresented associations between loci and concepts. Third, most of these tools offer only a disease-specific approach (in some cases using other annotations internally), whereas aBandApart explicitly allows for additional perspectives or user interests, such as dysmorphology, anatomy, development, molecular function, etc.</p>
<p>aBandApart is a novel analysis method that uses MEDLINE abstracts to catalog biomedical concepts according to their association with chromosomal bands; it can be considered a cytogenetic approach to genotype–phenotype correlation. Rather than prioritizing candidate genes, it focuses on cytogenetic bands and offers a portal into the relevant literature. Because its approach and goal differ from those of existing tools, it can be considered complementary to them.</p>
</sec>
</sec>
<sec sec-type="materials|methods">
<title>MATERIALS AND METHODS</title>
<p>Three elements are necessary to automatically build a chromosomal aberration map from MEDLINE abstracts: (1) identification of cytogenetic bands, (2) identification of concepts from multiple vocabularies and (3) assessment of the statistical overrepresentation of a concept among the abstracts relating to a band.</p>
<p>To discover overrepresented associations between concepts and cytobands, we must first locate cytogenetic band identifiers and vocabulary concepts (and their synonyms) in the MEDLINE corpus. We have extended Lucene (12), a high-performance text-indexing engine written in Java, to parse all MEDLINE abstracts and extract the cytogenetic bands, ranges of bands and biomedical concepts from our different structured vocabularies that they contain.</p>
<sec>
<title>Identification of cytogenetic bands</title>
<p>The International System for Human Cytogenetic Nomenclature (ISCN) provides a universal terminology for describing chromosomal anomalies based on cytogenetic staining techniques (13). This nomenclature is designed to ensure that chromosomal anomalies are reported in a standardized way. Hence, reports in the literature typically mention bands to delineate a genomic region at various levels of cytogenetic resolution. Because of this specific nomenclature, bands can be extracted from text unambiguously in the majority of cases. A similar approach is adopted in HCAD, which relies on the nomenclature for translocations.</p>
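<p>To illustrate why ISCN notation lends itself to pattern matching, the Python sketch below extracts single bands and band ranges with a regular expression. The grammar (chromosome 1 to 22, X or Y; arm p or q; a one- or two-digit region and band code; an optional sub-band after a dot) and the function name extract_bands are simplifying assumptions made for this example, not the grammar implemented in our Lucene extension; karyotype notation such as del(4)(p16.3) and terminal bands such as pter and qter are not handled.</p>

```python
import re

# Simplified ISCN-style grammar (an illustrative assumption, not the authors'
# parser): chromosome 1-22, X or Y; arm p or q; a one- or two-digit region/band
# code; an optional sub-band after a dot, e.g. 4p16.3 or 3q26.32.
CHROM = r"(?:1[0-9]|2[0-2]|[1-9]|X|Y)"
ARM_BAND = r"[pq]\d{1,2}(?:\.\d{1,3})?(?!\d)"

# A range may repeat the chromosome or start directly with the arm of the
# second band, as in 1p21.2-q23.1.
BAND_OR_RANGE = re.compile(
    r"\b(" + CHROM + ARM_BAND + r")(?:\s*-\s*(" + CHROM + r"?" + ARM_BAND + r"))?"
)


def extract_bands(text):
    """Return (start, end) pairs; end is None when a single band is mentioned."""
    return [(m.group(1), m.group(2)) for m in BAND_OR_RANGE.finditer(text)]


print(extract_bands("A deletion of 1p21.2-q23.1 and a 4p16.3 duplication"))
# [('1p21.2', 'q23.1'), ('4p16.3', None)]
print(extract_bands("mouse chromosome 11B1.3"))
# []
```

<p>Because the arm must be a lowercase p or q, mouse band designations such as 11B1.3 are not matched, in line with the mouse band nomenclature discussed later in this section.</p>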
<p>Although band patterns delineate chromosomal regions at a coarser resolution than markers, base-pair positions, BAC clone identifiers or genes, this approach is advantageous in practice. Indeed, in most cases, chromosomal deletions and duplications have so far been resolved and reported only if their size was of the order of a cytogenetic band. Also, more precise identifiers of genomic location are not used frequently or consistently enough in abstracts to construct a large and reliable mapping between genomic location and literature.</p>
<p>A range is a delineation of consecutive cytobands, possibly even spanning a centromere. Whenever such a range is encountered in an abstract, all intermediate cytobands are associated with the abstract as well. A custom ontology resolves all bands in a range: a document mentioning 1p21.2-q23.1 will be annotated to all bands in between. In addition, the association between an abstract and a cytoband is propagated upwards to the coarser levels of cytogenetic resolution that contain this band. This implies that documents mentioning 3q26.32 will be annotated to 3q26 as well.</p>
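<p>The two rules above, expanding a range into all intermediate bands and propagating a band to coarser resolutions, can be sketched in Python as follows. The ordered list TOY_CHR1 is a short, hand-written, mixed-resolution excerpt that stands in for our custom band ontology, and the function names are hypothetical. Upward propagation is implemented here simply by trimming sub-band digits and therefore stops at the band level (e.g. 3q26).</p>

```python
# Hypothetical excerpt of an ordered band list for chromosome 1, from pter to
# qter; a stand-in for the custom band ontology, not a complete ISCN table.
TOY_CHR1 = ["1p21.2", "1p21.1", "1p13", "1p12", "1p11",
            "1q11", "1q12", "1q21", "1q22", "1q23.1"]


def ancestors(band):
    """Coarser-resolution bands containing band: 3q26.32 -> 3q26.3, 3q26."""
    result = []
    while "." in band:
        band = band[:-1]            # drop the last sub-band digit
        if band.endswith("."):
            band = band[:-1]        # "3q26." becomes "3q26"
        result.append(band)
    return result


def expand_range(start, end, ordered_bands):
    """All bands from start to end inclusive, in the order of ordered_bands."""
    if end[0] in "pq":              # "q23.1" inherits the chromosome of start
        arm_pos = next(i for i, ch in enumerate(start) if ch in "pq")
        end = start[:arm_pos] + end
    i, j = ordered_bands.index(start), ordered_bands.index(end)
    if i > j:                       # tolerate ranges written from qter to pter
        i, j = j, i
    return ordered_bands[i:j + 1]


def annotate(mentions, ordered_bands):
    """Set of bands attached to one abstract, given (start, end) mentions."""
    bands = set()
    for start, end in mentions:
        span = [start] if end is None else expand_range(start, end, ordered_bands)
        for band in span:
            bands.add(band)
            bands.update(ancestors(band))
    return bands


print(ancestors("3q26.32"))                            # ['3q26.3', '3q26']
print(sorted(annotate([("1p12", "q11")], TOY_CHR1)))   # ['1p11', '1p12', '1q11']
```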
<p>Based on this premise, we constructed a map that links MEDLINE abstracts to cytogenetic bands. This highly specific map was then used to characterize individual cytogenetic bands based on the content of the abstracts they are linked to. As the contents of the literature indices underlying aBandApart are updated regularly, the validation is based on a version of the tool that was frozen at the state of MEDLINE on 6 September 2005. Within that MEDLINE corpus, we identified 36 092 abstracts mentioning at least one cytogenetic band or range of bands. From this set, 293 808 associations between bands and concepts were extracted. Nearly 60 000 publications are added to the MEDLINE corpus every month. Hence, the number of abstracts and associations is expected to grow steadily as the system is continuously brought up to date.</p>
<p>A potential source of concern for the text-mining algorithm is that humans are not the only organisms for which banding patterns can be discerned through cytogenetic staining. Band nomenclatures also exist for other organisms. Because genome architecture differs among species, assertions on human genotype–phenotype correlations may be contaminated by literature dealing with nonhuman organisms for which a similar band nomenclature is used. To assess the importance of this problem, we need to know the prevalence of documents dealing with nonhuman species in our corpus. We considered the complete set of documents that mention one or more cytogenetic bands and indexed this set using a vocabulary of both common and scientific organism names based on English animal-related lists (nouns and adjectives), as well as the NCBI taxonomy (
<monospace>
<ext-link ext-link-type="uri" xlink:href="www.ncbi.nlm.nih.gov/Taxonomy/">www.ncbi.nlm.nih.gov/Taxonomy/</ext-link>
</monospace>
). From this vocabulary, 489 distinct terms and phrases were detected at least once in the document set. The most frequently occurring species are shown in
<xref ref-type="table" rid="T1">Table 1</xref>
.
<table-wrap id="T1" position="float">
<label>Table 1.</label>
<caption>
<p>The most frequently occurring organism-related phrases in a set of 36 082 MEDLINE abstracts mentioning cytogenetic bands, with the number of abstracts containing each phrase</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Documents</th>
<th rowspan="1" colspan="1">Phrase</th>
<th rowspan="1" colspan="1">Documents</th>
<th rowspan="1" colspan="1">Phrase</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">14 865</td>
<td rowspan="1" colspan="1">Human</td>
<td rowspan="1" colspan="1">126</td>
<td rowspan="1" colspan="1">Pig</td>
</tr>
<tr>
<td rowspan="1" colspan="1">3664</td>
<td rowspan="1" colspan="1">Mouse</td>
<td rowspan="1" colspan="1">107</td>
<td rowspan="1" colspan="1">Primates</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1252</td>
<td rowspan="1" colspan="1">Rat</td>
<td rowspan="1" colspan="1">98</td>
<td rowspan="1" colspan="1">Papillomavirus</td>
</tr>
<tr>
<td rowspan="1" colspan="1">590</td>
<td rowspan="1" colspan="1">Rodent</td>
<td rowspan="1" colspan="1">70</td>
<td rowspan="1" colspan="1">Cat</td>
</tr>
<tr>
<td rowspan="1" colspan="1">474</td>
<td rowspan="1" colspan="1">Hamster</td>
<td rowspan="1" colspan="1">70</td>
<td rowspan="1" colspan="1">Bacteria</td>
</tr>
<tr>
<td rowspan="1" colspan="1">240</td>
<td rowspan="1" colspan="1">Bovine</td>
<td rowspan="1" colspan="1">68</td>
<td rowspan="1" colspan="1">Zebrafish</td>
</tr>
<tr>
<td rowspan="1" colspan="1">214</td>
<td rowspan="1" colspan="1">Melanogaster</td>
<td rowspan="1" colspan="1">67</td>
<td rowspan="1" colspan="1">Sheep</td>
</tr>
<tr>
<td rowspan="1" colspan="1">183</td>
<td rowspan="1" colspan="1">Chicken</td>
<td rowspan="1" colspan="1">63</td>
<td rowspan="1" colspan="1">Canine</td>
</tr>
<tr>
<td rowspan="1" colspan="1">178</td>
<td rowspan="1" colspan="1">Porcine</td>
<td rowspan="1" colspan="1">63</td>
<td rowspan="1" colspan="1">Troglodytes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">135</td>
<td rowspan="1" colspan="1">Rabbit</td>
<td rowspan="1" colspan="1">61</td>
<td rowspan="1" colspan="1">Monkey</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>Note that the results from
<xref ref-type="table" rid="T1">Table 1</xref>
do not imply that 14 865 documents discuss human cases and 3664 documents discuss mouse: on the one hand, the term
<italic>human</italic>
does not necessarily occur in every abstract about humans. On the other hand, the terms
<italic>human</italic>
and
<italic>mouse</italic>
can co-occur, since some abstracts discuss patients as well as model organisms. Although the mere occurrence of organism-related terms and phrases does not establish the topic of a document, this brief analysis allows us to estimate roughly how species are distributed as subjects of documents.</p>
<p>A clear majority of all references to organisms in our test corpus concern humans. The second most frequent organism, mouse, is referenced about four times less often in the test documents. However, it does not add noise to cytogenetic band detection because mouse bands are designated with capital letters followed by a number. The third most frequent organism is rat: the term
<italic>rat</italic>
occurs in 3.47% of the test document set. As the rat chromosome nomenclature closely follows the human cytogenetic nomenclature (14), abstracts dealing with rat band patterns are a potential source of contamination; however, they represent only a small fraction of the abstracts.</p>
<p>The problem is further reduced for at least two reasons. First, only a fraction of these rat-related documents actually contaminate the genome-to-literature map. We manually verified a random sample of 30 documents containing the term
<italic>rat</italic>
. Only a third contained cytogenetic bands that indeed referred to the rat genome; all other documents contained bands that referred only to the human genome. This suggests that contamination of the genome-to-literature map by nonhuman band patterns is smaller still. Second, not all bands are at risk of contamination. Human bands at high resolution (e.g. 4q15.32) do not occur in rat. In addition, for chromosome 1 (for example) at the same cytogenetic resolution in rat and human, only 12 of 21 rat bands and 12 of 24 human bands occur in both nomenclatures.</p>
<p>This brief analysis shows that the contamination effect must be kept in mind but does not significantly affect the results of our method.</p>
</sec>
<sec>
<title>Vocabularies</title>
<p>Geneticists, pediatricians and physicians in general, dysmorphologists, molecular cell biologists and etiologists are all interested in genotype–phenotype correlations. However, each has a different focus, for example, a different balance between clinical practice and molecular biology research. To retrieve knowledge that interests a specific researcher at a given time, we increase the specificity of the text-mining results by limiting their scope through controlled lists of concepts derived from biomedical vocabularies and ontologies.</p>
<p>These lists or sets of linked concepts confine the results of our information extraction method to the current interest of the researcher: different domain-specific vocabularies define from which perspective to annotate the genome. The available options include dysmorphology, anatomy-specific, gene- or protein-centered, gene ontology and disease-related perspectives on the literature. An overview is shown in
<xref ref-type="table" rid="T2">Table 2</xref>
.
<table-wrap id="T2" position="float">
<label>Table 2.</label>
<caption>
<p>Different controlled vocabularies in aBandApart</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Name</th>
<th rowspan="1" colspan="1">Function</th>
<th rowspan="1" colspan="1">Example</th>
<th rowspan="1" colspan="1">Size</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">MeSH</td>
<td rowspan="1" colspan="1">Medical subject headings</td>
<td rowspan="1" colspan="1">Chemicals, medical concepts</td>
<td rowspan="1" colspan="1">16 998</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GO.B</td>
<td rowspan="1" colspan="1">Biological processes</td>
<td rowspan="1" colspan="1">‘Cell growth’, ‘signal transduction’</td>
<td rowspan="1" colspan="1">1120</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GO.C</td>
<td rowspan="1" colspan="1">Cellular components</td>
<td rowspan="1" colspan="1">‘Proteasome’, ‘nucleus’</td>
<td rowspan="1" colspan="1">402</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GO.M</td>
<td rowspan="1" colspan="1">Molecular functions</td>
<td rowspan="1" colspan="1">‘ATPase activity’</td>
<td rowspan="1" colspan="1">701</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GO.E</td>
<td rowspan="1" colspan="1">Gene ontology</td>
<td rowspan="1" colspan="1">All of the above</td>
<td rowspan="1" colspan="1">2170</td>
</tr>
<tr>
<td rowspan="1" colspan="1">LDDB</td>
<td rowspan="1" colspan="1">London dysmorphology database</td>
<td rowspan="1" colspan="1">‘Microcephaly’ or ‘small head’</td>
<td rowspan="1" colspan="1">808</td>
</tr>
<tr>
<td rowspan="1" colspan="1">OMIM</td>
<td rowspan="1" colspan="1">Genetic disorders</td>
<td rowspan="1" colspan="1">‘Attention deficit hyperactivity disorder’</td>
<td rowspan="1" colspan="1">1716</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CBIL</td>
<td rowspan="1" colspan="1">Human anatomy</td>
<td rowspan="1" colspan="1">‘Heart muscle’</td>
<td rowspan="1" colspan="1">303</td>
</tr>
<tr>
<td rowspan="1" colspan="1">OHDA</td>
<td rowspan="1" colspan="1">Embryo development</td>
<td rowspan="1" colspan="1">‘Early stage, fetus’</td>
<td rowspan="1" colspan="1">380</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TDMS.s</td>
<td rowspan="1" colspan="1">Systems, tissues and sites</td>
<td rowspan="1" colspan="1">‘Cardiovascular system’</td>
<td rowspan="1" colspan="1">392</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TDMS.l</td>
<td rowspan="1" colspan="1">Microscopic lesions</td>
<td rowspan="1" colspan="1">‘Disseminated intravascular coagulation’</td>
<td rowspan="1" colspan="1">204</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>A total of 11 vocabularies are present, shown above with an example concept and the number of concepts in each vocabulary.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>Words as well as phrases are detected as concepts. In the case of ontologies, no relational information is kept, except for synonymy, which is taken into account when applicable (e.g. with LDDB as a vocabulary, the occurrence of the phrase
<italic>small head</italic>
will trigger an association to
<italic>microcephaly</italic>
).</p>
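<p>A minimal Python sketch of such dictionary-based concept detection follows: phrases of up to a fixed number of words are looked up in a synonym table that maps each phrase to its preferred concept, so that small head is reported as microcephaly. The table entries, the six-word limit and the function name are illustrative assumptions; the actual detection runs inside our Lucene-based indexer and may tokenize text differently.</p>

```python
import re

# Hypothetical excerpt of an LDDB-style synonym table: every phrase (the
# preferred term included) maps to its preferred concept.
SYNONYMS = {
    "microcephaly": "microcephaly",
    "small head": "microcephaly",
    "cleft palate": "cleft palate",
}
MAX_WORDS = 6  # assumed maximum phrase length, in words


def detect_concepts(text, synonyms=SYNONYMS, max_words=MAX_WORDS):
    """Return the preferred concepts whose phrases occur in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for i in range(len(words)):
        # try every phrase of 1..max_words words starting at word i
        for n in range(1, min(max_words, len(words) - i) + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in synonyms:
                found.add(synonyms[phrase])
    return found


print(sorted(detect_concepts("The patient had a small head and a cleft palate.")))
# ['cleft palate', 'microcephaly']
```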
<p>The choice of controlled vocabularies is crucial to the scope and applicability of this method. The vocabulary sources were selected with both a research and a diagnostics perspective in mind. For example, options range from a rather general heritable disease vocabulary (OMIM) to a specific dysmorphology concept hierarchy (LDDB). Also, each vocabulary focuses on a different level of biological detail, from small (molecular, biochemical) through intermediate (cellular and tissue level) to large (organs and anatomy).</p>
<p>The sources for these vocabularies were chosen based on how authoritative they are in their respective fields. Several of these vocabularies have already proven their value in previous work on gene profiling and prioritization. For example, the GO-derived vocabularies boost prioritization performance in Endeavour (15), our web-based method for candidate gene prioritization by genomic data fusion. Additionally, GO, MeSH and OMIM vocabularies have proven their merit in TXTGate, a web tool from our previous work on text-based gene and gene group profiling (16). The dysmorphology vocabulary is also widely used in its field: first, the Oxford Medical Databases for dysmorphology and neurology, which build on the LDDB taxonomy, are a widely used clinical reference. Second, LDDB is the elementary dysmorphology taxonomy within DECIPHER, the Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources, developed and hosted at the Wellcome Trust Sanger Institute (
<monospace>
<ext-link ext-link-type="uri" xlink:href="decipher.sanger.ac.uk">decipher.sanger.ac.uk</ext-link>
</monospace>
).</p>
<p>Below, individual aBandApart vocabularies, their size, construction and origin are discussed in detail. The vocabularies themselves can be obtained from the authors upon request.</p>
<p>First, the OMIM- and LDDB-derived vocabularies provide hereditary disease and dysmorphology phenotype-specific views.</p>
<p>Second, a number of vocabularies have been constructed to mine literature at different levels of detail. From GO, vocabularies at the molecular and cellular levels are constructed. TDMS offers tissues, organs and systems. In addition, two anatomical vocabularies are provided, one of which is specific to development.</p>
<p>Finally, medical and chemical terms and synonyms based on MeSH make up a general purpose vocabulary.</p>
<sec>
<title>OMIM</title>
<p>OMIM (Online Mendelian Inheritance in Man) is a catalog of human genes and genetic disorders that focuses primarily on heritable genetic diseases. Although OMIM does not contain direct information on chromosomal aberrations, it is a relevant and useful resource of hereditary disease phenotypes. From its downloadable text, we have extracted a vocabulary of disease-related concepts, of which 1642 entries occur in our cytoband-related subset of the PubMed corpus.</p>
</sec>
<sec>
<title>London Dysmorphology Database</title>
<p>Most clinical geneticists are familiar with the Oxford Medical Databases. LDDB contains information on 3428 dysmorphic syndromes and has a hierarchically structured feature vocabulary, which we have manually annotated with synonymous phrases to increase the recall of our method. In our band-annotated corpus, 796 dysmorphology concepts are annotated through 1286 synonyms. This dictionary is an authoritative source (17) of information about dysmorphic and neurogenetic syndromes.</p>
</sec>
<sec>
<title>Gene Ontology</title>
<p>GO provides consistent descriptions of gene and gene-product attributes in the form of three structured controlled vocabularies, each offering a specific angle of view (biological processes, cellular components and molecular functions). The GO effort is deliberately term-centered to allow uniform queries across different databases. Our method incorporates synonymy information from GO. GO is built and maintained with applications such as text mining and semantic matching explicitly in mind. Hence, the gene ontology is an ideal source for domain-specific views in our method and makes up four controlled vocabularies: (a) the whole set of GO concepts, for general associations to gene and gene-product attributes; (b) cellular components, which may include anatomical structures (e.g.
<italic>rough endoplasmic reticulum</italic>
or
<italic>nucleus</italic>
) or a gene-product group (e.g.
<italic>ribosome, proteasome</italic>
or a
<italic>protein dimer</italic>
) (18); (c) biological processes, defined as series of events accomplished by one or more ordered assemblies of molecular functions; and (d) molecular functions, which describe activities at the molecular level.</p>
</sec>
<sec>
<title>TDMS tissue and lesions vocabularies</title>
<p>One level up from the molecular and cellular scale, specific vocabularies are provided that focus on organs, tissues and systems. Two vocabularies have been extracted from phrase lists used in a laboratory data acquisition system set up at the US National Institutes of Health. The word lists of this toxicology data management system (TDMS) were split into a vocabulary of microscopic lesions on the one hand and a vocabulary of microscopic sites, systems, tissues and organs on the other hand. This allowed us to complete the set of vocabularies, ranging from the very small (molecular functions) through spatially larger concepts (cellular locations) to tissues and organs, which are covered by the TDMS vocabularies. At the macroscopic end of this spectrum, CBIL offers anatomical structures.</p>
</sec>
<sec>
<title>CBIL anatomy</title>
<p>To focus on structures at a larger scale than the cellular and tissue levels, an anatomy-specific vocabulary was extracted from the hierarchical controlled vocabulary of anatomy terms from the Computational Biology and Informatics Laboratory (CBIL) at the University of Pennsylvania. This controlled vocabulary is based on anatomy terms taken from the mouse gene expression database at the Jackson Laboratory and was extended to incorporate human anatomy. It was then further revised in a number of areas, such as the haematolymphoid system and the brain.</p>
</sec>
<sec>
<title>Ontology of Human Developmental Anatomy</title>
<p>The Edinburgh Human Developmental Anatomy ontology (19) lists the tissues present during the first 50 days after conception. This vocabulary is based on detailed anatomical information and standard tissue names for the analysis of normal and abnormal human embryos. Spatially associated data are included. Hunter
<italic>et al</italic>
. based this anatomical ontology on literature and on a detailed examination of histological material. It includes all the basic tissues recognizable to an experienced histologist and was designed for describing tissue at a fairly fine resolution (e.g. in gene expression experiments).</p>
</sec>
<sec>
<title>Medical Subject Headings</title>
<p>MeSH is the National Library of Medicine's controlled vocabulary thesaurus. From it, we constructed a vocabulary that takes into account all phrases of up to six terms and maps all narrower and equivalent synonyms, leaving out broader synonyms. Apart from the 22 997 descriptors and their synonyms, over 150 000 entries and synonyms from the separate Supplementary Concept thesaurus are included as well, adding broad coverage of chemical records to the vocabulary. Terms and phrases range from general to specific and constitute a general-purpose vocabulary with broad coverage of the biomedical field. We recommend this vocabulary for a first exploration of a genomic region and for cases in which none of the specific vocabularies described above applies.</p>
</sec>
</sec>
<sec>
<title>Statistical overrepresentation</title>
<p>Cytogenetic bands and concepts can occur together in a single document purely by chance. First, consider an abstract in which one band is mentioned together with one disease, which is then compared to a second disease. If we relied merely on co-citation within single documents, such an abstract would create a spurious association between the band and the second disease. Second, a similar situation occurs when a document discusses several bands and contains multiple, loosely related case reports. Hence, we cannot accept a genotype–phenotype association based on the mere co-occurrence of a genomic location identifier (a cytogenetic band) and a concept from one of the vocabularies. Our method reports all co-occurrences together with a
<italic>P</italic>
-value indicating how much confidence an association deserves. To quantify this level of overrepresentation, we model the co-occurrence counts with a hypergeometric distribution.</p>
<p>The hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of
<italic>C</italic>
draws from a finite population without replacement. The population has labeled (success) and unlabeled items. The hypergeometric distribution describes
<italic>P</italic>
(
<italic>O</italic>
), the probability that in a sample of
<italic>C</italic>
distinct objects drawn without replacement from the population, exactly
<italic>O</italic>
objects are labeled.</p>
<p>In the context of this work, the question is whether more papers link a cytoband
<italic>b</italic>
(e.g. ‘4p16.3’) to a concept
<italic>c</italic>
(e.g. ‘microcephaly’) than one might expect by chance. If this is the case, the link between the concept and the cytoband can be thought of as being overrepresented in the text corpus.</p>
<p>Let
<italic>A</italic>
be the total number of abstracts annotated to any of the known cytobands and concepts, i.e. the size of the PubMed sub-corpus in our text indices. Let
<italic>C</italic>
be the number of papers containing concept
<italic>c</italic>
or its synonyms and
<italic>B</italic>
the number of papers associated to
<italic>b</italic>
as described in the section Identification of cytogenetic bands. We want to quantify the strength of the link between band
<italic>b</italic>
and concept
<italic>c</italic>
.</p>
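<p>The counts <italic>A</italic>, <italic>B</italic>, <italic>C</italic> and <italic>O</italic><italic><sub>bc</sub></italic> can be read off two inverted indices, one from bands to abstracts and one from concepts to abstracts. The Python sketch below assumes a toy corpus in which each (hypothetical) abstract identifier is already mapped to its set of bands, after range expansion and propagation, and to its set of detected concepts; it only illustrates the bookkeeping, not the Lucene indices used by aBandApart.</p>

```python
from collections import defaultdict

# Toy corpus (hypothetical identifiers and annotations): abstract id ->
# (bands after range expansion and propagation, detected concepts).
TOY_CORPUS = {
    "doc1": ({"4p16.3", "4p16"}, {"microcephaly"}),
    "doc2": ({"4p16.3", "4p16"}, {"microcephaly", "seizures"}),
    "doc3": ({"22q11.2", "22q11"}, {"heart defect"}),
    "doc4": ({"22q11.2", "22q11"}, {"heart defect", "microcephaly"}),
}


def build_index(corpus):
    """Inverted indices band -> abstracts and concept -> abstracts, plus A."""
    band_docs, concept_docs = defaultdict(set), defaultdict(set)
    for doc_id, (bands, concepts) in corpus.items():
        for band in bands:
            band_docs[band].add(doc_id)
        for concept in concepts:
            concept_docs[concept].add(doc_id)
    return band_docs, concept_docs, len(corpus)


def counts(band, concept, band_docs, concept_docs):
    """Return B, C and the co-citation count O_bc for one band-concept pair."""
    docs_b = band_docs.get(band, set())
    docs_c = concept_docs.get(concept, set())
    return len(docs_b), len(docs_c), len(docs_b.intersection(docs_c))


band_docs, concept_docs, A = build_index(TOY_CORPUS)
print(A, counts("4p16.3", "microcephaly", band_docs, concept_docs))  # 4 (2, 3, 2)
```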
<p>Inspecting only the abstracts that are linked to concept
<italic>c</italic>
amounts to drawing a sample of
<italic>C</italic>
abstracts from the corpus, some of which are linked to band
<italic>b</italic>
and some of which are not. Let
<italic>O</italic>
<italic>
<sub>bc</sub>
</italic>
be the observed number of papers that are associated to band
<italic>b</italic>
and mention concept
<italic>c</italic>
or one of its synonyms. To know whether the number of
<italic>b</italic>
-linked papers in that drawn sample is unusually large, we need to know the probability of drawing
<italic>O</italic>
<italic>
<sub>bc</sub>
</italic>
or more such papers. This corresponds to calculating the upper-tail probability
<italic>P</italic>
(
<italic>X</italic>
≥
<italic>O</italic>
<italic>
<sub>bc</sub>
</italic>
), which follows from the cumulative distribution function of a hypergeometric random variable
<italic>X</italic>
with the parameters described above.</p>
<p>Since the hypergeometric distribution is discrete, this upper-tail probability is obtained simply by summing the point probabilities of all outcomes greater than or equal to the observed count. This probability constitutes a
<italic>P</italic>
-value because it is the probability of observing an outcome at least as extreme as the one actually observed.</p>
<p>The
<italic>P</italic>
-value is then given by the hypergeometric cumulative distribution function
<disp-formula id="M1">
<label>1</label>
<graphic xlink:href="gkm054m1"></graphic>
</disp-formula>
<disp-formula id="M2">
<label>2</label>
<graphic xlink:href="gkm054m2"></graphic>
</disp-formula>
<disp-formula id="M3">
<label>3</label>
<graphic xlink:href="gkm054m3"></graphic>
</disp-formula>
</p>
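<p>Because equations 1 to 3 are available here only as images, we restate for reference the standard hypergeometric point probability and upper-tail probability in the notation defined above. This is a transcription of the textbook formulas, consistent with the definitions of <italic>A</italic>, <italic>B</italic>, <italic>C</italic> and <italic>O</italic><italic><sub>bc</sub></italic>; the exact layout of equations 1 to 3 may differ.</p>

```latex
P(X = k) = \frac{\binom{B}{k}\binom{A-B}{C-k}}{\binom{A}{C}}

p_{bc} = P(X \ge O_{bc})
       = \sum_{k=O_{bc}}^{\min(B,\,C)} \frac{\binom{B}{k}\binom{A-B}{C-k}}{\binom{A}{C}}
       = 1 - \sum_{k=0}^{O_{bc}-1} \frac{\binom{B}{k}\binom{A-B}{C-k}}{\binom{A}{C}}
```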
<p>The
<italic>P</italic>
-value
<italic>p
<sub>bc</sub>
</italic>
is the probability that we observe by chance
<italic>O</italic>
<italic>
<sub>bc</sub>
</italic>
documents or more that associate band
<italic>b</italic>
to concept
<italic>c</italic>
. It is the probability of observing
<italic>O</italic>
<italic>
<sub>bc</sub>
</italic>
or more documents linked to band
<italic>b</italic>
when drawing
<italic>C</italic>
concept-related documents without replacement from a corpus of
<italic>A</italic>
abstracts. Symmetrically, it equals the probability of observing
<italic>O</italic>
<italic>
<sub>bc</sub>
</italic>
or more documents linked to concept
<italic>c</italic>
when drawing
<italic>B</italic>
band-related documents.</p>
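<p>A direct implementation of this P-value is short. The Python sketch below sums exact integer numerators over all outcomes k ≥ O<sub>bc</sub> and divides once by the binomial coefficient of A over C; the function names are hypothetical, and the parameter names mirror the notation above rather than Python naming conventions. It also illustrates the symmetry between B and C just described, and that the tail starting at zero is exactly 1.</p>

```python
from math import comb


def hypergeom_pmf(k, A, B, C):
    """P(X = k), for k between 0 and C: k of the C concept-related abstracts
    are linked to band b, given that B of the A abstracts are linked to b."""
    return comb(B, k) * comb(A - B, C - k) / comb(A, C)


def p_value(O_bc, A, B, C):
    """Upper-tail probability P(X >= O_bc), summed over k = O_bc..min(B, C).

    Numerators are summed as exact integers and divided once, which avoids
    accumulating rounding errors over many small terms."""
    top = sum(comb(B, k) * comb(A - B, C - k) for k in range(O_bc, min(B, C) + 1))
    return top / comb(A, C)


print(p_value(2, A=10, B=4, C=3))   # 0.3333333333333333
print(p_value(2, A=10, B=3, C=4))   # same value: B and C play symmetric roles
print(p_value(0, A=10, B=4, C=3))   # 1.0: the whole distribution
```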
<p>It is important to note that for small numbers of concepts and documents, the
<italic>P</italic>
-values may give a distorted view of the actual relevance of the band–concept association. Even though
<italic>P</italic>
-values for small counts still correctly represent the probability of observing this or a higher number of co-citations, such
<italic>P</italic>
-values should be regarded with caution. The web application shows the actual counts with each
<italic>P</italic>
-value, allowing users to judge how much confidence the association at hand deserves. For a detailed discussion of this issue, see the Discussion section.</p>
<sec>
<title>Relation to other distributions</title>
<p>When the population size is large compared with the sample size, the hypergeometric distribution is approximated reasonably well by a binomial distribution, which is computationally less intensive. We compared both distributions. Although the binomial approximation proved justifiable, we chose the hypergeometric distribution because using it did not noticeably degrade performance. All
<italic>P</italic>
-values are precalculated at indexing time.</p>
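<p>The quality of the binomial approximation can be checked numerically. In this hedged Python sketch (function names and counts are hypothetical), the binomial model treats the <italic>C</italic> concept-related documents as independent draws with success probability <italic>B</italic>/<italic>A</italic>. For a large corpus and a small sample the two tails agree closely, whereas they diverge when the sample is a sizeable fraction of the corpus.</p>
<preformat preformat-type="code">
```python
from math import comb


def hypergeom_tail(o, a, b, c):
    """Exact P(X >= o) for the hypergeometric model (c draws, b successes, a total)."""
    tail = sum(comb(b, k) * comb(a - b, c - k) for k in range(o, min(b, c) + 1))
    return tail / comb(a, c)  # exact big-integer ratio, rounded once to float


def binomial_tail(o, a, b, c):
    """Binomial approximation: c independent draws with success probability b / a."""
    p = b / a
    return sum(comb(c, k) * p**k * (1 - p) ** (c - k) for k in range(o, c + 1))


# Hypothetical counts (o, A, B, C): large corpus, small sample.
large = (5, 100_000, 1_000, 100)
# Hypothetical counts: the sample is a sizeable fraction of the corpus.
small = (12, 50, 20, 20)

for o, a, b, c in (large, small):
    h, bn = hypergeom_tail(o, a, b, c), binomial_tail(o, a, b, c)
    print(f"A={a}: hypergeometric={h:.3e}, binomial={bn:.3e}")
```
</preformat>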
<p>The hypergeometric test is identical to a one-tailed Fisher's exact test, as can be verified by writing down the corresponding 2 × 2 contingency table of band and concept counts.</p>
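<p>The equivalence with Fisher's exact test can be verified numerically. The hedged Python sketch below (all names and counts are illustrative) builds the 2 × 2 table of abstracts, with rows band/not band and columns concept/not concept, computes the one-tailed Fisher P-value by enumerating all tables with the same margins whose top-left cell is at least <italic>O</italic><sub>bc</sub>, and compares it with the hypergeometric tail.</p>
<preformat preformat-type="code">
```python
from fractions import Fraction
from math import comb, factorial


def table_probability(a, b, c, d):
    """Probability of the 2 x 2 table [[a, b], [c, d]] given its fixed margins."""
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(a + b + c + d) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(num, den)


def fisher_greater(a, b, c, d):
    """One-tailed Fisher's exact test: total probability of all tables with the
    same margins whose top-left cell is at least a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    return sum(
        table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        for x in range(a, min(row1, col1) + 1)
    )


def contingency(o_bc, a, b, c):
    """Cells: band and concept, band only, concept only, neither."""
    return (o_bc, b - o_bc, c - o_bc, a - b - c + o_bc)


def hypergeom_tail(o_bc, a, b, c):
    """Hypergeometric P(X >= o_bc), as in the band-concept P-value."""
    return Fraction(
        sum(comb(b, k) * comb(a - b, c - k) for k in range(o_bc, min(b, c) + 1)),
        comb(a, c),
    )


# Hypothetical (O_bc, A, B, C) tuples.
for counts in [(2, 10, 3, 4), (3, 40, 6, 4), (11, 400, 30, 50)]:
    table = contingency(*counts)
    print(table, fisher_greater(*table) == hypergeom_tail(*counts))
```
</preformat>
<p>Each line prints the table followed by True, since fixing the row and column margins of the table is exactly the hypergeometric sampling model.</p>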
</sec>
<sec>
<title>Statistically related tools</title>
<p>Our method uses different domain vocabularies as concept sources. Gene Ontology as a whole and its three sub-branches constitute four of our vocabularies. A range of existing tools operate on Gene Ontology alone to identify overrepresented concepts for (groups of) genes resulting from expression array experiments. In these tools, the hypergeometric distribution and its binomial approximation are prominent statistical methods. The hypergeometric approach and the equivalent Fisher's exact test are the standard approach in most of these tools, as discussed by Khatri
<italic>et al</italic>
. (20). In their review, the authors further state that although different tools use different distributions, the differences between the models seem not to be dramatic in most cases.</p>
<p>Gentleman
<italic>et al</italic>
. describe the use of a hypergeometric distribution in GOStats and GOHyperG (21) to find Gene Ontology concepts that are overrepresented for a set of genes. Our work follows a similar philosophy, but applies it to a series of unstructured vocabularies and to literature co-citations of bands and concepts instead of genes and GO terms.</p>
<p>FunSpec (
<monospace>
<ext-link ext-link-type="uri" xlink:href="funspec.med.utoronto.ca">funspec.med.utoronto.ca</ext-link>
</monospace>
) takes a list of yeast gene names as input and returns a summary of overrepresented concepts. The tool calculates
<italic>P</italic>
-values using the hypergeometric distribution.</p>
<p>Other tools that use the hypergeometric distribution or the equivalent Fisher's exact test to find overrepresented Gene Ontology concepts include FatiGO (22), GObar (23), GoMiner (24), GOToolBox (25), GeneMerge (26), GOTree Machine (27), OntologyTraverser (28), GOCluster (29) and GOHyperGAll in R/Bioconductor (30).</p>
<p>In BlastSets, Barriot
<italic>et al</italic>
. use the hypergeometric distribution to calculate the probability of having at least the observed number of elements in common between two sets of sequences for which biological relationships are inferred from different data sources.</p>
</sec>
</sec>
<sec>
<title>Web application</title>
<p>We constructed a web application to illustrate and publicize our method and to make validation efforts reproducible. The tool can be queried in two directions.</p>
<p>On the one hand, users select a cytogenetic band on a genome view or enter its identifier manually. The tool then characterizes this band with statistically overrepresented vocabulary concepts found in the literature. Users choose the controlled vocabulary according to their current research interest. For example, when aBandApart is queried with 4p16.3 and a disease vocabulary, the most significant concepts are
<italic>achondroplasia, Wolf-Hirschhorn syndrome, Huntington disease, multiple myeloma, cherubism, dwarfism</italic>
and
<italic>hypochondroplasia</italic>
, all of which are disorders confirmed to be associated with that region.</p>
<p>On the other hand, users can start from a concept and query the database for statistically overrepresented chromosomal regions. If the concept is not found, the application will suggest alternatives with similar spelling. Overrepresented bands are listed together with their
<italic>P</italic>
-values and the raw counts that were used to calculate each
<italic>P</italic>
-value. The highly overrepresented bands are highlighted in red on the same genome chart that is used for input of cytogenetic bands. Links to relevant literature are provided with the cytoband profile.</p>
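<p>To make the two query directions concrete, the Python sketch below ranks concepts for a band and bands for a concept from a toy in-memory index of abstract IDs. All band-to-document and concept-to-document links, the corpus size and the function names are hypothetical, and this is not the aBandApart code, which precalculates its P-values at indexing time. As in the web application, each result row keeps the raw counts next to the P-value.</p>
<preformat preformat-type="code">
```python
from math import comb

# Toy index (hypothetical): IDs of abstracts linked to each band and each concept.
BAND_DOCS = {
    "4p16.3": {1, 2, 3, 4, 5, 6},
    "22q11.2": {7, 8, 9, 10},
    "7q11.23": {11, 12, 13},
}
CONCEPT_DOCS = {
    "achondroplasia": {1, 2, 3, 14},
    "microcephaly": {4, 9, 15, 16},
    "heart": {7, 8, 9, 11, 17},
}
CORPUS_SIZE = 40  # hypothetical total number of abstracts (A)


def p_value(o, a, b, c):
    """Hypergeometric upper tail P(X >= o); symmetric in b and c."""
    tail = sum(comb(b, k) * comb(a - b, c - k) for k in range(o, min(b, c) + 1))
    return tail / comb(a, c)


def rank(query_docs, index):
    """Rank index entries that share documents with the query by P-value,
    keeping the raw counts (co-citations, documents per hit) in each row."""
    rows = []
    for name, docs in index.items():
        o = len(query_docs.intersection(docs))
        if o == 0:
            continue  # no co-citation: nothing to report
        p = p_value(o, CORPUS_SIZE, len(query_docs), len(docs))
        rows.append((name, o, len(docs), p))
    return sorted(rows, key=lambda row: row[3])


def concepts_for_band(band):
    return rank(BAND_DOCS[band], CONCEPT_DOCS)


def bands_for_concept(concept):
    return rank(CONCEPT_DOCS[concept], BAND_DOCS)


for name, o, n, p in concepts_for_band("4p16.3"):
    print(f"{name}: P = {p:.2e} (links: {o} of {n})")
```
</preformat>
<p>The same ranking function serves both directions because the P-value is symmetric in the band and concept document counts; only the roles of query and index are swapped.</p>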
</sec>
</sec>
<sec sec-type="results">
<title>RESULTS</title>
<p>To illustrate our approach, we discuss results for searches related to heart disease. A detailed validation of our method follows, in which we discuss its performance on a set of 93 known gene–disease associations. We conclude by evaluating the correspondence of our results with chromosomal aberration maps composed by Brewer
<italic>et al</italic>
. (1,2).</p>
<sec>
<title>Heart disease</title>
<p>We now illustrate the approach by querying the system for
<italic>heart</italic>
while selecting CBIL, the human anatomy vocabulary. The concept
<italic>heart</italic>
has a total of 1324 documents associated with it. The five most relevant hits are shown in
<xref ref-type="table" rid="T3">Table 3</xref>
.
<table-wrap id="T3" position="float">
<label>Table 3.</label>
<caption>
<p>Five most relevant hits for query
<italic>heart</italic>
on vocabulary CBIL</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Band name</th>
<th rowspan="1" colspan="1">BC</th>
<th rowspan="1" colspan="1">B</th>
<th rowspan="1" colspan="1">
<italic>P</italic>
-value</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">22q11</td>
<td rowspan="1" colspan="1">164</td>
<td rowspan="1" colspan="1">1092</td>
<td rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">22q11.2</td>
<td rowspan="1" colspan="1">83</td>
<td rowspan="1" colspan="1">755</td>
<td rowspan="1" colspan="1">1.28e−26</td>
</tr>
<tr>
<td rowspan="1" colspan="1">20p12</td>
<td rowspan="1" colspan="1">19</td>
<td rowspan="1" colspan="1">113</td>
<td rowspan="1" colspan="1">3.03e−10</td>
</tr>
<tr>
<td rowspan="1" colspan="1">21q22.2</td>
<td rowspan="1" colspan="1">16</td>
<td rowspan="1" colspan="1">171</td>
<td rowspan="1" colspan="1">5.88e−06</td>
</tr>
<tr>
<td rowspan="1" colspan="1">7q11.23</td>
<td rowspan="1" colspan="1">20</td>
<td rowspan="1" colspan="1">301</td>
<td rowspan="1" colspan="1">1.12e−04</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The concept
<italic>heart</italic>
has a total of 1324 documents associated with it. The four columns show the hit (band), the number of documents linked to both band and concept (BC), the number of documents linked to the band (B) and the
<italic>P</italic>
-value.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>A very strong association is found for 22q11 and, more specifically, 22q11.2. Closer examination of these first two hits reveals that this association relates to the well-known DG/VCFS (DiGeorge/velocardiofacial) syndrome. The
<italic>P</italic>
-value of zero occurs because DG/VCFS, also known as the 22q11.2 deletion syndrome, is the most common chromosomal deletion syndrome found in humans (32), and cardiac defects are highly penetrant in these patients. The third-best association, linking
<italic>heart</italic>
to 20p12, is corroborated by literature on Alagille syndrome (33), a pleiotropic disorder involving the liver, heart, skeleton, eyes and facial structures. The fourth hit, 21q22.2, is identified through literature analysis as a chromosomal region critical for heart defects in Down syndrome (34). The fifth most relevant hit is 7q11.23. When 7q11.23 is submitted as a query with the CBIL anatomy vocabulary, a link with the cardiovascular system is apparent. Results with highly significant
<italic>P</italic>
-values (
<italic>P</italic>
< 0.01) are shown in
<xref ref-type="table" rid="T4">Table 4</xref>
.
<table-wrap id="T4" position="float">
<label>Table 4.</label>
<caption>
<p>Highly significant hits (
<italic>P</italic>
-value <0.01) for query 7q11.23 on vocabulary CBIL</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Concept</th>
<th rowspan="1" colspan="1">BC</th>
<th rowspan="1" colspan="1">C</th>
<th rowspan="1" colspan="1">
<italic>P</italic>
-value</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Valve</td>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">51</td>
<td rowspan="1" colspan="1">8.23e−7</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Connective tissue</td>
<td rowspan="1" colspan="1">6</td>
<td rowspan="1" colspan="1">96</td>
<td rowspan="1" colspan="1">2.64e−6</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Aorta</td>
<td rowspan="1" colspan="1">5</td>
<td rowspan="1" colspan="1">70</td>
<td rowspan="1" colspan="1">5.43e−6</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Metencephalon</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">3.92e−5</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Heart</td>
<td rowspan="1" colspan="1">20</td>
<td rowspan="1" colspan="1">1324</td>
<td rowspan="1" colspan="1">1.12e−4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hepatocyte</td>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">79</td>
<td rowspan="1" colspan="1">1.58e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Carotid artery</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">10</td>
<td rowspan="1" colspan="1">1.71e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Pons</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">13</td>
<td rowspan="1" colspan="1">2.92e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Tonsil</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">14</td>
<td rowspan="1" colspan="1">3.40e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Artery</td>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">120</td>
<td rowspan="1" colspan="1">7.06e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Penis</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">22</td>
<td rowspan="1" colspan="1">8.34e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Cardiovascular system</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">22</td>
<td rowspan="1" colspan="1">8.34e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Brain</td>
<td rowspan="1" colspan="1">23</td>
<td rowspan="1" colspan="1">2267</td>
<td rowspan="1" colspan="1">9.16e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Skeletal muscle</td>
<td rowspan="1" colspan="1">9</td>
<td rowspan="1" colspan="1">664</td>
<td rowspan="1" colspan="1">9.78e−3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Midbrain</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">24</td>
<td rowspan="1" colspan="1">9.88e−3</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The band 7q11.23 has a total of 301 documents associated with it. The four columns show the hit (concept), the number of documents linked to both band and concept, the number of documents linked to the concept and the
<italic>P</italic>
-value.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>To illustrate the benefit of working with different domain vocabularies, we characterized the same band, 7q11.23, with several vocabularies. With the dysmorphology vocabulary
<italic>LDDB</italic>
, the highest-ranking concept is
<italic>supravalvular aortic stenosis</italic>
. Other cardiovascular concepts occur, together with
<italic>anxiety</italic>
and
<italic>mental retardation</italic>
, suggesting involvement of the central nervous system. This picture is confirmed by the disease-related vocabulary, OMIM, which links the genomic location to Williams–Beuren syndrome (WBS). To elucidate a molecular function underlying this anomaly, the same query was submitted with the GO molecular function vocabulary. The highest-ranking concept,
<italic>elastin</italic>
, is assigned a near-zero
<italic>P</italic>
-value. Indeed, the majority of WBS patients have been shown to carry a microdeletion within 7q11.2 that includes the elastin gene, leading to disorganized pre-elastic and mature elastic fibers (35). This brief discussion illustrates how each domain vocabulary provides a specific view on a genotype–phenotype association.</p>
</sec>
<sec>
<title>NIH data set—Genes and Disease</title>
<p>The online NIH book
<italic>Genes and Disease</italic>
(
<monospace>
<ext-link ext-link-type="uri" xlink:href="www.ncbi.nlm.nih.gov/books/">www.ncbi.nlm.nih.gov/books/</ext-link>
</monospace>
) describes a set of genes and the diseases they are known to cause. For each genetic disorder, it discusses the underlying mutations, the clinical features and links to key web sites. This resource summarizes over 80 genetic disorders, which we use as positive controls in the validation of our method.</p>
<p>For chromosome 1, results are shown in
<xref ref-type="table" rid="T5">Table 5</xref>
. The first two columns show the gene name and disease as they occur in the NIH book. The disease name is the search term that was used to test our method; in some cases, spelling variants were used. The remaining columns indicate (
<bold>H</bold>
) whether the method assigned a highly significant
<italic>P</italic>
-value (
<italic>P</italic>
< 0.01) to the band with which the disease is actually associated, (
<bold>S</bold>
) whether it assigned a significant
<italic>P</italic>
-value (
<italic>P</italic>
< 0.05), (
<bold>P</bold>
) whether it delineated the band precisely, i.e. at the maximum level of karyotype resolution (4p16.1 is more precise than 4p16), and (
<bold>T</bold>
) whether it rated the band as the most significant candidate for this disease, ranking it as high as or higher than all other bands.
<table-wrap id="T5" position="float">
<label>Table 5.</label>
<caption>
<p>NIH book validation for chromosome 1</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Gene</th>
<th rowspan="1" colspan="1">Disease/concept</th>
<th rowspan="1" colspan="1">H</th>
<th rowspan="1" colspan="1">S</th>
<th rowspan="1" colspan="1">P</th>
<th rowspan="1" colspan="1">T</th>
<th rowspan="1" colspan="1">NIH</th>
<th rowspan="1" colspan="1">Top</th>
<th rowspan="1" colspan="1">
<italic>P</italic>
-value</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">UROD</td>
<td rowspan="1" colspan="1">Porphyria cutanea tarda</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1p34.1</td>
<td rowspan="1" colspan="1">1p34</td>
<td rowspan="1" colspan="1">0.70E−4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GBA</td>
<td rowspan="1" colspan="1">Gaucher disease</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1q21</td>
<td rowspan="1" colspan="1">1q21</td>
<td rowspan="1" colspan="1">2.41E−22</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GLC1A</td>
<td rowspan="1" colspan="1">Glaucoma</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1q24.3</td>
<td rowspan="1" colspan="1">1q24</td>
<td rowspan="1" colspan="1">2.21E−26</td>
</tr>
<tr>
<td rowspan="1" colspan="1">HPC1</td>
<td rowspan="1" colspan="1">Prostate cancer</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1q25.3</td>
<td rowspan="1" colspan="1">8p22</td>
<td rowspan="1" colspan="1">0.00E−0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">PS2</td>
<td rowspan="1" colspan="1">Alzheimer disease</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">0</td>
<td rowspan="1" colspan="1">1q42.13</td>
<td rowspan="1" colspan="1">1q42.1</td>
<td rowspan="1" colspan="1">0.24E−2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Five disease genes are annotated on this chromosome. The remaining columns indicate (
<bold>H</bold>
) whether the method assigned a highly significant
<italic>P</italic>
-value (<0.01) to the band with which the disease is actually associated, (
<bold>S</bold>
) whether it assigned a significant
<italic>P</italic>
-value (<0.05), (
<bold>P</bold>
) whether it delineated the band at the maximum level of karyotype resolution and (
<bold>T</bold>
) whether it rated the band as the most significant candidate for this disease, ranking it as high as or higher than all other bands.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
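<p>The four validation criteria can be expressed as a short function. The Python sketch below is one possible reading of the column definitions, under assumptions not stated in the text: the query result is a hypothetical mapping from band to P-value, a band reported at a coarser resolution (e.g. 1p34 for 1p34.1) is taken to contain the NIH band via a simple prefix test on band names, and <bold>T</bold> compares the best such band with every reported band. It is not guaranteed to reproduce every entry in Table 5.</p>
<preformat preformat-type="code">
```python
def contains(coarse, fine):
    """Simplified test that band 'coarse' equals or contains band 'fine'
    (e.g. '1p34' contains '1p34.1'). Hypothetical shortcut based on string
    prefixes; only arm-level or finer names (with 'p' or 'q') are compared."""
    has_arm = "p" in coarse or "q" in coarse
    return has_arm and fine.startswith(coarse)


def validation_flags(nih_band, reported):
    """Compute H/S/P/T for one disease query.

    nih_band: band given in the NIH book, e.g. '1p34.1'.
    reported: hypothetical dict mapping each band returned for the disease
              query to its P-value.
    """
    matching = {band: p for band, p in reported.items() if contains(band, nih_band)}
    if not matching:
        return {"H": 0, "S": 0, "P": 0, "T": 0}
    best = min(matching.values())  # best P-value for the NIH band or a band containing it
    return {
        "H": int(0.01 > best),
        "S": int(0.05 > best),
        "P": int(nih_band in matching),  # reported at the NIH book's resolution
        "T": int(best == min(reported.values())),  # at least as high as every other band
    }


# Hypothetical query results (band -> P-value), not taken from the paper's data.
print(validation_flags("1p34.1", {"1p34": 7.0e-5, "3q21": 0.02}))
print(validation_flags("1q25.3", {"1q25.3": 1.0e-6, "8p22": 0.0}))
```
</preformat>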
<p>A validation of our method with the disease-related genes on other chromosomes is provided as supplementary material.</p>
<p>Our method assigns a significant
<italic>P</italic>
-value (
<italic>P</italic>
< 0.05) to 84 out of 93 (over 90%) gene-linked diseases discussed in the NIH book data set. Of these, 80 (86% of the total) are assigned a highly significant
<italic>P</italic>
-value (
<italic>P</italic>
< 0.01). For 57 (61%) of these genetic diseases, the cytogenetic band containing the causative gene was reported with the most significant
<italic>P</italic>
-value of all reported bands. These results can be verified through the supplementary material or reproduced through the aBandApart web application.</p>
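<p>The percentages follow directly from the counts. A trivial Python check (variable names are illustrative) recomputes them over the 93 diseases and derives the number of diseases not significantly linked to the band containing the causative gene.</p>
<preformat preformat-type="code">
```python
def rate(count, total):
    """Fraction of diseases meeting a criterion."""
    return count / total


NIH_TOTAL = 93  # gene-linked diseases in the NIH book data set
NIH_COUNTS = {
    "significant (P below 0.05)": 84,
    "highly significant (P below 0.01)": 80,
    "causative band ranked first": 57,
}

for label, count in NIH_COUNTS.items():
    print(f"{label}: {count}/{NIH_TOTAL} = {rate(count, NIH_TOTAL):.1%}")
print("not significantly linked:", NIH_TOTAL - NIH_COUNTS["significant (P below 0.05)"])
```
</preformat>
<p>It prints 90.3%, 86.0% and 61.3%, and nine diseases not significantly linked.</p>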
<p>Nine diseases were not significantly linked to the band containing the causative gene. Most of these misses (6 of 9) occur because the concept is absent from all domain vocabularies. This happens with complex or overly detailed concepts (e.g.
<italic>gyrate atrophy of the choroid and retina</italic>
) or with chemical compounds (e.g.
<italic>steroid 5-alpha reductase, alpha-1-antitrypsin deficiency</italic>
). Although the concept
<italic>multiple endocrine neoplasia</italic>
does not occur in any of the vocabularies, the NIH band for this disease does show a relatively high number of cancer-related concepts.</p>
<p>Second, a miss can occur because no MEDLINE abstract associates the concept, or any of its synonyms, with the band in question. This is the case for the CKN1 gene, where no abstracts link Cockayne syndrome to 5q12, and for Zellweger syndrome, where no literature links it to 12p13.3.</p>
<p>Finally, a band may be found but not assigned a significant
<italic>P</italic>
-value. This is the case for
<italic>diabetes</italic>
, which our method only weakly links to 7p13. Diabetes has putative causative links to many genomic regions, which likely dilutes its association with any single band.</p>
</sec>
<sec>
<title>Congenital malformations</title>
<p>To further validate our methodology, we evaluate its agreement with chromosome maps of autosomal deletions and duplications composed by Brewer
<italic>et al</italic>
. (1,2). In this work, clinical and cytogenetic information from the human cytogenetics database was used to associate different congenital malformations to nonmosaic single contiguous autosomal deletions and duplications. We have assembled a list of 63 malformation-to-band associations that the authors deemed statistically highly significant. Brewer
<italic>et al</italic>
. classified malformations into seven categories: cardiac, central nervous system, craniofacial, gastrointestinal, genitourinary, ocular, and skeletal and limb malformations.</p>
<p>Of the 63 malformation-associated bands deemed significant by Brewer
<italic>et al</italic>
., 44 (70%) were assigned a significant
<italic>P</italic>
-value by our method and 35 (56%) a highly significant
<italic>P</italic>
-value. Five associations were detected but not assigned a significant
<italic>P</italic>
-value. Of the 14 associations made by Brewer
<italic>et al</italic>
. that were not detected by our method, one was missed because
<italic>agenesis of corpus callosum</italic>
is phrased differently in the literature, and 13 were missed because no abstracts were found linking band and malformation. Detailed results for all 13 cardiac anomalies discussed by Brewer
<italic>et al</italic>
. are shown in
<xref ref-type="table" rid="T6">Table 6</xref>
. The full validation is provided as supplementary material.
<table-wrap id="T6" position="float">
<label>Table 6.</label>
<caption>
<p>Congenital malformation validation</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Malformation</th>
<th rowspan="1" colspan="1">Band</th>
<th rowspan="1" colspan="1">Type</th>
<th rowspan="1" colspan="1">
<italic>P</italic>
-value <0.01</th>
<th rowspan="1" colspan="1">
<italic>P</italic>
-value <0.05</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Aortic stenosis</td>
<td rowspan="1" colspan="1">11q23-24</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hypoplastic left heart</td>
<td rowspan="1" colspan="1">11q23-25</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Hypoplastic left heart</td>
<td rowspan="1" colspan="1">16q11-12</td>
<td rowspan="1" colspan="1">dup</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Patent ductus arteriosus</td>
<td rowspan="1" colspan="1">16q22</td>
<td rowspan="1" colspan="1">dup</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Pulmonary stenosis</td>
<td rowspan="1" colspan="1">20p13-11</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Pulmonary stenosis</td>
<td rowspan="1" colspan="1">22q11</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Pulmonary stenosis</td>
<td rowspan="1" colspan="1">8q22-24</td>
<td rowspan="1" colspan="1">dup</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Tetralogy of Fallot</td>
<td rowspan="1" colspan="1">8q22-24</td>
<td rowspan="1" colspan="1">dup</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Truncus arteriosus</td>
<td rowspan="1" colspan="1">22q11</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Truncus arteriosus</td>
<td rowspan="1" colspan="1">2q22</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ventricular septal defect</td>
<td rowspan="1" colspan="1">22q11</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ventricular septal defect</td>
<td rowspan="1" colspan="1">4q31</td>
<td rowspan="1" colspan="1">del</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ventricular septal defect</td>
<td rowspan="1" colspan="1">8q24</td>
<td rowspan="1" colspan="1">dup</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>All 13 cardiac anomalies discussed by Brewer
<italic>et al</italic>
. are shown. Check marks indicate the significance with which our method associated band and concept.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
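<p>The outcome categories for the 63 associations of Brewer <italic>et al</italic>. partition the set exactly, which a short Python tally (names are illustrative) can confirm while recomputing the reported percentages.</p>
<preformat preformat-type="code">
```python
BREWER_TOTAL = 63        # associations deemed highly significant by Brewer et al.
HIGHLY_SIGNIFICANT = 35  # subset of the significant associations

outcomes = {
    "significant": 44,
    "detected, not significant": 5,
    "not detected": 14,
}
not_detected_causes = {"different phrasing": 1, "no co-citing abstracts": 13}

# The three outcome categories must cover all 63 associations exactly once.
assert sum(outcomes.values()) == BREWER_TOTAL
assert sum(not_detected_causes.values()) == outcomes["not detected"]

print(f"significant: {outcomes['significant'] / BREWER_TOTAL:.0%}")
print(f"highly significant: {HIGHLY_SIGNIFICANT / BREWER_TOTAL:.0%}")
```
</preformat>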
</sec>
</sec>
<sec sec-type="discussion">
<title>DISCUSSION</title>
<p>aBandApart links phenotype information to genomic aberrations at the level of cytogenetic bands. We verified that the significant
<italic>P</italic>
-values yielded by the method are supported by known cytogenetic aberrations and by published malformations and diseases.</p>
<p>With regard to our text-mining methodology, it is worth noting that we use MEDLINE abstracts instead of the full text of the corresponding articles. Although full-text articles are increasingly available through centralized repositories and open access initiatives, harvesting full text is not possible for all publications because of technical and legal restrictions. The potential difference in information content must be kept in mind (for example, an earlier study showed that full text reports significantly more sequence-related data than the abstract (36)). Nevertheless, the use of abstracts alone is justified because they summarize the key information of a paper (for example, as keywords (37)).</p>
<p>Regarding the statistical methodology, it is again worth stressing that the hypergeometric approach can yield small
<italic>P</italic>
-values for associations that do not necessarily deserve to be marked as meaningful. This happens when very few concepts and documents are involved. For example, the associations of
<italic>4p16.3</italic>
to both
<italic>broad nasal tip</italic>
and
<italic>microcephaly</italic>
are flagged as significant by this method: the former is based on one co-citation in 2 documents, the latter on 11 co-citations in 322 documents. Even though both resulting
<italic>P</italic>
-values correctly represent, in a statistical sense, the probability of observing this or a higher number of co-citations, the
<italic>P</italic>
-value in the first situation should clearly be regarded with more caution. One option would be to use a regularized estimator that penalizes associations involving few documents more strongly. We decided against this choice because such associations can be meaningful. Instead, for associations supported by few documents, the individual abstracts must be reviewed to confirm the potential association and to avoid overreliance on the
<italic>P</italic>
-value. To allow an informed decision on the actual significance of an association between a band and a concept, the web application also shows the actual counts that were used to calculate the
<italic>P</italic>
-value. This raw count information is crucial to the interpretation of results from the web tool:
<italic>P</italic>
-values must always be evaluated in the light of the counts in the ‘Links’ column directly to their left. The caption of the result table explicitly explains the meaning of each field.</p>
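<p>A small numerical sketch makes this caveat concrete. The co-citation counts for broad nasal tip (1 of 2 documents) and microcephaly (11 of 322 documents) come from the text above; the corpus size, the number of 4p16.3 abstracts and the review threshold are hypothetical assumptions, so the resulting P-values only illustrate the effect. Under these assumptions both associations fall below 0.01, yet only the raw counts reveal that the first rests on a single abstract.</p>
<preformat preformat-type="code">
```python
from math import comb

A = 100_000          # hypothetical corpus size
B = 500              # hypothetical number of abstracts linked to 4p16.3
MIN_COCITATIONS = 3  # hypothetical review threshold, not part of aBandApart


def p_value(o, a, b, c):
    """Hypergeometric upper tail P(X >= o) for c concept documents."""
    tail = sum(comb(b, k) * comb(a - b, c - k) for k in range(o, min(b, c) + 1))
    return tail / comb(a, c)


# (co-citations, documents for the concept) as quoted in the Discussion
examples = {"broad nasal tip": (1, 2), "microcephaly": (11, 322)}

for concept, (o, c) in examples.items():
    p = p_value(o, A, B, c)
    note = " (review abstracts)" if MIN_COCITATIONS > o else ""
    print(f"{concept}: P = {p:.2e}, links {o}/{c}{note}")
```
</preformat>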
<p>Our method for associating biomedical concepts with cytogenetic bands provides diagnostic support to clinicians who seek to identify chromosomal regions containing genes involved in disease processes, and to determine clinical entities linked to genomic aberrations in patients. It supports genetic counselling and an informed follow-up of clinical cases. It also helps cytogeneticists produce refined accounts of the cytogenetic findings they interpret and report to medical professionals (such as gynecologists, pediatricians, psychiatrists or genetic counselors) and to the patient's family.</p>
<p>For researchers, the generation of a phenotypic genome map based on text mining will ease the identification of genes involved in disease processes and could delineate novel clinically recognizable entities. Through our controlled vocabularies, their research can be focused on specific knowledge domains. Additionally, the tool provides non-cytogeneticists with an accessible bridge to the cytogenetic literature.</p>
<p>The databases can support the curation of chromosomal aberration catalogs. They do not render case report catalogs obsolete; rather, they aim to complement these resources by offering a publicly available, free, online and searchable resource that is kept up to date through regular automated updates.</p>
</sec>
<sec sec-type="supplementary-material">
<title>SUPPLEMENTARY DATA</title>
<p>Supplementary Data are available at NAR online.</p>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>[Supplementary Material]</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="nar_gkm054_index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="nar_gkm054_1.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="nar_gkm054_2.pdf"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>Our research is supported by the following grants. Research Council KUL: GOA AMBioRICS, CoE EF/05/007 SymBioSys, several PhD/postdoc & fellow grants. Flemish Government: FWO: PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0413.03 (inference in bioi), G.0388.03 (microarrays for clinical use), G.0229.03 (ontologies in bioi), G.0241.04 (Functional Genomics), G.0499.04 (Statistics), G.0232.05 (Cardiovascular), G.0318.05 (subfunctionalization), G.0553.06 (VitamineD), G.0302.07 (SVM/Kernel), research communities (ICCoS, ANMMM, MLDM); IWT: PhD Grants, GBOU-McKnow-E (Knowledge management algorithms), GBOU-SQUAD (quorum sensing), GBOU-ANA (biosensors), TAD-BioScope-IT, Silicos; SBO-BioFrame; Belgian Federal Science Policy Office: IUAP P6/25 (BioMaGNet, Bioinformatics and Modeling: from Genomes to Networks, 2007-2011); EU-RTD: ERNSI: European Research Network on System Identification; FP6-NoE Biopattern; FP6-IP e-Tumours, FP6-MC-EST Bioptrain.</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brewer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Holloway</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zawalnyski</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schinzel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>FitzPatrick</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>A chromosomal deletion map of human malformations</article-title>
<source>Am. J. Hum. Genet</source>
<year>1998</year>
<volume>63</volume>
<fpage>1153</fpage>
<lpage>1159</lpage>
<pub-id pub-id-type="pmid">9758599</pub-id>
</element-citation>
</ref>
<ref id="B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brewer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Holloway</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zawalnyski</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schinzel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>FitzPatrick</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>A chromosomal duplication map of malformations: regions of suspected haplo- and triplolethality—and tolerance of segmental aneuploidy—in humans</article-title>
<source>Am. J. Hum. Genet</source>
<year>1999</year>
<volume>64</volume>
<fpage>1702</fpage>
<lpage>1708</lpage>
<pub-id pub-id-type="pmid">10330358</pub-id>
</element-citation>
</ref>
<ref id="B3">
<label>3</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Schinzel</surname>
<given-names>A</given-names>
</name>
</person-group>
<source>Catalogue of Unbalanced Chromosome Aberrations in Man</source>
<year>2001</year>
<publisher-loc>Berlin</publisher-loc>
<publisher-name>de Gruyter</publisher-name>
</element-citation>
</ref>
<ref id="B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perez-Iratxeta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wjst</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Andrade</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>G2D: a tool for mining genes associated with disease</article-title>
<source>BMC Genet</source>
<year>2005</year>
<volume>6</volume>
<fpage>45</fpage>
<pub-id pub-id-type="pmid">16115313</pub-id>
</element-citation>
</ref>
<ref id="B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Dopazo</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cigudosa</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>HCAD, closing the gap between breakpoints and genes</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>511</fpage>
<lpage>513</lpage>
<pub-id pub-id-type="pmid">15661851</pub-id>
</element-citation>
</ref>
<ref id="B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Korbel</surname>
<given-names>JO</given-names>
</name>
<name>
<surname>Doerks</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Perez-Iratxeta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kaczanowski</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hooper</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Andrade</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Systematic association of genes to phenotypes by genome and literature mining</article-title>
<source>PLoS Biol</source>
<year>2005</year>
<volume>3</volume>
<fpage>e134</fpage>
<pub-id pub-id-type="pmid">15799710</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tiffin</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kelso</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Powell</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bajic</surname>
<given-names>VB</given-names>
</name>
<name>
<surname>Hide</surname>
<given-names>WA</given-names>
</name>
</person-group>
<article-title>Integration of text- and data-mining using ontologies successfully selects disease gene candidates</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>1544</fpage>
<lpage>1552</lpage>
<pub-id pub-id-type="pmid">15767279</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Implementing the iHOP concept for navigation of biomedical literature</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<issue>Suppl 2</issue>
<fpage>ii252</fpage>
<lpage>ii258</lpage>
<pub-id pub-id-type="pmid">16204114</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Driel</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Bruggeman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Vriend</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Brunner</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Leunissen</surname>
<given-names>JAM</given-names>
</name>
</person-group>
<article-title>A text-mining analysis of the human phenome</article-title>
<source>Eur. J. Hum. Genet</source>
<year>2006</year>
<volume>14</volume>
<fpage>535</fpage>
<lpage>542</lpage>
<pub-id pub-id-type="pmid">16493445</pub-id>
</element-citation>
</ref>
<ref id="B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Driel</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Cuelenaere</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kemmeren</surname>
<given-names>PPCW</given-names>
</name>
<name>
<surname>Leunissen</surname>
<given-names>JAM</given-names>
</name>
<name>
<surname>Brunner</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Vriend</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<issue>(Web Server issue)</issue>
<fpage>758</fpage>
<lpage>761</lpage>
</element-citation>
</ref>
<ref id="B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Masseroli</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Galati</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Pinciroli</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>GFINDer: genetic disease and phenotype location statistical analysis and mining of dynamically annotated gene lists</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<issue>(Web Server issue)</issue>
<fpage>717</fpage>
<lpage>723</lpage>
</element-citation>
</ref>
<ref id="B12">
<label>12</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Hatcher</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Gospodnetić</surname>
<given-names>O</given-names>
</name>
</person-group>
<source>Lucene in Action</source>
<year>2004</year>
<publisher-loc>Greenwich, Connecticut, USA</publisher-loc>
<publisher-name>Manning Publications Co.</publisher-name>
</element-citation>
</ref>
<ref id="B13">
<label>13</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Shaffer</surname>
<given-names>LG</given-names>
</name>
<name>
<surname>Tommerup</surname>
<given-names>N</given-names>
</name>
</person-group>
<source>ISCN 2005</source>
<year>2005</year>
<publisher-loc>Basel</publisher-loc>
<publisher-name>Karger</publisher-name>
</element-citation>
</ref>
<ref id="B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Levan</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Nomenclature on G-bands in rat chromosomes</article-title>
<source>Hereditas</source>
<year>1974</year>
<volume>77</volume>
<fpage>37</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="pmid">4137863</pub-id>
</element-citation>
</ref>
<ref id="B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aerts</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lambrechts</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Maity</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Van Loo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Coessens</surname>
<given-names>B</given-names>
</name>
<name>
<surname>De Smet</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Tranchevent</surname>
<given-names>L.-C</given-names>
</name>
<name>
<surname>De Moor</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Marynen</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene prioritization through genomic data fusion</article-title>
<source>Nat. Biotechnol</source>
<year>2006</year>
<volume>24</volume>
<fpage>537</fpage>
<lpage>544</lpage>
<pub-id pub-id-type="pmid">16680138</pub-id>
</element-citation>
</ref>
<ref id="B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glenisson</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Coessens</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Van Vooren</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mathys</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Moreau</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>De Moor</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>TXTGate: profiling gene groups with text-based information</article-title>
<source>Genome Biol</source>
<year>2004</year>
<volume>5</volume>
<fpage>R43</fpage>
<pub-id pub-id-type="pmid">15186494</pub-id>
</element-citation>
</ref>
<ref id="B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mohnish</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Oxford Medical Databases: London Dysmorphology Database Version 3.0</article-title>
<source>J. Med. Genet</source>
<year>2002</year>
<volume>39</volume>
<fpage>782</fpage>
<lpage>783</lpage>
</element-citation>
</ref>
<ref id="B18">
<label>18</label>
<element-citation publication-type="journal">
<collab>The Gene Ontology Consortium</collab>
<article-title>Gene Ontology: tool for the unification of biology</article-title>
<source>Nat. Genet</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
</element-citation>
</ref>
<ref id="B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hunter</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kaufman</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Baldock</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Simmen</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Bard</surname>
<given-names>JBL</given-names>
</name>
</person-group>
<article-title>An ontology of human developmental anatomy</article-title>
<source>J. Anat</source>
<year>2003</year>
<volume>203</volume>
<fpage>347</fpage>
<lpage>355</lpage>
</element-citation>
</ref>
<ref id="B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khatri</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Draghici</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Ontological analysis of gene expression data: current tools, limitations, and open problems</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3587</fpage>
<lpage>3595</lpage>
<pub-id pub-id-type="pmid">15994189</pub-id>
</element-citation>
</ref>
<ref id="B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Falcon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gentleman</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Using GOstats to test gene lists for GO term association</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>23</volume>
<fpage>257</fpage>
<lpage>258</lpage>
<pub-id pub-id-type="pmid">17098774</pub-id>
</element-citation>
</ref>
<ref id="B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al-Shahrour</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Diaz-Uriarte</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Dopazo</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>578</fpage>
<lpage>580</lpage>
<pub-id pub-id-type="pmid">14990455</pub-id>
</element-citation>
</ref>
<ref id="B23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>JSM</given-names>
</name>
<name>
<surname>Katari</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sachidanandam</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>GObar: a gene ontology based analysis and visualization tool for gene sets</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>189</fpage>
<pub-id pub-id-type="pmid">16042800</pub-id>
</element-citation>
</ref>
<ref id="B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zeeberg</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Fojo</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Sunshine</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Narasimhan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kane</surname>
<given-names>DW</given-names>
</name>
<name>
<surname>Reinhold</surname>
<given-names>WC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gominer: a resource for biological interpretation of genomic and proteomic data</article-title>
<source>Genome Biol</source>
<year>2003</year>
<volume>4</volume>
<fpage>R28</fpage>
</element-citation>
</ref>
<ref id="B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Brun</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Remy</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Mouren</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Thieffry</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jacq</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>GOToolBox: functional analysis of gene datasets based on Gene Ontology</article-title>
<source>Genome Biol</source>
<year>2004</year>
<volume>5</volume>
<fpage>R101</fpage>
<pub-id pub-id-type="pmid">15575967</pub-id>
</element-citation>
</ref>
<ref id="B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castillo-Davis</surname>
<given-names>CI</given-names>
</name>
<name>
<surname>Hartl</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>GeneMerge-post-genomic analysis, data mining, and hypothesis testing</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>891</fpage>
<lpage>892</lpage>
<pub-id pub-id-type="pmid">12724301</pub-id>
</element-citation>
</ref>
<ref id="B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Schmoyer</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kirov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Snoddy</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>16</fpage>
<pub-id pub-id-type="pmid">14975175</pub-id>
</element-citation>
</ref>
<ref id="B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Young</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Whitehouse</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Ontology-traverser: an R package for GO analysis</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>275</fpage>
<lpage>276</lpage>
<pub-id pub-id-type="pmid">15333457</pub-id>
</element-citation>
</ref>
<ref id="B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wrobel</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chalmel</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Primig</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>GoCluster integrates statistical analysis and functional interpretation of microarray expression data</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3575</fpage>
<lpage>3577</lpage>
<pub-id pub-id-type="pmid">16020468</pub-id>
</element-citation>
</ref>
<ref id="B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Doerge</surname>
<given-names>RW</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Gentleman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Carey</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Irizarry</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Dudoit</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Bioinformatics and computational biology solutions using R and bioconductor</article-title>
<source>Biometrics</source>
<year>2006</year>
<volume>62</volume>
<fpage>1270</fpage>
<lpage>1271</lpage>
</element-citation>
</ref>
<ref id="B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barriot</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Poix</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Groppi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Goffard</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Dutour</surname>
<given-names>I</given-names>
</name>
<name>
<surname>de Daruvar</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>New strategy for the representation and the integration of biomolecular knowledge at a cellular scale</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>3581</fpage>
<lpage>3589</lpage>
<pub-id pub-id-type="pmid">15240831</pub-id>
</element-citation>
</ref>
<ref id="B32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yakut</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kilic</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Cil</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Yapici</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Egeli</surname>
<given-names>U</given-names>
</name>
</person-group>
<article-title>FISH investigation of 22q11.2 deletion in patients with immunodeficiency and/or cardiac abnormalities</article-title>
<source>Pediatr. Surg. Int</source>
<year>2006</year>
<volume>22</volume>
<fpage>1</fpage>
<lpage>4</lpage>
</element-citation>
</ref>
<ref id="B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krantz</surname>
<given-names>ID</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Colliton</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>Tinkel</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Zackai</surname>
<given-names>EH</given-names>
</name>
<name>
<surname>Piccoli</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Goldmuntz</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Spinner</surname>
<given-names>NB</given-names>
</name>
</person-group>
<article-title>Jagged1 mutations in patients ascertained with isolated congenital heart defects</article-title>
<source>Am. J. Med. Genet</source>
<year>1999</year>
<volume>84</volume>
<fpage>56</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="pmid">10213047</pub-id>
</element-citation>
</ref>
<ref id="B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kosaki</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kosaki</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Matsushima</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mitsui</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Matsumoto</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ohashi</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Refining chromosomal region critical for Down syndrome-related heart defects with a case of cryptic 21q22.2 duplication</article-title>
<source>Congenit. Anom. (Kyoto)</source>
<year>2005</year>
<volume>45</volume>
<fpage>62</fpage>
<lpage>64</lpage>
<pub-id pub-id-type="pmid">15904434</pub-id>
</element-citation>
</ref>
<ref id="B35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robinson</surname>
<given-names>WP</given-names>
</name>
<name>
<surname>Waslynka</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bernasconi</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kotzot</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Schinzel</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Delineation of 7q11.2 deletions associated with Williams-Beuren syndrome and mapping of a repetitive sequence to within and to either side of the common deletion</article-title>
<source>Genomics</source>
<year>1996</year>
<volume>34</volume>
<fpage>17</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">8661020</pub-id>
</element-citation>
</ref>
<ref id="B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wren</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Hildebrand</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Chandrasekaran</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Melcher</surname>
<given-names>U</given-names>
</name>
</person-group>
<article-title>Markov model recognition and classification of DNA/protein sequences within large text databases</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>4046</fpage>
<lpage>4053</lpage>
<pub-id pub-id-type="pmid">16159926</pub-id>
</element-citation>
</ref>
<ref id="B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>PK</given-names>
</name>
<name>
<surname>Perez-Iratxeta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Andrade</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Information extraction from full text scientific articles: where are the keywords?</article-title>
<source>BMC Bioinformatics</source>
<year>2003</year>
<volume>4</volume>
<fpage>20</fpage>
<pub-id pub-id-type="pmid">12775220</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Belgique/explor/OpenAccessBelV2/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000400 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000400 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Belgique
   |area=    OpenAccessBelV2
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:1885641
   |texte=   Mapping biomedical concepts onto the human genome by mining literature on chromosomal aberrations
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:17403693" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OpenAccessBelV2 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Dec 1 00:43:49 2016. Site generation: Wed Mar 6 14:51:30 2024