Exploration server on telematics

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Towards linked open gene mutations data

Internal identifier: 000420 (Pmc/Corpus)

Authors: Achille Zappa; Andrea Splendiani; Paolo Romano

Source: BMC Bioinformatics, 2012, 13(Suppl 4): S7

RBID: PMC:3303732

Abstract

Background

With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework.

In this paper, we discuss issues related to the integration of mutation data into the Linked Open Data infrastructure, which is part of the Semantic Web framework. We describe the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers that publish these data.

Methods

A version of the IARC TP53 Mutation database implemented in a relational database was used as the first test set. Automatic mappings to RDF were first created using D2RQ and then manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest.

Since D2RQ query performance is lower than that achievable with an RDF store, the generated data were also loaded into a dedicated system based on tools from the Jena software suite.

Results

We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by following links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored using any Semantic Web browser or application.

Conclusions

This is the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development.

The publication of variation information as Linked Data opens new perspectives: SPARQL searches across mutation data and other biological databases may support data retrieval that is not presently possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303732
DOI: 10.1186/1471-2105-13-S4-S7
PubMed: 22536974
PubMed Central: 3303732

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Towards linked open gene mutations data</title>
<author>
<name sortKey="Zappa, Achille" sort="Zappa, Achille" uniqKey="Zappa A" first="Achille" last="Zappa">Achille Zappa</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics, IRCCS AOU San Martino-IST National Cancer Research Institute, Genoa, I-16132, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Informatics, Systems and Telematics, University of Genoa, Genoa, I-16145, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Splendiani, Andrea" sort="Splendiani, Andrea" uniqKey="Splendiani A" first="Andrea" last="Splendiani">Andrea Splendiani</name>
<affiliation>
<nlm:aff id="I3">Rothamsted Research, West Common, Harpenden, Hertfordshire, AL5 2JQ, UK</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I4">Digital Enterprise Research Institute, National University of Ireland at Galway, IDA Business Park, Lower Dangan, Galway, Ireland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Romano, Paolo" sort="Romano, Paolo" uniqKey="Romano P" first="Paolo" last="Romano">Paolo Romano</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics, IRCCS AOU San Martino-IST National Cancer Research Institute, Genoa, I-16132, Italy</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22536974</idno>
<idno type="pmc">3303732</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303732</idno>
<idno type="RBID">PMC:3303732</idno>
<idno type="doi">10.1186/1471-2105-13-S4-S7</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000420</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000420</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Towards linked open gene mutations data</title>
<author>
<name sortKey="Zappa, Achille" sort="Zappa, Achille" uniqKey="Zappa A" first="Achille" last="Zappa">Achille Zappa</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics, IRCCS AOU San Martino-IST National Cancer Research Institute, Genoa, I-16132, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Informatics, Systems and Telematics, University of Genoa, Genoa, I-16145, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Splendiani, Andrea" sort="Splendiani, Andrea" uniqKey="Splendiani A" first="Andrea" last="Splendiani">Andrea Splendiani</name>
<affiliation>
<nlm:aff id="I3">Rothamsted Research, West Common, Harpenden, Hertfordshire, AL5 2JQ, UK</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I4">Digital Enterprise Research Institute, National University of Ireland at Galway, IDA Business Park, Lower Dangan, Galway, Ireland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Romano, Paolo" sort="Romano, Paolo" uniqKey="Romano P" first="Paolo" last="Romano">Paolo Romano</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics, IRCCS AOU San Martino-IST National Cancer Research Institute, Genoa, I-16132, Italy</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Berners Lee, T" uniqKey="Berners Lee T">T Berners-Lee</name>
</author>
<author>
<name sortKey="Hendler, J" uniqKey="Hendler J">J Hendler</name>
</author>
<author>
<name sortKey="Lassila, O" uniqKey="Lassila O">O Lassila</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stephens, S" uniqKey="Stephens S">S Stephens</name>
</author>
<author>
<name sortKey="Lavigna, D" uniqKey="Lavigna D">D LaVigna</name>
</author>
<author>
<name sortKey="Dilascio, M" uniqKey="Dilascio M">M DiLascio</name>
</author>
<author>
<name sortKey="Luciano, J" uniqKey="Luciano J">J Luciano</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dhanapalan, L" uniqKey="Dhanapalan L">L Dhanapalan</name>
</author>
<author>
<name sortKey="Chen, Jy" uniqKey="Chen J">JY Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ruttenberg, A" uniqKey="Ruttenberg A">A Ruttenberg</name>
</author>
<author>
<name sortKey="Clark, T" uniqKey="Clark T">T Clark</name>
</author>
<author>
<name sortKey="Bug, W" uniqKey="Bug W">W Bug</name>
</author>
<author>
<name sortKey="Samwald, M" uniqKey="Samwald M">M Samwald</name>
</author>
<author>
<name sortKey="Bodenreider, O" uniqKey="Bodenreider O">O Bodenreider</name>
</author>
<author>
<name sortKey="Chen, H" uniqKey="Chen H">H Chen</name>
</author>
<author>
<name sortKey="Doherty, D" uniqKey="Doherty D">D Doherty</name>
</author>
<author>
<name sortKey="Forsberg, K" uniqKey="Forsberg K">K Forsberg</name>
</author>
<author>
<name sortKey="Gao, Y" uniqKey="Gao Y">Y Gao</name>
</author>
<author>
<name sortKey="Kashyap, V" uniqKey="Kashyap V">V Kashyap</name>
</author>
<author>
<name sortKey="Kinoshita, J" uniqKey="Kinoshita J">J Kinoshita</name>
</author>
<author>
<name sortKey="Luciano, J" uniqKey="Luciano J">J Luciano</name>
</author>
<author>
<name sortKey="Marshall, Ms" uniqKey="Marshall M">MS Marshall</name>
</author>
<author>
<name sortKey="Ogbuji, C" uniqKey="Ogbuji C">C Ogbuji</name>
</author>
<author>
<name sortKey="Rees, J" uniqKey="Rees J">J Rees</name>
</author>
<author>
<name sortKey="Stephens, S" uniqKey="Stephens S">S Stephens</name>
</author>
<author>
<name sortKey="Wong, Gt" uniqKey="Wong G">GT Wong</name>
</author>
<author>
<name sortKey="Wu, E" uniqKey="Wu E">E Wu</name>
</author>
<author>
<name sortKey="Zaccagnini, D" uniqKey="Zaccagnini D">D Zaccagnini</name>
</author>
<author>
<name sortKey="Hongsermeier, T" uniqKey="Hongsermeier T">T Hongsermeier</name>
</author>
<author>
<name sortKey="Neumann, E" uniqKey="Neumann E">E Neumann</name>
</author>
<author>
<name sortKey="Herman, I" uniqKey="Herman I">I Herman</name>
</author>
<author>
<name sortKey="Cheung, Kh" uniqKey="Cheung K">KH Cheung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deus, Hf" uniqKey="Deus H">HF Deus</name>
</author>
<author>
<name sortKey="Stanislaus, R" uniqKey="Stanislaus R">R Stanislaus</name>
</author>
<author>
<name sortKey="Veiga, Df" uniqKey="Veiga D">DF Veiga</name>
</author>
<author>
<name sortKey="Behrens, C" uniqKey="Behrens C">C Behrens</name>
</author>
<author>
<name sortKey="Wistuba, Ii" uniqKey="Wistuba I">II Wistuba</name>
</author>
<author>
<name sortKey="Minna, Jd" uniqKey="Minna J">JD Minna</name>
</author>
<author>
<name sortKey="Garner, Hr" uniqKey="Garner H">HR Garner</name>
</author>
<author>
<name sortKey="Swisher, Sg" uniqKey="Swisher S">SG Swisher</name>
</author>
<author>
<name sortKey="Roth, Ja" uniqKey="Roth J">JA Roth</name>
</author>
<author>
<name sortKey="Correa, Am" uniqKey="Correa A">AM Correa</name>
</author>
<author>
<name sortKey="Broom, B" uniqKey="Broom B">B Broom</name>
</author>
<author>
<name sortKey="Coombes, K" uniqKey="Coombes K">K Coombes</name>
</author>
<author>
<name sortKey="Chang, A" uniqKey="Chang A">A Chang</name>
</author>
<author>
<name sortKey="Vogel, Lh" uniqKey="Vogel L">LH Vogel</name>
</author>
<author>
<name sortKey="Almeida, Js" uniqKey="Almeida J">JS Almeida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miles, A" uniqKey="Miles A">A Miles</name>
</author>
<author>
<name sortKey="Zhao, J" uniqKey="Zhao J">J Zhao</name>
</author>
<author>
<name sortKey="Klyne, G" uniqKey="Klyne G">G Klyne</name>
</author>
<author>
<name sortKey="White Cooper, H" uniqKey="White Cooper H">H White-Cooper</name>
</author>
<author>
<name sortKey="Shotton, D" uniqKey="Shotton D">D Shotton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bizer, C" uniqKey="Bizer C">C Bizer</name>
</author>
<author>
<name sortKey="Heath, T" uniqKey="Heath T">T Heath</name>
</author>
<author>
<name sortKey="Berners Lee, T" uniqKey="Berners Lee T">T Berners-Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Belleau, F" uniqKey="Belleau F">F Belleau</name>
</author>
<author>
<name sortKey="Nolin, M A" uniqKey="Nolin M">M-A Nolin</name>
</author>
<author>
<name sortKey="Tourigny, N" uniqKey="Tourigny N">N Tourigny</name>
</author>
<author>
<name sortKey="Rigault, P" uniqKey="Rigault P">P Rigault</name>
</author>
<author>
<name sortKey="Morissette, J" uniqKey="Morissette J">J Morissette</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fernald, Gh" uniqKey="Fernald G">GH Fernald</name>
</author>
<author>
<name sortKey="Capriotti, E" uniqKey="Capriotti E">E Capriotti</name>
</author>
<author>
<name sortKey="Daneshjou, R" uniqKey="Daneshjou R">R Daneshjou</name>
</author>
<author>
<name sortKey="Karczewski, Kj" uniqKey="Karczewski K">KJ Karczewski</name>
</author>
<author>
<name sortKey="Altman, Rb" uniqKey="Altman R">RB Altman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cooper, Dn" uniqKey="Cooper D">DN Cooper</name>
</author>
<author>
<name sortKey="Chen, J M" uniqKey="Chen J">J-M Chen</name>
</author>
<author>
<name sortKey="Ball, Ev" uniqKey="Ball E">EV Ball</name>
</author>
<author>
<name sortKey="Howells, K" uniqKey="Howells K">K Howells</name>
</author>
<author>
<name sortKey="Mort, M" uniqKey="Mort M">M Mort</name>
</author>
<author>
<name sortKey="Phillips, Ad" uniqKey="Phillips A">AD Phillips</name>
</author>
<author>
<name sortKey="Chuzhanova, N" uniqKey="Chuzhanova N">N Chuzhanova</name>
</author>
<author>
<name sortKey="Krawczak, M" uniqKey="Krawczak M">M Krawczak</name>
</author>
<author>
<name sortKey="Kehrer Sawatzki, H" uniqKey="Kehrer Sawatzki H">H Kehrer-Sawatzki</name>
</author>
<author>
<name sortKey="Stenson, Pd" uniqKey="Stenson P">PD Stenson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laurila, Jb" uniqKey="Laurila J">JB Laurila</name>
</author>
<author>
<name sortKey="Naderi, N" uniqKey="Naderi N">N Naderi</name>
</author>
<author>
<name sortKey="Witte, R" uniqKey="Witte R">R Witte</name>
</author>
<author>
<name sortKey="Riazanov, A" uniqKey="Riazanov A">A Riazanov</name>
</author>
<author>
<name sortKey="Kouznetsov, A" uniqKey="Kouznetsov A">A Kouznetsov</name>
</author>
<author>
<name sortKey="Baker, Cjo" uniqKey="Baker C">CJO Baker</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fokkema, If" uniqKey="Fokkema I">IF Fokkema</name>
</author>
<author>
<name sortKey="Taschner, Pe" uniqKey="Taschner P">PE Taschner</name>
</author>
<author>
<name sortKey="Schaafsma, Gc" uniqKey="Schaafsma G">GC Schaafsma</name>
</author>
<author>
<name sortKey="Celli, J" uniqKey="Celli J">J Celli</name>
</author>
<author>
<name sortKey="Laros, Jf" uniqKey="Laros J">JF Laros</name>
</author>
<author>
<name sortKey="Den Dunnen, Jt" uniqKey="Den Dunnen J">JT den Dunnen</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Den Dunnen, Jt" uniqKey="Den Dunnen J">JT den Dunnen</name>
</author>
<author>
<name sortKey="Sijmons, Rh" uniqKey="Sijmons R">RH Sijmons</name>
</author>
<author>
<name sortKey="Andersen, Ps" uniqKey="Andersen P">PS Andersen</name>
</author>
<author>
<name sortKey="Vihinen, M" uniqKey="Vihinen M">M Vihinen</name>
</author>
<author>
<name sortKey="Beckmann, Js" uniqKey="Beckmann J">JS Beckmann</name>
</author>
<author>
<name sortKey="Rossetti, S" uniqKey="Rossetti S">S Rossetti</name>
</author>
<author>
<name sortKey="Talbot, Cc" uniqKey="Talbot C">CC Talbot</name>
</author>
<author>
<name sortKey="Hardison, Rc" uniqKey="Hardison R">RC Hardison</name>
</author>
<author>
<name sortKey="Povey, S" uniqKey="Povey S">S Povey</name>
</author>
<author>
<name sortKey="Cotton, Rg" uniqKey="Cotton R">RG Cotton</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilkinson, Md" uniqKey="Wilkinson M">MD Wilkinson</name>
</author>
<author>
<name sortKey="Mccarthy, L" uniqKey="Mccarthy L">L McCarthy</name>
</author>
<author>
<name sortKey="Vandervalk, B" uniqKey="Vandervalk B">B Vandervalk</name>
</author>
<author>
<name sortKey="Withers, D" uniqKey="Withers D">D Withers</name>
</author>
<author>
<name sortKey="Kawas, E" uniqKey="Kawas E">E Kawas</name>
</author>
<author>
<name sortKey="Samadian, S" uniqKey="Samadian S">S Samadian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Riazanov, A" uniqKey="Riazanov A">A Riazanov</name>
</author>
<author>
<name sortKey="Laurila, Jb" uniqKey="Laurila J">JB Laurila</name>
</author>
<author>
<name sortKey="Baker, Cjo" uniqKey="Baker C">CJO Baker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bada, M" uniqKey="Bada M">M Bada</name>
</author>
<author>
<name sortKey="Eilbeck, K" uniqKey="Eilbeck K">K Eilbeck</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Petitjean, A" uniqKey="Petitjean A">A Petitjean</name>
</author>
<author>
<name sortKey="Mathe, E" uniqKey="Mathe E">E Mathe</name>
</author>
<author>
<name sortKey="Kato, S" uniqKey="Kato S">S Kato</name>
</author>
<author>
<name sortKey="Ishioka, C" uniqKey="Ishioka C">C Ishioka</name>
</author>
<author>
<name sortKey="Tavtigian, Sv" uniqKey="Tavtigian S">SV Tavtigian</name>
</author>
<author>
<name sortKey="Hainaut, P" uniqKey="Hainaut P">P Hainaut</name>
</author>
<author>
<name sortKey="Olivier, M" uniqKey="Olivier M">M Olivier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marra, D" uniqKey="Marra D">D Marra</name>
</author>
<author>
<name sortKey="Romano, P" uniqKey="Romano P">P Romano</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sioutos, N" uniqKey="Sioutos N">N Sioutos</name>
</author>
<author>
<name sortKey="De Coronado, S" uniqKey="De Coronado S">S de Coronado</name>
</author>
<author>
<name sortKey="Haber, Mw" uniqKey="Haber M">MW Haber</name>
</author>
<author>
<name sortKey="Hartel, Fw" uniqKey="Hartel F">FW Hartel</name>
</author>
<author>
<name sortKey="Shaiud, W L" uniqKey="Shaiud W">W-L Shaiud</name>
</author>
<author>
<name sortKey="Wright, Lw" uniqKey="Wright L">LW Wright</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goh, K I" uniqKey="Goh K">K-I Goh</name>
</author>
<author>
<name sortKey="Cusick, Me" uniqKey="Cusick M">ME Cusick</name>
</author>
<author>
<name sortKey="Valle, D" uniqKey="Valle D">D Valle</name>
</author>
<author>
<name sortKey="Childs, B" uniqKey="Childs B">B Childs</name>
</author>
<author>
<name sortKey="Vidal, M" uniqKey="Vidal M">M Vidal</name>
</author>
<author>
<name sortKey="Barabasi, A L" uniqKey="Barabasi A">A-L Barabási</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heath, T" uniqKey="Heath T">T Heath</name>
</author>
<author>
<name sortKey="Bizer, C" uniqKey="Bizer C">C Bizer</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heim, P" uniqKey="Heim P">P Heim</name>
</author>
<author>
<name sortKey="Hellmann, S" uniqKey="Hellmann S">S Hellmann</name>
</author>
<author>
<name sortKey="Lehmann, J" uniqKey="Lehmann J">J Lehmann</name>
</author>
<author>
<name sortKey="Lohmann, S" uniqKey="Lohmann S">S Lohmann</name>
</author>
<author>
<name sortKey="Stegemann, T" uniqKey="Stegemann T">T Stegemann</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vidal, M" uniqKey="Vidal M">M Vidal</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22536974</article-id>
<article-id pub-id-type="pmc">3303732</article-id>
<article-id pub-id-type="publisher-id">1471-2105-13-S4-S7</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-13-S4-S7</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Towards linked open gene mutations data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Zappa</surname>
<given-names>Achille</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>achille.zappa@istge.it</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Splendiani</surname>
<given-names>Andrea</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<xref ref-type="aff" rid="I4">4</xref>
<email>andrea.splendiani@rothamsted.ac.uk</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A3">
<name>
<surname>Romano</surname>
<given-names>Paolo</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>paolo.romano@istge.it</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Bioinformatics, IRCCS AOU San Martino-IST National Cancer Research Institute, Genoa, I-16132, Italy</aff>
<aff id="I2">
<label>2</label>
Department of Informatics, Systems and Telematics, University of Genoa, Genoa, I-16145, Italy</aff>
<aff id="I3">
<label>3</label>
Rothamsted Research, West Common, Harpenden, Hertfordshire, AL5 2JQ, UK</aff>
<aff id="I4">
<label>4</label>
Digital Enterprise Research Institute, National University of Ireland at Galway, IDA Business Park, Lower Dangan, Galway, Ireland</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>3</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<issue>Suppl 4</issue>
<supplement>
<named-content content-type="supplement-title">Italian Society of Bioinformatics (BITS): Annual Meeting 2011</named-content>
<named-content content-type="supplement-editor">Paolo Romano and Manuela Helmer-Citterich</named-content>
</supplement>
<fpage>S7</fpage>
<lpage>S7</lpage>
<permissions>
<copyright-statement>Copyright ©2012 Zappa et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Zappa et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S4/S7"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework.</p>
<p>In this paper, we discuss issues related to the integration of mutation data into the Linked Open Data infrastructure, which is part of the Semantic Web framework. We describe the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers that publish these data.</p>
</sec>
<sec>
<title>Methods</title>
<p>A version of the IARC TP53 Mutation database implemented in a relational database was used as the first test set. Automatic mappings to RDF were first created using D2RQ and then manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest.</p>
<p>Since D2RQ query performance is lower than that achievable with an RDF store, the generated data were also loaded into a dedicated system based on tools from the Jena software suite.</p>
</sec>
<sec>
<title>Results</title>
<p>We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by following links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored using any Semantic Web browser or application.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>This is the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development.</p>
<p>The publication of variation information as Linked Data opens new perspectives: SPARQL searches across mutation data and other biological databases may support data retrieval that is not presently possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.</p>
</sec>
</abstract>
<conference>
<conf-date>20-22 June 2011</conf-date>
<conf-name>Eighth Annual Meeting of the Italian Society of Bioinformatics (BITS)</conf-name>
<conf-loc>Pisa, Italy</conf-loc>
</conference>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<sec>
<title>The conversion of relational database contents into the Semantic Web</title>
<p>The Semantic Web [
<xref ref-type="bibr" rid="B1">1</xref>
] is gaining momentum as a framework for developing next-generation bioinformatics data integration tools, since its standards and technologies now seem mature enough to be considered a viable solution to data integration challenges. Semantic Web-based approaches to biomedical data integration have already been proposed a number of times in recent years [
<xref ref-type="bibr" rid="B2">2</xref>
-
<xref ref-type="bibr" rid="B6">6</xref>
]. Most recently, an approach known as Linked Data, or the Web of Data, is being explored.</p>
<p>The vision of the Semantic Web is to turn the Web into a distributed knowledge base. This requires evolving the current Web of Documents, in which each node of the network is an unstructured document, into a Web of Data, in which each node represents machine-processable information. In this context, information is accessed through portals and search engines whose behavior relies on semantic features. A good introduction to the Web of Data can be found in [
<xref ref-type="bibr" rid="B7">7</xref>
].</p>
<p>A significant contribution to this evolution of the Web may come from converting data stored in relational databases (RDB) into a suitable representation such as the Resource Description Framework (RDF) [
<xref ref-type="bibr" rid="B8">8</xref>
], the basic technology for representing information in the Semantic Web. RDF is based on simple statements ("triples"), each made of three elements: a "Subject", a "Predicate" (or "Property"), and an "Object". Semantics can easily be associated with property definitions, while subjects are usually well-identified entities and objects may represent either related entities or literal values.</p>
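<p>As an illustration only, the triple model described above can be sketched in Python with plain tuples and a pattern-matching helper in which None acts as a wildcard, loosely analogous to a single SPARQL triple pattern. All URIs are hypothetical examples, and this is not how an actual RDF store is implemented.</p>

```python
# Minimal sketch: RDF-style statements as (subject, predicate, object) tuples.
# All URIs are hypothetical examples under http://example.org/.
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str  # either an entity URI or a literal value


EX = "http://example.org/"

graph: List[Triple] = [
    # The object is a related entity, identified by a URI.
    Triple(EX + "mutation/1", EX + "affectsGene", EX + "gene/TP53"),
    # The object is a literal data value.
    Triple(EX + "mutation/1", EX + "codon", "175"),
    Triple(EX + "gene/TP53", EX + "label", "tumor protein p53"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    ]


# Everything stated about mutation/1.
for t in match(graph, s=EX + "mutation/1"):
    print(t.predicate, t.object)
```

<p>Because the same subject URI (here the gene) can appear as the object of one triple and the subject of another, triples naturally compose into a graph that can be traversed link by link.</p>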
<p>Much research has therefore focused on either the static conversion or the dynamic mapping of data from RDB to RDF, leading to the implementation of both mapping tools and domain-specific applications. Some mappings are generated automatically through a simple association: the name of each relational table is mapped to an RDF class, and the names of its columns are used as RDF predicates. Consequently, cell values are mapped to instances or data values. In this case, entities and relations, as well as their meaning, mirror the RDB schema, so knowledge of the schema is needed to understand the exported information.</p>
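<p>A naive, schema-driven mapping of this kind can be sketched with Python's built-in sqlite3 module. This is a hypothetical illustration in the spirit of the automatic mappings described above, not the output format of D2RQ or of any other tool; the table, its columns, and the base URI are assumptions.</p>

```python
# Sketch of a naive, schema-driven RDB-to-RDF mapping: table -> class,
# column -> predicate, cell value -> object. The table, columns and base URI
# are hypothetical; this is not the exact output of D2RQ or any other tool.
import sqlite3

BASE = "http://example.org/db/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def auto_map(conn: sqlite3.Connection, table: str, key: str) -> list:
    """Map every row of `table` to triples (table name must be trusted)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        # Each row becomes an instance whose URI is built from its key value.
        subject = f"{BASE}{table}/{record[key]}"
        triples.append((subject, RDF_TYPE, f"{BASE}vocab/{table}"))
        for col, value in record.items():
            if col == key or value is None:
                continue  # skip the key column and NULL cells
            # Predicate names simply echo the relational schema.
            triples.append((subject, f"{BASE}vocab/{table}_{col}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mutation (id INTEGER PRIMARY KEY, gene TEXT, codon INTEGER)")
conn.execute("INSERT INTO mutation VALUES (1, 'TP53', 175)")
for t in auto_map(conn, "mutation", "id"):
    print(t)
```

<p>The resulting predicates, such as mutation_codon, are only meaningful to someone who knows the original schema, which is exactly the limitation of purely automatic mappings noted above.</p>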
<p>In other mappings, the relations and entities of the original database are instead converted into a representation based on a shared conceptualization, which may differ, even significantly, from the database schema. Differences may concern properties, relationships, and even entity values (e.g., a different coding scheme, or values that are split or merged). In this case, automatic mappings can serve as a starting point for quickly creating customized, domain-specific mappings.</p>
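<p>The refinement step can be sketched as a post-processing pass over automatically generated triples: schema-derived predicates are renamed to terms of a shared vocabulary, and a merged cell value is split into separate properties. The vocabulary URIs, the predicate table, and the "R175H"-style protein-change coding are hypothetical illustrations, not the mapping used in this work.</p>

```python
# Sketch: refining schema-derived triples into a domain-specific mapping.
# The vocabularies, the predicate map and the "R175H"-style protein-change
# coding are hypothetical illustrations, not the mapping used in this work.
import re

AUTO = "http://example.org/db/vocab/"       # predicates echoing the RDB schema
VOCAB = "http://example.org/shared-vocab/"  # terms of a shared vocabulary
S = "http://example.org/db/mutation/1"

# Rename schema-derived predicates to shared-vocabulary terms.
PREDICATE_MAP = {
    AUTO + "mutation_gene": VOCAB + "affectsGene",
    AUTO + "mutation_codon": VOCAB + "codonPosition",
}

# One merged cell such as "R175H": wild-type residue, position, mutant residue.
PROTEIN_CHANGE = re.compile(r"([A-Z])(\d+)([A-Z])")


def refine(triples):
    refined = []
    for s, p, o in triples:
        m = PROTEIN_CHANGE.fullmatch(o) if p == AUTO + "mutation_protein" else None
        if m:
            # Split the merged value into three separate properties.
            wild_type, position, mutant = m.groups()
            refined += [
                (s, VOCAB + "wildTypeResidue", wild_type),
                (s, VOCAB + "proteinPosition", position),
                (s, VOCAB + "mutantResidue", mutant),
            ]
        else:
            # Otherwise keep the value and rename the predicate if mapped.
            refined.append((s, PREDICATE_MAP.get(p, p), o))
    return refined


auto = [
    (S, AUTO + "mutation_gene", "TP53"),
    (S, AUTO + "mutation_protein", "R175H"),
]
for t in refine(auto):
    print(t)
```

<p>Values that do not fit the expected coding are passed through unchanged, so the automatic mapping remains a fallback for anything the refinement does not cover.</p>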
<p>Relational-to-RDF mapping software exists both as standalone tools (e.g., D2RQ and Triplify) and as part of larger suites (e.g., AllegroGraph, Sesame, OWLIM, Virtuoso). In general, these tools belong to a wider range of software solutions that can expose the entities and relations held in structured information resources as RDF. A list of these tools is available online [
<xref ref-type="bibr" rid="B9">9</xref>
].</p>
<p>In the biomedical domain, an exemplary resource is Bio2RDF [
<xref ref-type="bibr" rid="B10">10</xref>
], a system that provides integrated access to a large number of biomedical databases through Semantic Web technologies, i.e. RDF for data representation and SPARQL (SPARQL Protocol and RDF Query Language) [
<xref ref-type="bibr" rid="B11">11</xref>
] for queries. To this end, many databases were converted to RDF by dedicated scripts, called RDFizers, while some information systems that already offered a suitable format and interface were linked to the system directly.</p>
<p>This conversion was based on a unified ontology that took into account the properties used by information resources already available in RDF. Moreover, the system provided a unified URI schema, overcoming the heterogeneity of the URIs provided by other systems. All major genomics, proteomics, network and pathway, and nomenclature databases were included in the system, as well as some clinical databases, e.g. Online Mendelian Inheritance in Man (OMIM), bibliographic ones, e.g. PubMed, and the Gene Ontology.</p>
<p>The Linked Open Data (LOD) initiative, a Community Project at the World Wide Web Consortium (W3C), aims at extending "the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources" [
<xref ref-type="bibr" rid="B12">12</xref>
]. In this context, many biomedical databases have already been made available (a Linked Open Data cloud diagram is available on-line [
<xref ref-type="bibr" rid="B13">13</xref>
]). Many of these datasets derive from Bio2RDF, while others were built independently, e.g. Diseasome, a dataset extracted from OMIM that includes information on disorders and disease-related genes linked by known associations.</p>
</sec>
<sec>
<title>Human variation data and the Semantic Web</title>
<p>In the last decade, with the advent of high-throughput technologies, sequencing has become faster and less expensive. As a consequence, a great wealth of data is being produced to identify variation data, i.e. information specific to individuals and sub-populations. One of the best known projects of this kind is "1,000 Genomes", an international collaboration that recently completed its pilot phase [
<xref ref-type="bibr" rid="B14">14</xref>
,
<xref ref-type="bibr" rid="B15">15</xref>
]. The goal of the pilot phase was to identify, by means of Next-Generation Sequencing technologies, at least 95% of the variations present in at least 1% of individuals in three distinct populations. This led to the production of ca. 4.9 Tbases of sequence (about 3 Gbases/individual) and to the identification of 15 million mutations, 1 million deletions/insertions, and 20,000 larger variants.</p>
<p>Such information constitutes the basis on which genomics may meet clinical information, correlation analyses between genotypes and phenotypes may be carried out, and the prospects of genomic or personalized medicine may be realized [
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B17">17</xref>
].</p>
<p>Although several databases on human gene mutation and variation exist, their semantic annotation is very limited and their formats are heterogeneous. Overall, little information on human variation is included in the Web of Data or available on-line through implementations based on Semantic Web technologies. One example is the data on the impact of protein mutations on protein function that Laurila et al. extracted from the scientific literature with a specialized text mining pipeline [
<xref ref-type="bibr" rid="B18">18</xref>
] (in this case, data is available on-line through a SPARQL endpoint, but access is restricted to authorized users only).</p>
<p>Lists of Locus Specific Databases (LSDBs) and of other databases related to human variation, such as those on Disease Centered Mutations, SNPs (Single Nucleotide Polymorphisms), National and Ethnic Mutations, Mitochondrial Mutations, and Chromosomal Variation, are available on-line at the site of the Human Genome Variation Society (HGVS) [
<xref ref-type="bibr" rid="B19">19</xref>
,
<xref ref-type="bibr" rid="B20">20</xref>
], although many of these lists are not up to date. The highest-quality human variation information is found in curated databases, many of which are managed by means of the Leiden Open Variation Database (LOVD) [
<xref ref-type="bibr" rid="B21">21</xref>
] schema and system. Many other databases are managed by proprietary systems. The Human Variome Project (HVP) [
<xref ref-type="bibr" rid="B22">22</xref>
] has produced recommendations for nomenclatures of variations and for contents of mutation databases.</p>
<p>The need to integrate variation data with molecular biology databases is, however, well recognized. Conditions for the integration of LSDBs with other biological databases have been outlined by den Dunnen et al. in [
<xref ref-type="bibr" rid="B23">23</xref>
]. That paper distinguishes between information that should be shared and information that could be shared. The former set includes only some reference data: contact information for the database, identifiers of the gene in various databases, a unique reference to the sequence, and the description of the mutation at the DNA level. For the latter set, which includes data on the original bibliography, changes at the protein and RNA levels, and associated pathogenicity, issues related to data ownership and quality also arise.</p>
</sec>
<sec>
<title>Shared property definitions for human variation data</title>
<p>Integrating data on the Semantic Web is mainly a matter of shared, reusable property definitions and unique data identifiers. Some mutation-related ontologies exist, including the Variation Ontology (VariO) [
<xref ref-type="bibr" rid="B24">24</xref>
] and the Mutation Impact Ontology (MIO) [
<xref ref-type="bibr" rid="B25">25</xref>
].</p>
<p>VariO is still under development and has not been officially released for annotation or analysis purposes. It aims to provide standardized, systematic descriptions of the effects and consequences of position-specific variations at different levels (DNA, RNA, protein). VariO reuses some terms and definitions from the Sequence Ontology (SO), the Gene Ontology (GO), and other ontologies.</p>
<p>MIO was developed to support the semantic extraction and grounding of mutation impact data from the literature. A major use case for the ontology is the publication of text mining results through Semantic Web services [
<xref ref-type="bibr" rid="B18">18</xref>
] in the framework of the Semantic Automated Discovery and Integration (SADI) [
<xref ref-type="bibr" rid="B26">26</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
] infrastructure.</p>
<p>Other biological ontologies that refer to mutations also exist, such as the Sequence Ontology (for a recent assessment of the current state of, and the issues in, incorporating mutation information into SO, see [
<xref ref-type="bibr" rid="B28">28</xref>
]). However, no ontology specifically able to represent, or to support the representation of, gene variation data is available yet.</p>
<p>More importantly, a framework for identifying variations is missing. The HGVS nomenclature defines mutations relative to a specific version of a RefSeq sequence, which makes it problematic to reconcile mutations described with reference to different RefSeq versions. In a LOD framework this is a key issue, since common URIs for the same mutation are essential for integrating different datasets.</p>
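<p>The RefSeq version problem can be made concrete with a small Python sketch. It parses simple HGVS-style substitutions of the form used in this dataset (for example NM_000546.1:c.186A>C) and mints one URI per description. The regular expression covers only single-nucleotide coding substitutions, and the URI namespace is hypothetical; the point is that the same change written against two RefSeq versions yields two different URIs unless an external reconciliation step is applied.</p>

```python
import re

# Matches simple coding-DNA substitutions such as "NM_000546.1:c.186A>C".
# Deletions, insertions and all other HGVS forms are deliberately not handled.
HGVS_SUBSTITUTION = re.compile(r"^(N[MC]_\d+)\.(\d+):c\.(\d+)([ACGT])>([ACGT])$")

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/variation/"


def parse_substitution(description):
    """Split an HGVS substitution into its components, or return None."""
    match = HGVS_SUBSTITUTION.match(description)
    if match is None:
        return None
    accession, version, position, ref, alt = match.groups()
    return {
        "accession": accession,
        "version": int(version),
        "position": int(position),
        "ref": ref,
        "alt": alt,
    }


def naive_uri(description):
    """Mint a URI directly from the description, RefSeq version included."""
    return BASE + description.replace(":", "_").replace(">", "-")
```

Whether c.186 on two versions of the same transcript denotes the same genomic position cannot be decided from the strings alone; it requires comparing the reference sequences themselves.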
<p>Furthermore, the definition of equivalent mutations relies on an abstraction based on sequence similarity. As such, equivalence cannot easily be deduced with the common inference mechanisms offered by Semantic Web technologies and tools (e.g., a cluster of sequences may
<italic>de facto</italic>
define a class characterized by the related consensus sequence).</p>
<p>Solutions that incorporate services into the LOD, e.g. SADI, may provide a unified framework in which ontology languages and sequence alignment services could be combined to compute the equivalence of mutations.</p>
</sec>
<sec>
<title>Aim of this work</title>
<p>In this paper, we address issues related to the integration of mutation data in the Linked Open Data infrastructure. We present the development of a mapping from a relational version of the IARC TP53 Mutation database (IARCDB) to RDF that takes into account HGVS recommendations as well as existing ontologies for representing this domain knowledge. We also present a first implementation of servers that publish these data in RDF, aimed at studying issues related to the integration of mutation data in the Linked Open Data cloud.</p>
</sec>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Software infrastructure</title>
<p>Linked Data is an approach to publishing information on the Web that eases the integration of data from different sources by relying both on RDF as a
<italic>lingua franca </italic>
for a machine-processable representation of information and on shared ontologies that allow information from different resources to be semantically connected. RDF itself, however, does not provide domain-specific terms; these may be obtained by adopting shared taxonomies, vocabularies, and ontologies. Suitable terms in existing ontologies should be reused whenever possible, and new terms should only be added when no viable term exists. RDF properties must then be mapped to external ontologies, resources (subjects and objects in RDF triples) must be linked to LOD repositories through shared identifiers, and comments and definitions should be added whenever possible.</p>
<p>By using dereferenceable URIs as global identifiers for resources, i.e. URIs that can be looked up on the Web to retrieve information about the referenced resource, Linked Data makes it possible to set hyperlinks between entities in different data sources. Such links are the glue connecting the data islands of the Web of Data into a global, interconnected data space.</p>
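<p>Dereferencing a Linked Data URI is an ordinary HTTP GET in which the client states, through the Accept header, whether it prefers an RDF serialization or an HTML page. The sketch below only prepares such a request with Python's urllib and does not send it; the media types are standard, while the preference order and the choice of the Bio2RDF HGNC URI as target are illustrative assumptions.</p>

```python
import urllib.request

# Preferred media types, best first; q-values tell the server which
# representation the client would rather receive.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def build_dereference_request(uri, accept=RDF_ACCEPT):
    """Prepare (but do not send) a GET request asking for RDF about `uri`."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


if __name__ == "__main__":
    request = build_dereference_request("http://bio2rdf.org/hgnc:11998")
    print(request.full_url, request.get_header("Accept"))
    # urllib.request.urlopen(request) would perform the actual lookup;
    # it is omitted so that the sketch runs offline.
```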
<p>In our case, automatic RDB to RDF mappings were first created by using D2RQ, a platform for treating relational databases as virtual RDF graphs [
<xref ref-type="bibr" rid="B29">29</xref>
]. This tool allows on-the-fly generation of RDF triples from a database. It also allows browsing the generated RDF triples through a standard web interface and querying the relational database through a SPARQL endpoint.</p>
<p>The query performance of D2RQ is lower than that achievable with a dedicated RDF triple store. To evaluate the reliability and performance of an on-the-fly mapping system such as D2RQ against a native RDF framework, a dump of all triples generated by D2RQ was loaded into a dedicated Jena TDB triple store [
<xref ref-type="bibr" rid="B30">30</xref>
], a SPARQL Database for Jena [
<xref ref-type="bibr" rid="B31">31</xref>
], which provides large-scale storage and querying of RDF datasets. Many triple stores exist, varying in features and performance. We opted for Jena because it provided sufficient performance together with a good set of integrated and interoperable tools, which suited our prototype-led development approach.</p>
<p>We then enriched our dataset by adding triples that connect samples to related UMLS concepts. This mapping was implemented by means of a federated SPARQL Update query linking our dataset with the Linked Life Data (LLD) endpoint.</p>
<p>Once the RDF dataset was created, it was made accessible as a SPARQL endpoint using Joseki [
<xref ref-type="bibr" rid="B32">32</xref>
], a Jena tool that provides support for SPARQL queries through an HTTP engine. Joseki was configured to connect to the TDB database and it was connected to an implementation of SPARQLer [
<xref ref-type="bibr" rid="B31">31</xref>
], a user-friendly interface to a SPARQL server. Finally, we exposed the content of our TDB triple store through a Linked Data interface using Pubby, a well-known Linked Data frontend for SPARQL endpoints [
<xref ref-type="bibr" rid="B33">33</xref>
].</p>
</sec>
<sec>
<title>Datasets</title>
<p>The IARC TP53 Database has been maintained at the International Agency for Research on Cancer (IARC) in Lyon, France, since 1994 [
<xref ref-type="bibr" rid="B34">34</xref>
]. The database compiles all TP53 mutations that have been reported in the published literature since 1989 [
<xref ref-type="bibr" rid="B35">35</xref>
]. It includes annotations on functional impact of mutations, either predicted or experimentally assessed, clinico-pathologic characteristics of tumors and demographic and life-style information on patients.</p>
<p>Various datasets, corresponding to different views of the available data, are made available to interested users as spreadsheets for download. The relational schema, however, is not public. On-line queries are meant for data analysis only, answering questions such as "Search for TP53 mutation prevalence in selected types of tumor and populations" and "Display a histogram showing the distribution of tumors associated with the selected germline or somatic mutation(s)".</p>
<p>For many years, as part of a collaboration between IARC and the National Cancer Research Institute of Genova (IST), these datasets have been implemented in a relational database management system at IST as the basis for an SRS implementation of the IARC TP53 Mutation Database [
<xref ref-type="bibr" rid="B36">36</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
].</p>
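<p>Loading one of the downloadable spreadsheets into a relational table, the starting point for the mappings described next, can be sketched with Python's csv and sqlite3 modules. The sketch assumes the spreadsheet has been exported as tab-delimited text with a header row; the column names in the usage example are hypothetical and do not reproduce the actual IARC file layout.</p>

```python
import csv
import io
import sqlite3


def load_tsv(conn, table, text_stream):
    """Create `table` from a tab-delimited stream whose first row is a header.

    Every column is stored as TEXT. Table and column names are quoted but not
    otherwise sanitized, so they should come from a trusted export.
    """
    reader = csv.reader(text_stream, delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


if __name__ == "__main__":
    # Toy export with hypothetical column names and values.
    export = io.StringIO("ID\tWT_AA\n1\tP\n2\tNA\n")
    conn = sqlite3.connect(":memory:")
    load_tsv(conn, "mutations", export)
    print(conn.execute("SELECT COUNT(*) FROM mutations").fetchone()[0])
```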
</sec>
<sec>
<title>Development of mappings</title>
<p>The first mapping file was produced automatically by D2RQ. It was then manually refined to align it with a shared representation and, thus, to encode in the mapping some semantics that is not expressed in the RDB schema.</p>
<p>For instance, D2RQ generates predicate names based on RDB column names, so there is no way to know when a predicate refers to a property for which a shared representation exists. By customizing predicates, we have been able to better represent the semantics of our data according to shared ontologies.</p>
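<p>The effect of this refinement on the generated triples can be pictured as a rewrite table that replaces column-derived predicates with terms from shared vocabularies. In the actual system the substitution is written into the D2RQ mapping itself, as in Table 1; the Python sketch below only shows the before/after relation, and the auto-generated predicate names on the left are hypothetical.</p>

```python
# Hypothetical auto-generated (column-derived) predicates mapped to the
# shared vocabulary terms used in the refined mapping (see Table 1).
PREDICATE_REWRITES = {
    "vocab:SomaticRef15_Title": "bibtex:hasTitle",
    "vocab:mutations15_WT_AA": "mio:hasWildTypeResidue",
}


def refine(triples, rewrites=PREDICATE_REWRITES):
    """Replace predicates that have a shared equivalent; keep all others."""
    return [(s, rewrites.get(p, p), o) for s, p, o in triples]
```

Predicates without a shared equivalent pass through unchanged, matching the use of ad-hoc properties where no shared relation is available.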
<p>Only a limited set of external ontologies and terminologies has been taken into account. These include the NCI Thesaurus (NCIT) [
<xref ref-type="bibr" rid="B38">38</xref>
,
<xref ref-type="bibr" rid="B39">39</xref>
] for medical terminology (namely topography and morphology), the Bibliographic Ontology (BIBO) [
<xref ref-type="bibr" rid="B40">40</xref>
] and the BibTeX definition in Web Ontology Language (OWL) [
<xref ref-type="bibr" rid="B41">41</xref>
] for bibliographic references, the Diseasome ontology [
<xref ref-type="bibr" rid="B42">42</xref>
,
<xref ref-type="bibr" rid="B43">43</xref>
] and MIO [
<xref ref-type="bibr" rid="B25">25</xref>
].</p>
<p>Moreover, external links were set to LOD implementations of DBpedia, a dataset of structured information extracted from Wikipedia pages [
<xref ref-type="bibr" rid="B44">44</xref>
], PubMed, the HUGO Gene Nomenclature Committee (HGNC) database [
<xref ref-type="bibr" rid="B45">45</xref>
], the Online Mendelian Inheritance in Man (OMIM) system, UniProt, and the Unified Medical Language System (UMLS). All links were defined to the Bio2RDF entry points of these databases, expressed in the unified Bio2RDF URI style, with the exception of DBpedia, which was linked through its own namespace, and UMLS, which was connected through LinkedLifeData [
<xref ref-type="bibr" rid="B46">46</xref>
].</p>
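<p>The unified Bio2RDF URI style mentioned above places a namespace:identifier pair under http://bio2rdf.org/, as in the HGNC link of Table 1 (http://bio2rdf.org/hgnc:11998). A small helper for minting such outbound links, plus DBpedia resource URIs, might look as follows; namespace names other than hgnc should be checked against the Bio2RDF documentation, and the DBpedia helper is a simplification that skips percent-encoding.</p>

```python
BIO2RDF_BASE = "http://bio2rdf.org/"
DBPEDIA_BASE = "http://dbpedia.org/resource/"


def bio2rdf_uri(namespace, identifier):
    """Build a Bio2RDF-style URI, e.g. bio2rdf_uri("hgnc", 11998)."""
    return f"{BIO2RDF_BASE}{namespace.lower()}:{identifier}"


def dbpedia_uri(page_title):
    """Build a DBpedia resource URI from an English Wikipedia page title.

    Spaces become underscores, as in Wikipedia URLs; percent-encoding of
    other special characters is deliberately omitted from this sketch.
    """
    return DBPEDIA_BASE + page_title.strip().replace(" ", "_")


if __name__ == "__main__":
    print(bio2rdf_uri("hgnc", 11998))  # the HGNC link used in Table 1
```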
<p>We also made use of other frequently used vocabularies such as rdf:, rdfs:, and owl:. Where shared relations were not available to express the content of our database, we used ad-hoc properties. The re-use of ontologies is not limited to relations: the majority of "values" in the IARC database come from controlled vocabularies and reference dictionaries and are therefore
<italic>de facto</italic>
ontology terms. We have mapped these terms to classes from ontologies (or terminologies) such as UMLS.</p>
<p>URIs (identifiers of nodes in RDF) have been made compatible, where possible, with Bio2RDF and PubMed. On the whole, these implementation choices allow us to state that our system is deployed according to Linked Data principles [
<xref ref-type="bibr" rid="B47">47</xref>
]. Three examples of mappings are reported in Table
<xref ref-type="table" rid="T1">1</xref>
. The first example shows the association of the title of a paper with an entity representing the related bibliographic reference through the bibtex:hasTitle property. The second presents the association of the wild-type amino acid with the corresponding gene variation through the mio:hasWildTypeResidue property. The last connects our implementation with an external entity, namely the Bio2RDF implementation of the HGNC database.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Examples of D2RQ mappings</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Description</th>
<th align="left">Mapping</th>
<th align="left">Generated triple (example)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Create a triple that defines the value of the title of a given bibliographic reference.</td>
<td align="left">map:somatic_ref_Title a d2rq:PropertyBridge;
<break></break>
d2rq:belongsToClassMap map:somatic_ref;
<break></break>
d2rq:property bibtex:hasTitle;
<break></break>
d2rq:column "SomaticRef15.Title";.</td>
<td align="left">
<break></break>
<break></break>
"Mutations in the p53 gene occur in diverse human tumour types"</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Create a triple that defines the value of the wild-type amino acid for a given gene variation. Conditional clauses skip special cases (empty field, dash character and "NA" value).</td>
<td align="left">map:variation_hasWildTypeResidue a d2rq:PropertyBridge;
<break></break>
d2rq:belongsToClassMap map:variation;
<break></break>
d2rq:property mio:hasWildTypeResidue;
<break></break>
d2rq:column "mutations15.WT_AA";
<break></break>
d2rq:condition "mutations15.WT_AA != ('NA')";
<break></break>
d2rq:condition "mutations15.WT_AA != ('')";
<break></break>
d2rq:condition "mutations15.WT_AA != ('-')";.</td>
<td align="left">
<break></break>
"P"</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Create a triple that establishes a link to the TP53 human gene description in HGNC as implemented in Bio2RDF.</td>
<td align="left">map:gene_HGNC a d2rq:PropertyBridge;
<break></break>
d2rq:belongsToClassMap map:gene;
<break></break>
d2rq:property diseasome:hgncId;
<break></break>
d2rq:uriPattern "http://bio2rdf.org/hgnc:11998";.</td>
<td align="left">
<break></break>
<break></break>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Three examples of D2RQ mappings between the IARC TP53 Mutation database and its RDF representation. In the first, the title of a paper is simply extracted from the database and associated with an entity representing a given bibliographic reference by means of the bibtex:hasTitle property. In the second, the one-letter code of the wild-type amino acid at the location of a given variation is associated with the entity representing that variation through the mio:hasWildTypeResidue property. The third creates a connection with an external entity, namely the HGNC identifier of the TP53 gene, by specifying its Linked Data URI.</p>
</table-wrap-foot>
</table-wrap>
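<p>The d2rq:condition clauses in the second example of Table 1 act as SQL WHERE filters, so that no mio:hasWildTypeResidue triple is produced for empty, "-", or "NA" cells. An equivalent filter can be sketched in Python with sqlite3; the table reuses the column name from Table 1, but the id column, the subject naming, and the rows in the usage check are toy assumptions.</p>

```python
import sqlite3

# Placeholder values that must not become RDF literals.
EXCLUDED = ("NA", "", "-")


def wild_type_triples(conn):
    """Emit (variation, predicate, residue) only for meaningful residues."""
    rows = conn.execute(
        "SELECT id, WT_AA FROM mutations15 "
        "WHERE WT_AA IS NOT NULL AND WT_AA NOT IN (?, ?, ?)",
        EXCLUDED,
    )
    return [(f"variation/{row_id}", "mio:hasWildTypeResidue", residue)
            for row_id, residue in rows]
```

In SQL a comparison with NULL is never true, so the D2RQ conditions already drop NULL cells; the explicit IS NOT NULL test only makes that behavior visible.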
<p>The Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
includes the complete mapping.</p>
</sec>
</sec>
<sec>
<title>Results</title>
<sec>
<title>A prototype implementation</title>
<p>We have implemented a D2RQ Server for TP53 mutation data as a prototype for studying issues related to the publication of mutation data in the LOD framework. It provides data on a significant subset of the IARC TP53 database, including gene variations, somatic mutations, and related bibliographic references. To minimize duplication, information on samples and individuals has been made available separately from the related mutations.</p>
<p>In Table
<xref ref-type="table" rid="T2">2</xref>
, we present a summary of published classes and the correspondence with the original datasets. In Figure
<xref ref-type="fig" rid="F1">1</xref>
, a schematic representation of the classes created by the mapping, their relationships, and external links is presented.</p>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>Correspondence between classes and original datasets</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Class</th>
<th align="left">Description</th>
<th align="left">Linked to</th>
<th align="left">IARC datasets</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">database</td>
<td align="left">Database information</td>
<td align="left">somatic_mutation</td>
<td align="left">Implicit</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">gene</td>
<td align="left">Gene information</td>
<td align="left">gene_variation</td>
<td align="left">Implicit</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">gene_variation</td>
<td align="left">Detailed description of the mutation</td>
<td align="left">gene, somatic_mutation</td>
<td align="left">Gene variations</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">somatic_mutation</td>
<td align="left">Summary mutation data, linked to bibliography, sample, and variation data</td>
<td align="left">database, sample, gene_variation, somatic_ref</td>
<td align="left">Somatic mutations</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">sample</td>
<td align="left">Tumor topography, morphology, origin, and classification</td>
<td align="left">somatic_mutation, individual</td>
<td align="left">Somatic mutations</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">individual</td>
<td align="left">Demographic details, life-style data and genetics of the donor</td>
<td align="left">sample</td>
<td align="left">Somatic mutations</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">somatic_ref</td>
<td align="left">Bibliographic references where mutations are described</td>
<td align="left">somatic_mutation</td>
<td align="left">Somatic references</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>A summary of published classes and the correspondence with the original datasets.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Classes, relationships and external links</bold>
. A schematic representation of the classes created by the mapping, their relationships, and external links. Large boxes represent classes, while smaller boxes represent external datasets; for the latter, a yellow border denotes RDF datasets linked by URIs and a red border denotes web sites linked by URLs.</p>
</caption>
<graphic xlink:href="1471-2105-13-S4-S7-1"></graphic>
</fig>
<p>In Figure
<xref ref-type="fig" rid="F2">2</xref>
, the architecture of the overall system is shown. In brief, the original relational database is translated into RDF and exposed using the D2RQ framework. An RDF dump is stored in a TDB triple store that can be queried through a Joseki SPARQL endpoint, either directly (through any SPARQL client) or by means of the Pubby interface, which in turn can be accessed by both HTML and RDF browsers.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Architecture of the system</bold>
. The components of the system and their interfaces are shown. The triple store is populated with the RDF dump created by D2RQ and extended by dedicated SPARQL Update queries. Access to the triple store is provided through Joseki, which can be queried by SPARQL clients. The Pubby interface allows data navigation by means of both HTML and RDF browsers.</p>
</caption>
<graphic xlink:href="1471-2105-13-S4-S7-2"></graphic>
</fig>
</sec>
<sec>
<title>Providing access to the information</title>
<p>As previously reported, the Pubby server gives access to the information through various interfaces. It allows browsing of the RDF graph starting from any page, with further navigation achieved through internal links. For example, one can select a given somatic mutation (somatic_mutations/10000) and see links to the related gene variation (variations/1579), sample (samples/9557), and bibliographic reference (somatic_refs/1065), along with some of its own attributes, such as the mutation (c.426T>A) or the structural motif (NDBL/beta-sheets). See Figure
<xref ref-type="fig" rid="F3">3</xref>
.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Browsing the RDF graph: a somatic mutation</bold>
. Representation of the properties and values of a given somatic mutation (somatic_mutations/10000), including links to the related gene variation (variations/1579), sample (samples/9557), and bibliographic reference (somatic_refs/1065), together with some of its own attributes (mutation and structural motif). This class is central within the schema, linking most of the other classes.</p>
</caption>
<graphic xlink:href="1471-2105-13-S4-S7-3"></graphic>
</fig>
<p>This way of browsing the RDF graph naturally leads to further pages. In the previous case, for example, from a single mutation one can reach the triples associated with the related bibliographic reference and thus obtain a list of all mutations described in the same paper. In Figure
<xref ref-type="fig" rid="F4">4</xref>
all properties and objects associated with the bibliographic reference denoted by somatic_refs/1065 are shown, including links to all somatic mutations described in the paper. Two links to PubMed are also shown: the first refers to its implementation in Bio2RDF and extends navigation of the virtual RDF graph beyond our implementation, while the second points to the interface usually accessed by researchers at NCBI.</p>
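<p>Following links "backwards", from a bibliographic reference to every mutation that points to it, amounts to indexing the triples by their object position. A minimal in-memory sketch in Python follows; the predicate name logvd:hasReference and the identifiers in the usage check are hypothetical.</p>

```python
from collections import defaultdict


def build_inverse_index(triples):
    """Map each (predicate, object) pair to the subjects that use it."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].append(subject)
    return index


def mutations_for_reference(index, reference, predicate="logvd:hasReference"):
    """List the mutations linked to `reference` through `predicate` (hypothetical)."""
    # .get() avoids inserting empty entries into the defaultdict.
    return sorted(index.get((predicate, reference), []))
```

A triple store answers the same question with a SPARQL pattern whose object is fixed and whose subject is a variable; the index here only makes that lookup explicit.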
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Browsing the RDF graph: a bibliographic reference</bold>
. Properties and objects associated with the bibliographic reference denoted by somatic_refs/1065 are shown. Links to all somatic mutations described in the paper are presented. Links to PubMed referring to its Bio2RDF implementation and to the NCBI web site are also shown.</p>
</caption>
<graphic xlink:href="1471-2105-13-S4-S7-4"></graphic>
</fig>
<p>More precisely, Pubby is used to add a Linked Data interface to our SPARQL endpoint. It handles external requests by connecting to the SPARQL endpoint, issuing a SPARQL "DESCRIBE" query about the requested URI, and rendering the result as an HTML or RDF page, supporting Linked Data compliant content resolution and negotiation.</p>
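<p>The content negotiation step can be illustrated with a simplified Python function that inspects the HTTP Accept header and chooses whether a request for a resource URI should be redirected (with an HTTP 303) to an HTML page or to an RDF document. This sketches the general Linked Data pattern, not Pubby's code: the /resource/, /page/ and /data/ paths follow a layout commonly used by Pubby but are an assumption here, and the q-value parsing ignores partial wildcards such as text/*.</p>

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3", "application/n-triples"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            entries.append((media_type, q))
    # sorted() is stable, so equally weighted types keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def redirect_target(resource_path, accept_header):
    """Pick the 303 redirect target for a /resource/ path (simplified)."""
    local_name = resource_path.split("/resource/", 1)[-1]
    for media_type, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        if media_type in RDF_TYPES:
            return "/data/" + local_name
        if media_type in HTML_TYPES:
            return "/page/" + local_name
    return "/page/" + local_name  # default to the human-readable view
```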
<p>An additional interface to the SPARQL endpoint is provided via SPARQLer, a simple interface for performing SPARQL queries. In this case, all prefixes needed to properly identify RDF nodes are added by the system, which simplifies writing queries.</p>
<p>Moreover, the RDF data set that we implemented can be explored by using any Semantic Web browser or application, like Marbles [
<xref ref-type="bibr" rid="B48">48</xref>
]. The SPARQL endpoint can also be queried by using more sophisticated tools, such as RelFinder [
<xref ref-type="bibr" rid="B49">49</xref>
].</p>
</sec>
<sec>
<title>Some example queries</title>
<p>To show the kinds of questions that can currently be posed to our implementation, we present three example queries that can be carried out through the TDB-Joseki SPARQL endpoint [
<xref ref-type="bibr" rid="B50">50</xref>
]. A summary of these queries and of related results is shown in Tables
<xref ref-type="table" rid="T3">3</xref>
,
<xref ref-type="table" rid="T4">4</xref>
, and
<xref ref-type="table" rid="T5">5</xref>
.</p>
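<p>Queries like these can be submitted to a SPARQL endpoint with a plain HTTP GET request that carries the query text in the query parameter, as defined by the SPARQL 1.1 Protocol. The Python sketch below builds such a request without sending it; the endpoint URL is a placeholder rather than the address of our server, the JSON results format is assumed to be supported by the endpoint, and the query avoids prefixes so that it is shown in full.</p>

```python
import urllib.parse
import urllib.request

# Placeholder address; substitute the actual SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Counts how often each predicate is used (no prefixes needed).
QUERY = """
SELECT ?p (COUNT(?s) AS ?uses)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?uses)
LIMIT 10
"""


def build_sparql_get(endpoint, query):
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
```

Sending the request with urllib.request.urlopen and decoding the body with the json module would complete the round trip; that step is left out so the sketch runs offline.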
<table-wrap id="T3" position="float">
<label>Table 3</label>
<caption>
<p>SPARQL query example 1: descriptive statistical analysis of dataset contents</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" colspan="3">SELECT ?neoplasm ?variation (count (?variation) as ?occurrence)</th>
</tr>
<tr>
<th align="left" colspan="3">WHERE {</th>
</tr>
<tr>
<th align="left" colspan="3"> ?sample NCIT:Neoplasm_by_Morphology ?neoplasm.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?somatic_mutation logvd:hasSample ?sample.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?variation_id rdfs:label ?variation.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?somatic_mutation logvd:hasVariation ?variation_id.</th>
</tr>
<tr>
<th align="left" colspan="3">}</th>
</tr>
<tr>
<th align="left" colspan="3">GROUP BY ?neoplasm ?variation</th>
</tr>
<tr>
<th align="left" colspan="3">ORDER BY ?neoplasm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<bold>?neoplasm</bold>
</td>
<td align="left">
<bold>?variation</bold>
</td>
<td align="left">
<bold>?occurrence</bold>
</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Acinar cell carcinoma</td>
<td align="left">NM_000546.1:c.186A>C</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acinar cell carcinoma</td>
<td align="left">NM_000546.1:c.408del1</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acinar cell carcinoma</td>
<td align="left">NM_000546.1:c.454del1</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acinar cell carcinoma</td>
<td align="left">NM_000546.1:c.590T>G</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acute leukemia, NOS</td>
<td align="left">NM_000546.1:c.524G>A</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">Acute megakaryoblastic leukemia</td>
<td align="left">NM_000546.1:c.605G>T</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acute megakaryoblastic leukemia</td>
<td align="left">NM_000546.1:c.734G>T</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acute monocytic leukemia</td>
<td align="left">NM_000546.1:c.584T>C</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acute myeloid leukemia with maturation</td>
<td align="left">NM_000546.1:c.743G>A</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">Acute myeloid leukemia with maturation</td>
<td align="left">NM_000546.1:c.862A>T</td>
<td align="left">1</td>
</tr>
<tr>
<td align="left">......</td>
<td align="left">......</td>
<td align="left">......</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>This query selects each neoplasm and associated gene variation, along with the number of somatic mutations in the dataset in which that pair occurs. The output has been limited to the first 10 results. SPARQL query prefixes are not shown.</p>
</table-wrap-foot>
</table-wrap>
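<p>The grouping and counting performed by the first query can be reproduced over solutions already retrieved from the endpoint, which offers a convenient cross-check when results are post-processed. A Python sketch based on collections.Counter follows; the input is assumed to be one (neoplasm, variation) pair per matching somatic mutation, and the values in the usage check are toy data.</p>

```python
from collections import Counter


def count_pairs(rows):
    """Count (neoplasm, variation) pairs, ordered by neoplasm as in the query.

    `rows` holds one (neoplasm, variation) tuple per solution of the WHERE
    clause, i.e. per somatic mutation, before grouping.
    """
    counts = Counter(rows)
    return sorted(
        ((neoplasm, variation, n) for (neoplasm, variation), n in counts.items()),
        key=lambda item: item[0],
    )
```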
<table-wrap id="T4" position="float">
<label>Table 4</label>
<caption>
<p>SPARQL query example 2: extraction of complementary data from DBpedia</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" colspan="4">SELECT ?sample ?patient ?country ?capital</th>
</tr>
<tr>
<th align="left" colspan="4">WHERE {</th>
</tr>
<tr>
<th align="left" colspan="4"> ?sample logvd:hasIndividual ?patient.</th>
</tr>
<tr>
<th align="left" colspan="4"> ?sample NCIT:Topography "BRAIN".</th>
</tr>
<tr>
<th align="left" colspan="4"> ?patient NCIT:Country ?country</th>
</tr>
<tr>
<th align="left" colspan="4"> SERVICE {?country ?capital}</th>
</tr>
<tr>
<th align="left" colspan="4">}</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<bold>?sample</bold>
</td>
<td align="left">
<bold>?patient</bold>
</td>
<td align="left">
<bold>?country</bold>
</td>
<td align="left">
<bold>?capital</bold>
</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">samples/112</td>
<td align="left">individual/112</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/113</td>
<td align="left">individual/113</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/115</td>
<td align="left">individual/115</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/116</td>
<td align="left">individual/116</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/963</td>
<td align="left">individual/963</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/964</td>
<td align="left">individual/964</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/1026</td>
<td align="left">individual/1025</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/1299</td>
<td align="left">individual/1292</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/1300</td>
<td align="left">individual/1293</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/1302</td>
<td align="left">individual/1295</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/1303</td>
<td align="left">individual/1296</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">samples/1739</td>
<td align="left">individual/1728</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">............</td>
<td align="left">............</td>
<td align="left">............</td>
<td align="left">............</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>This query retrieves countries and their capitals from DBpedia for individuals whose samples were used to detect somatic mutations. The SERVICE keyword allows parts of the query to be executed on remote endpoints distributed across the Web. The output has been limited to the first 12 results. SPARQL query prefixes are not shown.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="T5" position="float">
<label>Table 5</label>
<caption>
<p>SPARQL query example 3: retrieving clinical trials of interest</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" colspan="3">SELECT DISTINCT ?variation_label ?neoplasm ?clinical_trial</th>
</tr>
<tr>
<th align="left" colspan="3">WHERE {</th>
</tr>
<tr>
<th align="left" colspan="3"> SERVICE {?clinical_trial relontology:hasInclusionCriteria ?umls}.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?sample logvd:Sub_topography "Middle third of esophagus".</th>
</tr>
<tr>
<th align="left" colspan="3"> ?sample NCIT:Neoplasm_by_Morphology ?neoplasm.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?sample logvd:hasUMLS_neoplasm ?umls.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?somatic_mutation logvd:hasSample ?sample.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?variation_id rdfs:label ?variation_label.</th>
</tr>
<tr>
<th align="left" colspan="3"> ?somatic_mutation logvd:hasVariation ?variation_id.</th>
</tr>
<tr>
<th align="left" colspan="3">}</th>
</tr>
<tr>
<th align="left" colspan="3">ORDER BY ?variation_label ?neoplasm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<bold>?variation_label</bold>
</td>
<td align="left">
<bold>?neoplasm</bold>
</td>
<td align="left">
<bold>?clinical_trial</bold>
</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">NM_000546.1:c.507G>A</td>
<td align="left">Adenocarcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.838A>G</td>
<td align="left">Adenocarcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.507G>A</td>
<td align="left">Adenocarcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.838A>G</td>
<td align="left">Adenocarcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.482C>A</td>
<td align="left">Dysplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.482C>A</td>
<td align="left">Dysplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.482C>A</td>
<td align="left">Dysplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.482C>A</td>
<td align="left">Dysplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.482C>A</td>
<td align="left">Dysplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.469G>T</td>
<td align="left">Hyperplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.469G>T</td>
<td align="left">Hyperplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.469G>T</td>
<td align="left">Hyperplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.469G>T</td>
<td align="left">Hyperplasia, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.422G>A</td>
<td align="left">Squamous cell carcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.451C>G</td>
<td align="left">Squamous cell carcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.469G>T</td>
<td align="left">Squamous cell carcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.474_475ins1</td>
<td align="left">Squamous cell carcinoma, NOS</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">NM_000546.1:c.488A>G</td>
<td align="left">Squamous cell carcinoma, NOS</td>
<td align="left"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>This query selects clinical trials of interest for a given sub-topography (the precise location of the tumor) and shows which variations are involved. The SERVICE keyword allows parts of the query to be executed on remote endpoints distributed across the Web. The output has been limited to 18 results, selected to show the different tumors associated with the given sub-topography. SPARQL query prefixes are not shown.</p>
</table-wrap-foot>
</table-wrap>
<p>The first query is a simple example involving only the TP53 mutation data. It selects neoplasm-gene variation associations along with the number of times each association occurs in the dataset. It is essentially equivalent to a standard SQL query on the relational database, and shows how SPARQL queries can reproduce the functionality of a relational database (see Table
<xref ref-type="table" rid="T3">3</xref>
).</p>
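<p>As a minimal sketch of the relational counterpart mentioned above, the following Python snippet runs a grouping query with the standard-library sqlite3 module on a hypothetical, simplified table. The table and column names are illustrative assumptions, not the actual IARC schema; the three toy rows reproduce the counts of two entries of Table 3. In SPARQL 1.1 the same aggregation would be expressed with GROUP BY and COUNT.</p>

```python
import sqlite3

# Hypothetical, simplified schema: one row per somatic mutation, holding the
# neoplasm (morphology) and the HGVS description of the gene variation.
# Table and column names are illustrative, not those of the IARC database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somatic_mutation (neoplasm TEXT, variation TEXT)")
conn.executemany(
    "INSERT INTO somatic_mutation VALUES (?, ?)",
    [
        ("Acute leukemia, NOS", "NM_000546.1:c.524G>A"),
        ("Acute leukemia, NOS", "NM_000546.1:c.524G>A"),
        ("Acute monocytic leukemia", "NM_000546.1:c.584T>C"),
    ],
)

# Count how often each neoplasm/variation pair occurs.
rows = conn.execute(
    """
    SELECT neoplasm, variation, COUNT(*) AS n
    FROM somatic_mutation
    GROUP BY neoplasm, variation
    ORDER BY neoplasm, variation
    """
).fetchall()

for neoplasm, variation, n in rows:
    print(neoplasm, variation, n)
```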
<p>The second and third queries show how to perform federated SPARQL queries across distinct datasets. Federation is expressed with the SERVICE keyword, which causes a sub-pattern of the query to be sent to a named remote endpoint instead of being matched against the local dataset; the solutions returned by the endpoint are then joined with the local ones.</p>
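<p>How the solutions returned by a SERVICE endpoint are combined with local solutions can be illustrated with a naive nested-loop join of SPARQL solution mappings, sketched below in Python. The bindings are hypothetical examples; real query engines use more efficient join strategies, but the compatibility rule (two mappings must agree on every variable they share) is the one defined by SPARQL.</p>

```python
from typing import Dict, List

Binding = Dict[str, str]

def compatible(left: Binding, right: Binding) -> bool:
    """SPARQL compatibility: mappings must agree on every variable they share."""
    return all(left[var] == right[var] for var in left if var in right)

def join(local: List[Binding], remote: List[Binding]) -> List[Binding]:
    """Naive nested-loop join of local solutions with SERVICE solutions."""
    return [{**left, **right} for left in local for right in remote
            if compatible(left, right)]

# Hypothetical solutions: the local patterns bind ?sample and ?country,
# the remote (SERVICE) pattern binds ?country and ?capital.
local = [
    {"sample": "samples/112", "country": "dbpedia:Italy"},
    {"sample": "samples/113", "country": "dbpedia:Norway"},
]
remote = [
    {"country": "dbpedia:Italy", "capital": "dbpedia:Rome"},
    {"country": "dbpedia:France", "capital": "dbpedia:Paris"},
]
print(join(local, remote))
```

<p>A local solution with no compatible remote solution (samples/113 above) is dropped, as in any inner join; wrapping the SERVICE clause in OPTIONAL would keep it with ?capital unbound.</p>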
<p>The second query demonstrates how to retrieve complementary data from DBpedia. For individuals whose samples (here, brain samples) were used to detect somatic mutations present in the dataset, the capital cities of their countries are retrieved (see Table
<xref ref-type="table" rid="T4">4</xref>
).</p>
<p>The last example query shows how to select information from the Linked Clinical Trials (LinkedCT) dataset, which is available through the LinkedLifeData endpoint. Given a sub-topography (the precise anatomical location from which the sample originates), samples are linked to clinical trials of interest through the UMLS code that the sample's neoplasm shares with the trial's inclusion criteria; the query then shows the gene variation, the neoplasm and the clinical trial ID (see Table
<xref ref-type="table" rid="T5">5</xref>
).</p>
</sec>
<sec>
<title>Some statistics on contents of the triple store</title>
<p>Table
<xref ref-type="table" rid="T6">6</xref>
reports some statistics on the contents of our triple store. The number of entities is the sum of the somatic mutations, gene variations, samples, individuals and bibliographic references included in the database.</p>
<table-wrap id="T6" position="float">
<label>Table 6</label>
<caption>
<p>Main statistics of triple store contents</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Triple store size</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Number of entities</td>
<td></td>
<td align="right">85,785</td>
</tr>
<tr>
<td align="left">Number of triples</td>
<td></td>
<td align="right">1,002,597</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Number of external URIs</bold>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">LinkedLifeData</td>
<td></td>
<td align="right">25,094</td>
</tr>
<tr>
<td align="left">Bio2RDF</td>
<td></td>
<td align="right">2,244</td>
</tr>
<tr>
<td align="left">DBpedia</td>
<td></td>
<td align="right">23,015</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td></td>
<td align="right">Total</td>
<td align="right">50,353</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Number of links to external web pages</bold>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td></td>
<td align="right">Total</td>
<td align="right">2,436</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Shared properties from re-used ontologies</bold>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Ontology</bold>
</td>
<td align="right">
<bold>No. of shared properties</bold>
</td>
<td align="right">
<bold>Involved triples</bold>
</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">rdf:</td>
<td align="right">1</td>
<td align="right">85,893</td>
</tr>
<tr>
<td align="left">rdfs:</td>
<td align="right">3</td>
<td align="right">88,249</td>
</tr>
<tr>
<td align="left">owl:</td>
<td align="right">1</td>
<td align="right">2,241</td>
</tr>
<tr>
<td align="left">diseasome:</td>
<td align="right">2</td>
<td align="right">2</td>
</tr>
<tr>
<td align="left">mio:</td>
<td align="right">2</td>
<td align="right">9,399</td>
</tr>
<tr>
<td align="left">bibo:</td>
<td align="right">6</td>
<td align="right">11,385</td>
</tr>
<tr>
<td align="left">bibtex:</td>
<td align="right">2</td>
<td align="right">4,478</td>
</tr>
<tr>
<td align="left">NCIT:</td>
<td align="right">12</td>
<td align="right">146,553</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Total</td>
<td align="right">29</td>
<td align="right">348,200</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Statistics of the RDF triple store showing number of triples, entities, external Linked Data URIs, links to external web sites and shared properties from re-used ontologies.</p>
</table-wrap-foot>
</table-wrap>
<p>External URIs referring to Linked Life Data, Bio2RDF or DBpedia entities are included in about 5% of the triples. Triples including a shared property, i.e. a property defined within a re-used ontology, account for about one third of the total. Notably, 9,399 triples include a property from MIO and 146,553 include a property from NCIT.</p>
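<p>The proportions quoted above can be checked directly against the figures in Table 6. The short Python sketch below takes the 5% figure to be the ratio of the external-URI count to the total triple count, which is an interpretation of the table rather than something stated explicitly.</p>

```python
# Figures taken from Table 6.
total_triples = 1_002_597
external_uris = {"LinkedLifeData": 25_094, "Bio2RDF": 2_244, "DBpedia": 23_015}
shared_property_triples = {
    "rdf:": 85_893, "rdfs:": 88_249, "owl:": 2_241, "diseasome:": 2,
    "mio:": 9_399, "bibo:": 11_385, "bibtex:": 4_478, "NCIT:": 146_553,
}

external_total = sum(external_uris.values())          # 50,353, as in Table 6
shared_total = sum(shared_property_triples.values())  # 348,200, as in Table 6

# Ratio of each total to the number of triples in the store.
print(f"external URIs: {external_total:,} ({external_total / total_triples:.1%})")
print(f"shared-property triples: {shared_total:,} ({shared_total / total_triples:.1%})")
```

<p>The two ratios come out at roughly 5.0% and 34.7%, consistent with the "about 5%" and "about one third" in the text.</p>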
</sec>
<sec>
<title>Availability</title>
<p>Our dataset is currently accessible through two distinct interfaces. The first is a SPARQL endpoint, available at http://bioinformatics.istge.it/logvdsparql/sparql and implemented with Joseki and TDB; interfaces to validators for SPARQL queries and for RDF data are also available. The second interface is a Linked Data representation available at [
<xref ref-type="bibr" rid="B51">51</xref>
]. It is based on the Pubby frontend.</p>
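<p>A client can query the endpoint through the standard SPARQL Protocol. The Python sketch below uses only the standard library; it assumes that the endpoint accepts GET requests carrying a query parameter and honours an Accept header requesting SPARQL JSON results, and it does not check whether the endpoint is still online. The function names are illustrative.</p>

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://bioinformatics.istge.it/logvdsparql/sparql"

def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def parse_results(payload: str) -> list:
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]

def run_query(query: str) -> list:
    """Send the query over the network (the endpoint may no longer be online)."""
    with urlopen(build_request(query), timeout=30) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

<p>Keeping request construction and result parsing separate from the network call makes both parts testable offline; run_query is the only function that contacts the endpoint.</p>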
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>Thanks to recently developed tools, an RDF representation of the contents of a relational database can now be provided easily. However, an RDF representation of a gene mutation data set is not enough to achieve the desired integration with other data sets. The main value, and the main difficulty, lies in identifying a shared, semantically meaningful, ontology-based representation of variation information.</p>
<p>In our mapping, we have expressed mutations as a central entity that connects sequence variations to individuals. Whenever possible, the mapping adopts consolidated vocabularies to describe properties (e.g. age of patients) or types (e.g. mutation codes). We have also adopted shared URIs where they could be easily identified (e.g. Bio2RDF URIs for Entrez, HGNC, OMIM, etc.).</p>
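<p>The idea of the mutation as a hub can be sketched with plain (subject, predicate, object) tuples. The properties logvd:hasSample, logvd:hasVariation, logvd:hasIndividual and rdfs:label are those used in the example queries of Tables 4 and 5; the URI patterns for mutations and variations, and the Python function names, are hypothetical and do not reproduce the actual D2RQ mapping.</p>

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

def mutation_to_triples(mutation_id: str, sample_id: str, individual_id: str,
                        variation_id: str, hgvs_label: str) -> List[Triple]:
    """Map one somatic-mutation record to triples centred on the mutation."""
    mutation = "logvd:somatic_mutation/" + mutation_id  # hypothetical URI pattern
    sample = "logvd:samples/" + sample_id
    variation = "logvd:variation/" + variation_id       # hypothetical URI pattern
    return [
        (mutation, "logvd:hasSample", sample),
        (mutation, "logvd:hasVariation", variation),
        (sample, "logvd:hasIndividual", "logvd:individual/" + individual_id),
        (variation, "rdfs:label", hgvs_label),
    ]

def individuals_with_variation(triples: List[Triple], label: str) -> Set[str]:
    """Follow variation -> mutation -> sample -> individual through the graph."""
    variations = {s for s, p, o in triples if p == "rdfs:label" and o == label}
    mutations = {s for s, p, o in triples if p == "logvd:hasVariation" and o in variations}
    samples = {o for s, p, o in triples if p == "logvd:hasSample" and s in mutations}
    return {o for s, p, o in triples if p == "logvd:hasIndividual" and s in samples}

triples = mutation_to_triples("1", "112", "112", "7", "NM_000546.1:c.524G>A")
print(individuals_with_variation(triples, "NM_000546.1:c.524G>A"))
```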
<p>The resulting RDF has been exposed both through a SPARQL endpoint and as Linked Data, making the system part of the growing Web of Data and a potential bridge between molecular and clinical information.</p>
<p>Problems remain in developing an effective Linked Data solution for mutation data, in particular the need for URIs that uniquely identify mutations (or, alternatively, a computable definition of equivalence between URIs). Current nomenclatures could be the basis for RefSeq-version-specific URIs, complemented by services that identify equivalent mutations.</p>
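<p>One way to make URIs specific to a RefSeq version is to scope them by the accession.version prefix of the HGVS description, as in the Python sketch below. The base URI http://example.org/variation/ is a hypothetical placeholder, not a scheme defined by HGVS or by the authors, and the sketch leaves aside the harder problem of detecting equivalent mutations across versions.</p>

```python
from urllib.parse import quote

# Hypothetical base URI; not an identifier scheme defined by HGVS or the paper.
BASE = "http://example.org/variation/"

def version_specific_uri(hgvs: str) -> str:
    """Mint a URI scoped to the exact RefSeq accession.version of an HGVS string."""
    reference, sep, change = hgvs.partition(":")
    if not sep or "." not in reference:
        raise ValueError("expected 'ACCESSION.VERSION:description', got " + repr(hgvs))
    # Percent-encode the description so characters such as '>' are URI-safe.
    return BASE + reference + "/" + quote(change, safe="")

print(version_specific_uri("NM_000546.1:c.590T>G"))
```

<p>Because the version is part of the path, the same description on NM_000546.1 and on a later version yields two distinct URIs; asserting that they denote the same mutation would be the job of the separate equivalence service.</p>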
<p>At the moment, our model is "application oriented": it reflects only the schema and contents of the IARC TP53 Mutation database. To our knowledge, however, it is also the first mutation database published as Linked Open Data. It can therefore serve as a first model from which the community of LSDB and mutation database managers can start developing a general ontology for gene variation and, hopefully, as a recognized reference for future efforts in this direction.</p>
<p>We therefore foresee remapping our model in the light of new developments. A revised version of our prototype, including further shared concepts and all data sets provided by the IARC TP53 Database, is already under development, and we also plan to add more variation databases.</p>
<p>More pertinent and exhaustive ontologies for variation concepts need to be developed through a community effort. At the same time, standardized nomenclatures and identifiers should be adopted as they become available.</p>
<p>It is hard to give a clear proof of concept of the advantages that a Linked Data representation of mutation data currently offers over alternative solutions. Assessing the value of publishing resources on the Semantic Web or as Linked Data is still an open issue.</p>
<p>The advantages offered in terms of ease and accuracy of data integration can only be measured if costs are taken into account; otherwise, equivalent solutions can be realized with alternative technologies. These aspects are difficult to measure other than in very general, qualitative terms.</p>
<p>The Semantic Web is still a "fishing expedition", to borrow a term commonly applied to functional genomics in its early days [
<xref ref-type="bibr" rid="B52">52</xref>
]. As an intrinsically enabling technology, the Semantic Web platform is built to enable new solutions rather than to address specific, measurable use cases.</p>
<p>In this paper we propose adopting this technology to make mutation data available as Linked Data. The technology will reach its full potential only once a sufficient amount of information is available.</p>
<p>With our work, we provide the Web of Data with a class of information that plays a pivotal role in enabling translational research. In the short term, SPARQL queries spanning mutation data sets and other biological databases may support useful data retrieval that is not currently possible. In the longer term, reasoning on integrated variation data may also support discoveries towards personalized medicine.</p>
<p>To this end, however, much work is still needed, owing to the relative isolation of variation data sources and the lack of standardization in both terminologies and data schemas.</p>
</sec>
<sec>
<title>List of abbreviations used</title>
<p>API: Application Programming Interface; D2RQ: Database to RDF Query; GO: Gene Ontology; HGVS: Human Genome Variation Society; HVP: Human Variome Project; IARC: International Agency for Research on Cancer; LLD: Linked Life Data; LOD: Linked Open Data; LOVD: Leiden Open Variation Database; LSDB: Locus Specific Data Base; MIO: Mutation Impact Ontology; NCI: National Cancer Institute; NCIT: NCI Thesaurus; OMIM: Online Mendelian Inheritance in Man; RDB: Relational Database; RDF: Resource Description Framework; SADI: Semantic Automated Discovery and Integration; SO: Sequence Ontology; SNP: Single Nucleotide Polymorphism; SPARQL: SPARQL Protocol and RDF Query Language; URI: Uniform Resource Identifier; VariO: Variation Ontology.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>PR conceived the work, participated in its design and implementation, and drafted the manuscript. AZ participated in the design of the work, implemented the prototype, and contributed to drafting the manuscript. AS participated in the design of the work, supervised the implementation of the prototype and contributed to drafting the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>D2RQ mappings</bold>
. Complete list of D2RQ mappings exposing the IARC TP53 Mutation database as RDF. The mapping consists of a set of triples presented in N3 format.</p>
</caption>
<media xlink:href="1471-2105-13-S4-S7-S1.pdf" mimetype="application" mime-subtype="pdf">
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>This work has been partially supported by the Italian Ministry of Health (project RNBIO-Rete Nazionale di Bioinformatica Oncologica) and by the Liguria region (project Liguria eScience).</p>
<p>This article has been published as part of
<italic>BMC Bioinformatics </italic>
Volume 13 Supplement 4, 2012: Italian Society of Bioinformatics (BITS): Annual Meeting 2011. The full contents of the supplement are available online at
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S4">http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S4</ext-link>
.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="other">
<name>
<surname>Berners-Lee</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hendler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lassila</surname>
<given-names>O</given-names>
</name>
<article-title>The semantic web</article-title>
<source>Scientific American</source>
<year>2001</year>
<fpage>34</fpage>
<lpage>43</lpage>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Stephens</surname>
<given-names>S</given-names>
</name>
<name>
<surname>LaVigna</surname>
<given-names>D</given-names>
</name>
<name>
<surname>DiLascio</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Luciano</surname>
<given-names>J</given-names>
</name>
<article-title>Aggregation of bioinformatics data using Semantic Web technology</article-title>
<source>Journal of Web Semantics</source>
<year>2006</year>
<volume>4</volume>
<fpage>216</fpage>
<lpage>221</lpage>
<pub-id pub-id-type="doi">10.1016/j.websem.2006.05.004</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Dhanapalan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>JY</given-names>
</name>
<article-title>A case study of integrating protein interaction data using semantic web technology</article-title>
<source>Int J Bioinform Res Appl</source>
<year>2007</year>
<volume>3</volume>
<fpage>286</fpage>
<lpage>302</lpage>
<pub-id pub-id-type="doi">10.1504/IJBRA.2007.015004</pub-id>
<pub-id pub-id-type="pmid">18048193</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Ruttenberg</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bug</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Samwald</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bodenreider</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Doherty</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Forsberg</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kashyap</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Kinoshita</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Luciano</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Ogbuji</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rees</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>GT</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Zaccagnini</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hongsermeier</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Neumann</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Herman</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>KH</given-names>
</name>
<article-title>Advancing translational research with the Semantic Web</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<issue>Suppl 3</issue>
<fpage>S2</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-S3-S2</pub-id>
<pub-id pub-id-type="pmid">17493285</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Deus</surname>
<given-names>HF</given-names>
</name>
<name>
<surname>Stanislaus</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Veiga</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Behrens</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wistuba</surname>
<given-names>II</given-names>
</name>
<name>
<surname>Minna</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Garner</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Swisher</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Roth</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Correa</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Broom</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Coombes</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vogel</surname>
<given-names>LH</given-names>
</name>
<name>
<surname>Almeida</surname>
<given-names>JS</given-names>
</name>
<article-title>A Semantic Web management model for integrative biomedical informatics</article-title>
<source>PLoS One</source>
<year>2008</year>
<volume>3</volume>
<fpage>e2946</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0002946</pub-id>
<pub-id pub-id-type="pmid">18698353</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Miles</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Klyne</surname>
<given-names>G</given-names>
</name>
<name>
<surname>White-Cooper</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Shotton</surname>
<given-names>D</given-names>
</name>
<article-title>OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster</article-title>
<source>J Biomed Inform</source>
<year>2010</year>
<volume>43</volume>
<fpage>752</fpage>
<lpage>761</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2010.04.004</pub-id>
<pub-id pub-id-type="pmid">20382263</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Bizer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Heath</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Berners-Lee</surname>
<given-names>T</given-names>
</name>
<article-title>Linked Data - The Story So Far</article-title>
<source>International Journal on Semantic Web and Information Systems</source>
<year>2009</year>
<volume>5</volume>
<fpage>1</fpage>
<lpage>22</lpage>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="other">
<article-title>RDF - Semantic Web Standards</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/RDF/">http://www.w3.org/RDF/</ext-link>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="other">
<article-title>RdfAndSql - W3C Wiki</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/wiki/RdfAndSql">http://www.w3.org/wiki/RdfAndSql</ext-link>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Belleau</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Nolin</surname>
<given-names>M-A</given-names>
</name>
<name>
<surname>Tourigny</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rigault</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Morissette</surname>
<given-names>J</given-names>
</name>
<article-title>Bio2RDF: Towards a mashup to build bioinformatics knowledge systems</article-title>
<source>Journal of Biomedical Informatics</source>
<year>2008</year>
<volume>41</volume>
<fpage>706</fpage>
<lpage>716</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2008.03.004</pub-id>
<pub-id pub-id-type="pmid">18472304</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="other">
<article-title>SPARQL Query Language for RDF</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/TR/rdf-sparql-query/">http://www.w3.org/TR/rdf-sparql-query/</ext-link>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="other">
<article-title>Linked Data - Connect Distributed Data across the Web</article-title>
<ext-link ext-link-type="uri" xlink:href="http://linkeddata.org/">http://linkeddata.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="other">
<article-title>The Linking Open Data cloud diagram</article-title>
<ext-link ext-link-type="uri" xlink:href="http://lod-cloud.net/">http://lod-cloud.net/</ext-link>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<collab>The 1000 Genomes Consortium</collab>
<article-title>A map of human genome variation from population scale sequencing</article-title>
<source>Nature</source>
<year>2010</year>
<volume>467</volume>
<fpage>1061</fpage>
<lpage>1073</lpage>
<pub-id pub-id-type="doi">10.1038/nature09534</pub-id>
<pub-id pub-id-type="pmid">20981092</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="other">
<article-title>1000 Genomes</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.1000genomes.org/">http://www.1000genomes.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Fernald</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Capriotti</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Daneshjou</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Karczewski</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>RB</given-names>
</name>
<article-title>Bioinformatics challenges for personalized medicine</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>1741</fpage>
<lpage>1748</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr295</pub-id>
<pub-id pub-id-type="pmid">21596790</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Cooper</surname>
<given-names>DN</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J-M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>EV</given-names>
</name>
<name>
<surname>Howells</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mort</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Phillips</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Chuzhanova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Krawczak</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kehrer-Sawatzki</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Stenson</surname>
<given-names>PD</given-names>
</name>
<article-title>Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics</article-title>
<source>Hum Mutat</source>
<year>2010</year>
<volume>31</volume>
<fpage>631</fpage>
<lpage>655</lpage>
<pub-id pub-id-type="doi">10.1002/humu.21260</pub-id>
<pub-id pub-id-type="pmid">20506564</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Laurila</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Naderi</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Witte</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Riazanov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kouznetsov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>CJO</given-names>
</name>
<article-title>Algorithms and semantic infrastructure for mutation impact extraction and grounding</article-title>
<source>BMC Genomics</source>
<year>2010</year>
<volume>11</volume>
<issue>Suppl 4</issue>
<fpage>S24</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-11-S4-S24</pub-id>
<pub-id pub-id-type="pmid">21143808</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="other">
<article-title>Human Genome Variation Society</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.hgvs.org/">http://www.hgvs.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="other">
<article-title>Human Genome Variation Society database list</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.hgvs.org/dblist/dblist.html">http://www.hgvs.org/dblist/dblist.html</ext-link>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Fokkema</surname>
<given-names>IF</given-names>
</name>
<name>
<surname>Taschner</surname>
<given-names>PE</given-names>
</name>
<name>
<surname>Schaafsma</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Celli</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Laros</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>den Dunnen</surname>
<given-names>JT</given-names>
</name>
<article-title>LOVD v.2.0: the next generation in gene variant databases</article-title>
<source>Hum Mutat</source>
<year>2011</year>
<volume>32</volume>
<fpage>557</fpage>
<lpage>563</lpage>
<pub-id pub-id-type="doi">10.1002/humu.21438</pub-id>
<pub-id pub-id-type="pmid">21520333</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="other">
<article-title>Human Variome Project</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.humanvariomeproject.org/">http://www.humanvariomeproject.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>den Dunnen</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Sijmons</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Andersen</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Vihinen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Beckmann</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Rossetti</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Talbot</surname>
<given-names>CC</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Hardison</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Povey</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cotton</surname>
<given-names>RG</given-names>
</name>
<article-title>Sharing data between LSDBs and central repositories</article-title>
<source>Hum Mutat</source>
<year>2009</year>
<volume>30</volume>
<fpage>493</fpage>
<lpage>495</lpage>
<pub-id pub-id-type="doi">10.1002/humu.20977</pub-id>
<pub-id pub-id-type="pmid">19306393</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="other">
<article-title>VariO</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.variationontology.org">http://www.variationontology.org</ext-link>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="other">
<article-title>Mutation Impact Ontology (OWL format)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://unbsj.biordf.net/ontologies/mutation-impact-ontology.owl">http://unbsj.biordf.net/ontologies/mutation-impact-ontology.owl</ext-link>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Wilkinson</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>McCarthy</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Vandervalk</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Withers</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kawas</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Samadian</surname>
<given-names>S</given-names>
</name>
<article-title>SADI, SHARE, and the in silico scientific method</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<issue>Suppl 12</issue>
<fpage>S7</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-S12-S7</pub-id>
<pub-id pub-id-type="pmid">21210986</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Riazanov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Laurila</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>CJO</given-names>
</name>
<article-title>Deploying mutation impact text-mining software with the SADI Semantic Web Services framework</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl 4</issue>
<fpage>S6</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-S4-S6</pub-id>
<pub-id pub-id-type="pmid">21992079</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="book">
<name>
<surname>Bada</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Eilbeck</surname>
<given-names>K</given-names>
</name>
<person-group person-group-type="editor">Baker CJO, Witte R, Rebholz-Schuhmann D</person-group>
<article-title>Toward a richer representation of sequence variation in the Sequence Ontology</article-title>
<source>Annotation, Interpretation and Management of Mutations 2010</source>
<publisher-name>Ghent, Belgium</publisher-name>
<comment>Proceedings of the ECCB 2010 Workshop: Annotation, Interpretation and Management of Mutations (AIMM-2010), Ghent, Belgium, September 26, 2010, CEUR Workshop Proceedings, ISSN 1613-0073, online:
<ext-link ext-link-type="uri" xlink:href="http://CEUR-WS.org/Vol-645/">http://CEUR-WS.org/Vol-645/</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="other">
<article-title>D2RQ: Treating Non-RDF Databases as Virtual RDF Graphs (Chris Bizer)</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www4.wiwiss.fu-berlin.de/bizer/d2rq/index.htm">http://www4.wiwiss.fu-berlin.de/bizer/d2rq/index.htm</ext-link>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="other">
<article-title>TDB: A SPARQL Database for Jena</article-title>
<ext-link ext-link-type="uri" xlink:href="http://jena.sourceforge.net/TDB/">http://jena.sourceforge.net/TDB/</ext-link>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="other">
<article-title>Jena Semantic Web Framework</article-title>
<ext-link ext-link-type="uri" xlink:href="http://openjena.org/index.html">http://openjena.org/index.html</ext-link>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="other">
<article-title>Joseki: A SPARQL Server for Jena</article-title>
<ext-link ext-link-type="uri" xlink:href="http://joseki.sourceforge.net/">http://joseki.sourceforge.net/</ext-link>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="other">
<article-title>Pubby: A Linked Data Frontend for SPARQL Endpoints</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www4.wiwiss.fu-berlin.de/pubby/">http://www4.wiwiss.fu-berlin.de/pubby/</ext-link>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="other">
<article-title>IARC TP53 DATABASE</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www-p53.iarc.fr/">http://www-p53.iarc.fr/</ext-link>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Petitjean</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mathe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kato</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ishioka</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Tavtigian</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Hainaut</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Olivier</surname>
<given-names>M</given-names>
</name>
<article-title>Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database</article-title>
<source>Hum Mutat</source>
<year>2007</year>
<volume>28</volume>
<fpage>622</fpage>
<lpage>629</lpage>
<pub-id pub-id-type="doi">10.1002/humu.20495</pub-id>
<pub-id pub-id-type="pmid">17311302</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="book">
<name>
<surname>Marra</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Romano</surname>
<given-names>P</given-names>
</name>
<article-title>Integrating mutation data of the TP53 human gene in the bioinformatics network environment</article-title>
<source>Proceedings of the First International Conference on Bioinformatics Research and Development BIRD '07: 12-14 March 2007; Berlin</source>
<year>2007</year>
<publisher-name>Springer Verlag Berlin Heidelberg</publisher-name>
<fpage>453</fpage>
<lpage>463</lpage>
<comment>Springer Lecture Notes in Bioinformatics LNBI 4414</comment>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="other">
<article-title>SRS at the National Cancer Research Institute in Genova</article-title>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.istge.it/srs71/">http://bioinformatics.istge.it/srs71/</ext-link>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Sioutos</surname>
<given-names>N</given-names>
</name>
<name>
<surname>de Coronado</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Haber</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Hartel</surname>
<given-names>FW</given-names>
</name>
<name>
<surname>Shaiu</surname>
<given-names>W-L</given-names>
</name>
<name>
<surname>Wright</surname>
<given-names>LW</given-names>
</name>
<article-title>NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular information</article-title>
<source>Journal of Biomedical Informatics</source>
<year>2007</year>
<volume>40</volume>
<fpage>30</fpage>
<lpage>43</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2006.02.013</pub-id>
<pub-id pub-id-type="pmid">16697710</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="other">
<article-title>NCI Thesaurus</article-title>
<ext-link ext-link-type="uri" xlink:href="http://ncit.nci.nih.gov/">http://ncit.nci.nih.gov/</ext-link>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="other">
<article-title>The Bibliographic Ontology</article-title>
<ext-link ext-link-type="uri" xlink:href="http://bibliontology.com/">http://bibliontology.com/</ext-link>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="other">
<article-title>BibTeX Ontology</article-title>
<ext-link ext-link-type="uri" xlink:href="http://data.bibbase.org/ontology/">http://data.bibbase.org/ontology/</ext-link>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Goh</surname>
<given-names>K-I</given-names>
</name>
<name>
<surname>Cusick</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Valle</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Childs</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Vidal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Barabási</surname>
<given-names>A-L</given-names>
</name>
<article-title>The Human Disease Network</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2007</year>
<volume>104</volume>
<fpage>8685</fpage>
<lpage>8690</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0701361104</pub-id>
<pub-id pub-id-type="pmid">17502601</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="other">
<article-title>Diseasome: explore the human disease network</article-title>
<ext-link ext-link-type="uri" xlink:href="http://diseasome.eu/">http://diseasome.eu/</ext-link>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="other">
<article-title>DBpedia: About</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.dbpedia.org/">http://www.dbpedia.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="other">
<article-title>HUGO Gene Nomenclature Committee Home Page</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.genenames.org/">http://www.genenames.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="other">
<article-title>Linked Life Data: A Semantic Data Integration Platform for the Biomedical Domain</article-title>
<ext-link ext-link-type="uri" xlink:href="http://linkedlifedata.com/">http://linkedlifedata.com/</ext-link>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="book">
<name>
<surname>Heath</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bizer</surname>
<given-names>C</given-names>
</name>
<source>Linked Data: Evolving the Web into a Global Data Space</source>
<year>2011</year>
<publisher-name>Morgan &amp; Claypool</publisher-name>
<comment>[Hendler J, van Harmelen F (Series Editors)
<italic>Synthesis Lectures on the Semantic Web: Theory and Technology</italic>, 1:1]</comment>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="other">
<article-title>Marbles Linked Data Engine</article-title>
<ext-link ext-link-type="uri" xlink:href="http://marbles.sourceforge.net/">http://marbles.sourceforge.net/</ext-link>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="book">
<name>
<surname>Heim</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hellmann</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lehmann</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lohmann</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stegemann</surname>
<given-names>T</given-names>
</name>
<person-group person-group-type="editor">Chua T-S, Kompatsiaris Y, Mérialdo B, Haas W</person-group>
<article-title>RelFinder: Revealing Relationships in RDF Knowledge Bases</article-title>
<source>Semantic Multimedia: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies, SAMT 2009 Graz, Austria, December 2-4, 2009</source>
<fpage>182</fpage>
<lpage>187</lpage>
<comment>Lecture Notes in Computer Science LNCS 5887 (Jan 13, 2010) ISBN 978-3642105425</comment>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="other">
<article-title>TP53/IARC LOGVD SPARQLer: An RDF Query Server, National Cancer Research Institute, Genova, Italy</article-title>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.istge.it/logvdsparql/">http://bioinformatics.istge.it/logvdsparql/</ext-link>
</mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="other">
<article-title>TP53/IARC LOGVD Pubby, National Cancer Research Institute, Genova, Italy</article-title>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.istge.it/logvd/">http://bioinformatics.istge.it/logvd/</ext-link>
</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="journal">
<name>
<surname>Vidal</surname>
<given-names>M</given-names>
</name>
<article-title>A biological atlas of functional maps</article-title>
<source>Cell</source>
<year>2001</year>
<volume>104</volume>
<issue>3</issue>
<fpage>333</fpage>
<lpage>339</lpage>
<pub-id pub-id-type="doi">10.1016/S0092-8674(01)00221-5</pub-id>
<pub-id pub-id-type="pmid">11239391</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024