Cyberinfrastructure Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Architecture for interoperable software in biology

Internal identifier: 000510 (Pmc/Corpus); previous: 000509; next: 000511

Authors: James Christopher Bare; Nitin S. Baliga

Source:

RBID : PMC:4103535

Abstract

Understanding biological complexity demands a combination of high-throughput data and interdisciplinary skills. One way to bring to bear the necessary combination of data types and expertise is by encapsulating domain knowledge in software and composing that software to create a customized data analysis environment. To this end, simple flexible strategies are needed for interconnecting heterogeneous software tools and enabling data exchange between them. Drawing on our own work and that of others, we present several strategies for interoperability and their consequences, in particular, a set of simple data structures—list, matrix, network, table and tuple—that have proven sufficient to achieve a high degree of interoperability. We provide a few guidelines for the development of future software that will function as part of an interoperable community of software tools for biological data analysis and visualization.


Url:
DOI: 10.1093/bib/bbs074
PubMed: 23235920
PubMed Central: 4103535

Links to Exploration step

PMC:4103535

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Architecture for interoperable software in biology</title>
<author>
<name sortKey="Bare, James Christopher" sort="Bare, James Christopher" uniqKey="Bare J" first="James Christopher" last="Bare">James Christopher Bare</name>
</author>
<author>
<name sortKey="Baliga, Nitin S" sort="Baliga, Nitin S" uniqKey="Baliga N" first="Nitin S." last="Baliga">Nitin S. Baliga</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23235920</idno>
<idno type="pmc">4103535</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103535</idno>
<idno type="RBID">PMC:4103535</idno>
<idno type="doi">10.1093/bib/bbs074</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000510</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Architecture for interoperable software in biology</title>
<author>
<name sortKey="Bare, James Christopher" sort="Bare, James Christopher" uniqKey="Bare J" first="James Christopher" last="Bare">James Christopher Bare</name>
</author>
<author>
<name sortKey="Baliga, Nitin S" sort="Baliga, Nitin S" uniqKey="Baliga N" first="Nitin S." last="Baliga">Nitin S. Baliga</name>
</author>
</analytic>
<series>
<title level="j">Briefings in Bioinformatics</title>
<idno type="ISSN">1467-5463</idno>
<idno type="eISSN">1477-4054</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Understanding biological complexity demands a combination of high-throughput data and interdisciplinary skills. One way to bring to bear the necessary combination of data types and expertise is by encapsulating domain knowledge in software and composing that software to create a customized data analysis environment. To this end, simple flexible strategies are needed for interconnecting heterogeneous software tools and enabling data exchange between them. Drawing on our own work and that of others, we present several strategies for interoperability and their consequences, in particular, a set of simple data structures—list, matrix, network, table and tuple—that have proven sufficient to achieve a high degree of interoperability. We provide a few guidelines for the development of future software that will function as part of an interoperable community of software tools for biological data analysis and visualization.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, Ld" uniqKey="Stein L">LD Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, Ld" uniqKey="Stein L">LD Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hood, L" uniqKey="Hood L">L Hood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Madhavan, J" uniqKey="Madhavan J">J Madhavan</name>
</author>
<author>
<name sortKey="Jeffery, S" uniqKey="Jeffery S">S Jeffery</name>
</author>
<author>
<name sortKey="Cohen, S" uniqKey="Cohen S">S Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bonneau, R" uniqKey="Bonneau R">R Bonneau</name>
</author>
<author>
<name sortKey="Facciotti, Mt" uniqKey="Facciotti M">MT Facciotti</name>
</author>
<author>
<name sortKey="Reiss, Dj" uniqKey="Reiss D">DJ Reiss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koide, T" uniqKey="Koide T">T Koide</name>
</author>
<author>
<name sortKey="Reiss, Dj" uniqKey="Reiss D">DJ Reiss</name>
</author>
<author>
<name sortKey="Bare, Jc" uniqKey="Bare J">JC Bare</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yoon, Sh" uniqKey="Yoon S">SH Yoon</name>
</author>
<author>
<name sortKey="Reiss, Dj" uniqKey="Reiss D">DJ Reiss</name>
</author>
<author>
<name sortKey="Bare, Jc" uniqKey="Bare J">JC Bare</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shannon, P" uniqKey="Shannon P">P Shannon</name>
</author>
<author>
<name sortKey="Markiel, A" uniqKey="Markiel A">A Markiel</name>
</author>
<author>
<name sortKey="Ozier, O" uniqKey="Ozier O">O Ozier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shannon, Pt" uniqKey="Shannon P">PT Shannon</name>
</author>
<author>
<name sortKey="Reiss, Dj" uniqKey="Reiss D">DJ Reiss</name>
</author>
<author>
<name sortKey="Bonneau, R" uniqKey="Bonneau R">R Bonneau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giardine, B" uniqKey="Giardine B">B Giardine</name>
</author>
<author>
<name sortKey="Riemer, C" uniqKey="Riemer C">C Riemer</name>
</author>
<author>
<name sortKey="Hardison, Rc" uniqKey="Hardison R">RC Hardison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oinn, T" uniqKey="Oinn T">T Oinn</name>
</author>
<author>
<name sortKey="Addis, M" uniqKey="Addis M">M Addis</name>
</author>
<author>
<name sortKey="Ferris, J" uniqKey="Ferris J">J Ferris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hull, D" uniqKey="Hull D">D Hull</name>
</author>
<author>
<name sortKey="Wolstencroft, K" uniqKey="Wolstencroft K">K Wolstencroft</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reich, M" uniqKey="Reich M">M Reich</name>
</author>
<author>
<name sortKey="Liefeld, T" uniqKey="Liefeld T">T Liefeld</name>
</author>
<author>
<name sortKey="Gould, J" uniqKey="Gould J">J Gould</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hucka, M" uniqKey="Hucka M">M Hucka</name>
</author>
<author>
<name sortKey="Finney, A" uniqKey="Finney A">A Finney</name>
</author>
<author>
<name sortKey="Sauro, Hm" uniqKey="Sauro H">HM Sauro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilkinson, Md" uniqKey="Wilkinson M">MD Wilkinson</name>
</author>
<author>
<name sortKey="Links, M" uniqKey="Links M">M Links</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ihaka, R" uniqKey="Ihaka R">R Ihaka</name>
</author>
<author>
<name sortKey="Gentleman, R" uniqKey="Gentleman R">R Gentleman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saeed, Ai" uniqKey="Saeed A">AI Saeed</name>
</author>
<author>
<name sortKey="Sharov, V" uniqKey="Sharov V">V Sharov</name>
</author>
<author>
<name sortKey="White, J" uniqKey="White J">J White</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dennis, G" uniqKey="Dennis G">G Dennis</name>
</author>
<author>
<name sortKey="Sherman, Bt" uniqKey="Sherman B">BT Sherman</name>
</author>
<author>
<name sortKey="Hosack, Da" uniqKey="Hosack D">DA Hosack</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M Kanehisa</name>
</author>
<author>
<name sortKey="Goto, S" uniqKey="Goto S">S Goto</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, P" uniqKey="Li P">P Li</name>
</author>
<author>
<name sortKey="Castrillo, Ji" uniqKey="Castrillo J">JI Castrillo</name>
</author>
<author>
<name sortKey="Velarde, G" uniqKey="Velarde G">G Velarde</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuehn, H" uniqKey="Kuehn H">H Kuehn</name>
</author>
<author>
<name sortKey="Liberzon, A" uniqKey="Liberzon A">A Liberzon</name>
</author>
<author>
<name sortKey="Reich, M" uniqKey="Reich M">M Reich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gamma, E" uniqKey="Gamma E">E Gamma</name>
</author>
<author>
<name sortKey="Helm, R" uniqKey="Helm R">R Helm</name>
</author>
<author>
<name sortKey="Johnson, R" uniqKey="Johnson R">R Johnson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mehta, Nr" uniqKey="Mehta N">NR Mehta</name>
</author>
<author>
<name sortKey="Medvidovic, N" uniqKey="Medvidovic N">N Medvidovic</name>
</author>
<author>
<name sortKey="Phadke, S" uniqKey="Phadke S">S Phadke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mehta, Nr" uniqKey="Mehta N">NR Mehta</name>
</author>
<author>
<name sortKey="Medvidovic, N" uniqKey="Medvidovic N">N Medvidovic</name>
</author>
<author>
<name sortKey="Phadke, S" uniqKey="Phadke S">S Phadke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hohpe, G" uniqKey="Hohpe G">G Hohpe</name>
</author>
<author>
<name sortKey="Woolf, B" uniqKey="Woolf B">B Woolf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garlan, D" uniqKey="Garlan D">D Garlan</name>
</author>
<author>
<name sortKey="Shaw, M" uniqKey="Shaw M">M Shaw</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Demir, E" uniqKey="Demir E">E Demir</name>
</author>
<author>
<name sortKey="Cary, Mp" uniqKey="Cary M">MP Cary</name>
</author>
<author>
<name sortKey="Paley, S" uniqKey="Paley S">S Paley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lord, P" uniqKey="Lord P">P Lord</name>
</author>
<author>
<name sortKey="Bechhofer, S" uniqKey="Bechhofer S">S Bechhofer</name>
</author>
<author>
<name sortKey="Wilkinson, M" uniqKey="Wilkinson M">M Wilkinson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Killcoyne, S" uniqKey="Killcoyne S">S Killcoyne</name>
</author>
<author>
<name sortKey="Carter, Gw" uniqKey="Carter G">GW Carter</name>
</author>
<author>
<name sortKey="Smith, J" uniqKey="Smith J">J Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Borner, K" uniqKey="Borner K">K Börner</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rovira, H" uniqKey="Rovira H">H Rovira</name>
</author>
<author>
<name sortKey="Killcoyne, S" uniqKey="Killcoyne S">S Killcoyne</name>
</author>
<author>
<name sortKey="Shmulevich, I" uniqKey="Shmulevich I">I Shmulevich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fielding, Rt" uniqKey="Fielding R">RT Fielding</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goble, C" uniqKey="Goble C">C Goble</name>
</author>
<author>
<name sortKey="Stevens, R" uniqKey="Stevens R">R Stevens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Iersel, Mp" uniqKey="Van Iersel M">MP van Iersel</name>
</author>
<author>
<name sortKey="Pico, Ar" uniqKey="Pico A">AR Pico</name>
</author>
<author>
<name sortKey="Kelder, T" uniqKey="Kelder T">T Kelder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smedley, D" uniqKey="Smedley D">D Smedley</name>
</author>
<author>
<name sortKey="Haider, S" uniqKey="Haider S">S Haider</name>
</author>
<author>
<name sortKey="Ballester, B" uniqKey="Ballester B">B Ballester</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clark, T" uniqKey="Clark T">T Clark</name>
</author>
<author>
<name sortKey="Martin, S" uniqKey="Martin S">S Martin</name>
</author>
<author>
<name sortKey="Liefeld, T" uniqKey="Liefeld T">T Liefeld</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gentleman, Rc" uniqKey="Gentleman R">RC Gentleman</name>
</author>
<author>
<name sortKey="Carey, Vj" uniqKey="Carey V">VJ Carey</name>
</author>
<author>
<name sortKey="Bates, Dm" uniqKey="Bates D">DM Bates</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bare, Jc" uniqKey="Bare J">JC Bare</name>
</author>
<author>
<name sortKey="Koide, T" uniqKey="Koide T">T Koide</name>
</author>
<author>
<name sortKey="Reiss, Dj" uniqKey="Reiss D">DJ Reiss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bare, Jc" uniqKey="Bare J">JC Bare</name>
</author>
<author>
<name sortKey="Shannon, Pt" uniqKey="Shannon P">PT Shannon</name>
</author>
<author>
<name sortKey="Schmid, Ak" uniqKey="Schmid A">AK Schmid</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Snel, B" uniqKey="Snel B">B Snel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Kuhn, M" uniqKey="Kuhn M">M Kuhn</name>
</author>
<author>
<name sortKey="Stark, M" uniqKey="Stark M">M Stark</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, Ld" uniqKey="Stein L">LD Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raymond, Es" uniqKey="Raymond E">ES Raymond</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Brief Bioinform</journal-id>
<journal-id journal-id-type="iso-abbrev">Brief. Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bib</journal-id>
<journal-id journal-id-type="hwp">bib</journal-id>
<journal-title-group>
<journal-title>Briefings in Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1467-5463</issn>
<issn pub-type="epub">1477-4054</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23235920</article-id>
<article-id pub-id-type="pmc">4103535</article-id>
<article-id pub-id-type="doi">10.1093/bib/bbs074</article-id>
<article-id pub-id-type="publisher-id">bbs074</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Papers</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Architecture for interoperable software in biology</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Bare</surname>
<given-names>James Christopher</given-names>
</name>
<xref ref-type="bio" rid="d35e36">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Baliga</surname>
<given-names>Nitin S.</given-names>
</name>
<xref ref-type="bio" rid="d35e47">*</xref>
</contrib>
</contrib-group>
<author-notes>
<corresp>Corresponding author. Nitin S. Baliga, Professor &amp; Director, Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109. Tel.:
<phone>+206 732 1266</phone>
; Fax:
<fax>+206 732 1299</fax>
; E-mail:
<email>nbaliga@systemsbiology.org</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>7</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>11</day>
<month>12</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>11</day>
<month>12</month>
<year>2012</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>15</volume>
<issue>4</issue>
<fpage>626</fpage>
<lpage>636</lpage>
<history>
<date date-type="received">
<day>28</day>
<month>8</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>10</month>
<year>2011</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2012. Published by Oxford University Press.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0/">http://creativecommons.org/licenses/by-nc/3.0/</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Understanding biological complexity demands a combination of high-throughput data and interdisciplinary skills. One way to bring to bear the necessary combination of data types and expertise is by encapsulating domain knowledge in software and composing that software to create a customized data analysis environment. To this end, simple flexible strategies are needed for interconnecting heterogeneous software tools and enabling data exchange between them. Drawing on our own work and that of others, we present several strategies for interoperability and their consequences, in particular, a set of simple data structures—list, matrix, network, table and tuple—that have proven sufficient to achieve a high degree of interoperability. We provide a few guidelines for the development of future software that will function as part of an interoperable community of software tools for biological data analysis and visualization.</p>
</abstract>
<kwd-group>
<kwd>interoperability</kwd>
<kwd>software engineering</kwd>
<kwd>bioinformatics</kwd>
<kwd>integration</kwd>
<kwd>systems biology</kwd>
<kwd>data analysis</kwd>
</kwd-group>
<counts>
<page-count count="1"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>INTRODUCTION</title>
<p>Developers of computational biology software increasingly find their programs functioning as parts of data analysis workflows encompassing a number of heterogeneous tools and different data types. Interoperability between these tools is an important and difficult problem, closely intertwined with the well-known and persistent problem of biological data integration [
<xref rid="bbs074-B1" ref-type="bibr">1</xref>
,
<xref rid="bbs074-B2" ref-type="bibr">2</xref>
]. Building the computing infrastructure to seamlessly navigate and recombine biological data remains a work in progress. But some valuable lessons and perhaps a few general principles can be extracted from progress thus far.</p>
<p>The complexity of biological data arises from the diversity of data types (e.g. genotypes, mRNA/protein levels, protein interactions, epigenetic changes and phenotypes), different platforms for measuring/analyzing the same property (e.g. protein interactions), varying quality of measurements and incompatible systems of identifiers (e.g. gene names). Another complicating factor is the need to capture the relevant metadata describing the context of a sample or an experiment, which is needed to make the transition from a mass of individual experiments to a coherent body of data that can be mined for knowledge. All of these data-associated challenges propagate to software engineering where one-off software is essential to support
<italic>ad hoc</italic>
exploratory data analysis; the challenge in integration arises from the diversity of such software tools that eventually need to be packaged into scripted and repeatable form.</p>
<p>A data-driven approach to biological science depends on collaboration spanning the disciplines of biology, mathematics and statistics, computer science and software engineering [
<xref rid="bbs074-B3" ref-type="bibr">3</xref>
]. One way to bring about this combination of expertise is to encapsulate domain knowledge in software components, which can be dynamically composed into integrated systems. Such components may be heterogeneous in their choices of languages and components, span all levels of engineering maturity and evolve at different rates. The cutting edge of research will always outpace standardization, generating new data and analysis that may not fit into any existing schema. Federating distributed data sources [
<xref rid="bbs074-B4" ref-type="bibr">4</xref>
] and independently developed software into an interoperable suite of tools is a challenge that must be addressed in order to build computing systems equal to the task of turning high-throughput data into an understanding of biology in all its complexity.</p>
<p>Our perspective arises through development of software for analysis and visualization of systems biology data [
<xref rid="bbs074-B5" ref-type="bibr">5–7</xref>
], including early versions of the network visualization tool, Cytoscape [
<xref rid="bbs074-B8" ref-type="bibr">8</xref>
]. Superimposing gene expression data over Cytoscape networks motivated the development of Gaggle [
<xref rid="bbs074-B9" ref-type="bibr">9</xref>
], a message passing framework for integration of bioinformatics software. Similar goals motivated several other systems: Galaxy [
<xref rid="bbs074-B10" ref-type="bibr">10</xref>
], Taverna [
<xref rid="bbs074-B11" ref-type="bibr">11</xref>
,
<xref rid="bbs074-B12" ref-type="bibr">12</xref>
], GenePattern [
<xref rid="bbs074-B13" ref-type="bibr">13</xref>
], Systems Biology Workbench (SBW) [
<xref rid="bbs074-B14" ref-type="bibr">14</xref>
] and BioMoby [
<xref rid="bbs074-B15" ref-type="bibr">15</xref>
]. A common theme that figures prominently in these systems is the composition of separately developed software into suites of tools for the analysis of biological data. Architecting these tools to be interconnected thus becomes a critical step.</p>
<p>We first consider an example data analysis workflow involving several software tools, then present a set of strategies for achieving interoperability. Using these strategies as a means to systematically analyze the interoperability aspects of software architecture provides a few guideposts for the development of future systems.</p>
</sec>
<sec>
<title>DATA ANALYSIS WORKFLOWS</title>
<p>Analysis of gene expression (
<xref ref-type="fig" rid="bbs074-F1">Figure 1</xref>
) is a common use case that serves as an example for the techniques discussed later. The analysis is divided into steps, each open to numerous variations and each implemented in software that transforms data and then passes the results onward.
<fig id="bbs074-F1" position="float">
<label>Figure 1:</label>
<caption>
<p>A biological data analysis workflow to cluster and characterize gene expression data. A gene expression matrix derived by microarrays or sequencing experiments is clustered (here we use the data exploration tool MeV) producing lists of co-expressed genes, which are then passed to two web resources for further analysis. KEGG takes gene lists and finds relevant metabolic pathways. DAVID computes functional enrichment.</p>
</caption>
<graphic xlink:href="bbs074f1"></graphic>
</fig>
</p>
<p>High-throughput measurement of gene expression can be performed by microarray or, increasingly, by sequencing. The shift from arrays to sequencing is an example of technological change that challenges the ability of research software to adapt. In either case, data undergo specialized processing to derive a gene expression matrix, a 2D grid of numeric data in which each row represents a gene’s expression profile over changing conditions.</p>
<p>Clustering the resulting matrix is a likely next step, identifying sets of genes with similar expression profiles over the course of the experiment, possibly performed using tools like R [
<xref rid="bbs074-B16" ref-type="bibr">16</xref>
] or Multi-experiment Viewer (MeV) [
<xref rid="bbs074-B17" ref-type="bibr">17</xref>
]. Products of co-clustered genes may have related functions or participate in the same metabolic pathways. The functional annotation tool DAVID [
<xref rid="bbs074-B18" ref-type="bibr">18</xref>
] accepts lists of genes and computes functional enrichment, returned in tabular form with links to supporting evidence. Through KEGG [
<xref rid="bbs074-B19" ref-type="bibr">19</xref>
], a list of genes can be submitted as a query returning metabolic pathways represented as a network. Ultimately, the analysis is guided by the design of the experiment, which may seek to connect a disease or environmental stimulus to regulation of specific biological processes. Even this simplified example relies on an impressive array of biological, statistical and algorithmic expertise embedded in interacting software tools.</p>
<p>Similar analyses might run in any of several workflow management systems [
<xref rid="bbs074-B20" ref-type="bibr">20</xref>
,
<xref rid="bbs074-B21" ref-type="bibr">21</xref>
] or be coded in scripting languages. Tools that incorporate mechanisms for packaging and publishing the steps of an analysis aid reproducibility. Like design patterns [
<xref rid="bbs074-B22" ref-type="bibr">22</xref>
], basic templates for data analysis are independent of particular tools or languages, and tend to be adapted to fit new situations and reused. Regardless of implementation details, the need for different pieces of software to interact and exchange data underscores the importance of interoperability as a primary concern in the architecture of bioinformatics applications.</p>
</sec>
<sec>
<title>STRATEGIES FOR INTEROPERABILITY</title>
<p>A variety of methods have been successfully applied to the problem of building interoperable software systems. APIs, plug-ins, messaging and web services can be seen as variations on the general theme of sharing data and functionality between programs. These strategies overlap and can even, at times, be implemented in terms of one another. Most large software systems employ several of them. They differ in the trade-offs they impose and the degree of separation or sharing between communicating programs and should be selected carefully to yield desired properties.</p>
<p>More complete treatments of these software design strategies are available in the literature on software connectors [
<xref rid="bbs074-B23" ref-type="bibr">23</xref>
,
<xref rid="bbs074-B24" ref-type="bibr">24</xref>
], design patterns [
<xref rid="bbs074-B22" ref-type="bibr">22</xref>
,
<xref rid="bbs074-B25" ref-type="bibr">25</xref>
] and software architecture [
<xref rid="bbs074-B26" ref-type="bibr">26</xref>
]. We will briefly give enough terminology (
<xref ref-type="table" rid="bbs074-T1">Table 1</xref>
) to discuss a few advantages and consequences in further detail.
<table-wrap id="bbs074-T1" position="float">
<label>Table 1:</label>
<caption>
<p>Strategies for interoperability</p>
</caption>
<table frame="hsides" rules="groups">
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">Adapter</td>
<td rowspan="1" colspan="1">A component that translates between incompatible interfaces, protocols or content.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">API</td>
<td rowspan="1" colspan="1">Application programming interface; functionality exposed for use by external components.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Broker (mediator or arbitrator)</td>
<td rowspan="1" colspan="1">An intermediary that coordinates interaction between components, serving as the hub in a hub-and-spokes architecture.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Message passing</td>
<td rowspan="1" colspan="1">Sending data from one process to one or more independent processes.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Plug-in architecture</td>
<td rowspan="1" colspan="1">Run-time integration of separately developed task-specific functionality into a general-purpose host program.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">RPC</td>
<td rowspan="1" colspan="1">Remote procedure call; a style of interaction characterized by synchronous invocation of specific functionality running in another process.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Shared representation</td>
<td rowspan="1" colspan="1">A commonly understood data format accessed by multiple programs; for example, a shared DB, a common file format (FASTA, GFF, SAM &amp; BAM, RDF and ontologies). A message payload or arguments to an API call can also serve as a shared representation.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Streaming</td>
<td rowspan="1" colspan="1">Processing partial data as it arrives without waiting for a complete transmission.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Web services</td>
<td rowspan="1" colspan="1">An API made available over web protocols (HTTP). SOAP and REST are two common styles.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Workflow</td>
<td rowspan="1" colspan="1">A repeatable pattern of data processing and transformation designed by arranging separate software components to carry out distinct steps.</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<sec>
<title>Shared representation</title>
<p>Any mutually understandable data format can serve as a shared representation. If two programs can read and write the same file format, access the same database or send intelligible messages to one another, they can communicate. A common format shared by
<italic>n</italic>
otherwise unrelated programs means that each program must translate between its internal structures and the shared representation, an
<italic>n</italic>
-way translation. This greatly improves on the worst-case scenario where each pair of communicating programs requires its own connector, for a total of
<italic>n</italic>
(
<italic>n</italic>
− 1)/2 translators to fully connect all programs.</p>
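<p>The difference between the two counts grows quickly with the number of programs. The following minimal Python sketch tabulates both; the function names are illustrative only and are not taken from any of the systems discussed here.</p>
<preformat>
```python
def pairwise_translators(n: int) -> int:
    """Connectors needed when every pair of programs gets its own translator."""
    return n * (n - 1) // 2


def shared_format_translators(n: int) -> int:
    """Translators needed when each program converts to and from one shared format."""
    return n


# Print: number of programs, pairwise count, shared-format count.
for n in (3, 10, 50):
    print(n, pairwise_translators(n), shared_format_translators(n))
# 3 3 3
# 10 45 10
# 50 1225 50
```
</preformat>
<p>With three programs the two approaches cost the same; at 50 programs the pairwise approach would need 1225 connectors against 50 translators for a shared format.</p>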
<p>Relational databases are often used as a point of integration. Programs communicating this way will share a dependency on the database schema, but no dependency on each other. In this case, the shared representation is persistent, as are shared files. Messages, as well as objects passed as arguments to an API call, are transient but still must be understood by both sides of the communication.</p>
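<p>As a rough illustration of a database acting as the point of integration, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table layout, the column names and the two functions are hypothetical and chosen only for this example; the point is that the producer and the consumer share the schema but never refer to each other.</p>
<preformat>
```python
import sqlite3

# Hypothetical schema: the only thing the two programs below have in common.
SCHEMA = "CREATE TABLE expression (gene TEXT, sample TEXT, value REAL)"


def producer(conn, rows):
    """Stands in for a measurement pipeline writing results to the shared database."""
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    conn.commit()


def consumer(conn, gene):
    """Stands in for an analysis tool reading one gene's profile from the shared database."""
    cursor = conn.execute(
        "SELECT sample, value FROM expression WHERE gene = ? ORDER BY sample",
        (gene,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
producer(conn, [("geneA", "t0", 1.0), ("geneA", "t1", 2.5), ("geneB", "t0", 0.3)])
profile = consumer(conn, "geneA")
print(profile)  # [('t0', 1.0), ('t1', 2.5)]
```
</preformat>
<p>Changing the schema would require changing both functions, which is the shared dependency described above; replacing either function with a different tool would not affect the other.</p>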
<p>Shared representations might be arranged on a continuum of increasing structure with events and flat text files at one end and relational databases and ontologies on the other, trading off simplicity and generality for precision. Complex schemas and semantically defined vocabularies work well where basic concepts have reached some degree of stability, but this is not always a given in research. The degree to which interoperability can be reduced to a syntactic issue rather than a semantic one deserves consideration, as semantically rich formats come at substantial costs in terms of engineering effort, consensus building and learning curves.</p>
<p>Toward the more structured end of this spectrum are Semantic Web formats including Gene Ontology [
<xref rid="bbs074-B27" ref-type="bibr">27</xref>
] and BioPAX [
<xref rid="bbs074-B28" ref-type="bibr">28</xref>
]. The goal of the Semantic Web is to construct universal shared representations, enabling linked structured data to be reused and recombined automatically across application and organizational boundaries [
<xref rid="bbs074-B29" ref-type="bibr">29</xref>
]. For now at least, Semantic Web technologies must coexist and interoperate with structured data in databases, semi-structured data in various flavors and with unstructured data. Ideally, bioinformatics software will accommodate varying degrees of structure, exploiting semantically rich data where available without excluding lower levels of the structure hierarchy.</p>
</sec>
<sec>
<title>Plug-in architecture</title>
<p>Plug-ins are a way to augment a general-purpose tool with specialized functionality without cluttering the core with features pertinent only to a few users. The dependency runs one way: plug-ins depend on the host, but the host has no dependency on its plug-ins. Plug-ins can be distributed separately and are often contributed by third parties. Preserving consistent behavior of an API while core functionality is undergoing rapid development can be challenging, but the host is free to change its internals as long as the contract of the API is maintained.</p>
<p>The network visualization software Cytoscape has a vibrant community of plug-in developers [
<xref rid="bbs074-B30" ref-type="bibr">30</xref>
]. Plug-ins can access and manipulate the central data structure, the network, which is maintained by the host program. An upcoming version of Cytoscape is based on the OSGi (
<ext-link ext-link-type="uri" xlink:href="http://osgi.org">http://osgi.org</ext-link>
) framework, as is the integrated development environment, Eclipse. OSGi takes plug-in architecture a step further by constructing whole software systems from assemblies of plug-ins. Such exceptionally customizable and reconfigurable tools are well matched to the rapidly evolving complexity of scientific data [
<xref rid="bbs074-B31" ref-type="bibr">31</xref>
].</p>
<p>Like a plug-in API, an embedded scripting environment is an extension mechanism enabling programs to be augmented with new functionality [
<xref rid="bbs074-B32" ref-type="bibr">32</xref>
]. For example, the editor Emacs has a Lisp interpreter at its core. The majority of its text editing functionality is written in this language, as are numerous extensions. Mozilla Firefox also follows this pattern with an embedded JavaScript interpreter for executing third-party code and custom extensions, such as Firegoose.</p>
</sec>
<sec>
<title>Messaging</title>
<p>Message passing loosely couples independent applications running in different processes by exchanging packages of data in a mutually intelligible format. One common messaging pattern is the remote procedure call (RPC) where a request invoking a specific function is answered by a response, but there are many alternatives [
<xref rid="bbs074-B25" ref-type="bibr">25</xref>
]. The asynchronous publish-and-subscribe pattern, for example, is often used to propagate events indicating state change or user interaction.</p>
<p>Message-oriented middleware offers a number of attractive features, but at a cost of increasing complexity. Transactional message queues provide asynchrony and buffering and can be configured to guarantee delivery and message order. Message brokers add sophisticated routing and transformations. While powerful, managing this complicated software infrastructure can quickly become prohibitive to all but experts.</p>
<p>Messaging is not well suited to very large data objects. Serialization, copying and deserialization all become expensive in terms of both performance and memory as data size grows, perhaps beyond a limit of a few tens of megabytes with present technologies. Streaming can eliminate the need for whole documents to be in memory, but sacrifices a degree of simplicity. Alternatively, messages may carry a reference to data rather than the full data itself. The pointer can take the form of a URL or a reference to a shared database. This is efficient, but raises again the problem of shared representation, particularly with the introduction of a dependency on a database schema.</p>
<p>Both Gaggle and SBW are message passing systems with binary protocols. The choice of binary versus textual protocols is a trade-off of efficiency against simplicity.</p>
</sec>
<sec>
<title>Pipes and filters</title>
<p>Pipes-and-filters is an archetypal technique for interoperability. The pipes-and-filters model [
<xref rid="bbs074-B26" ref-type="bibr">26</xref>
], often described as ‘small pieces, loosely joined’, enables workflows to be built up from small programs chained together by streaming. The programs are usually invoked from the command line within a Unix shell and have a single purpose whose behavior is modified by command line switches. Branching is possible, but the tendency is toward linear pipelines. The shared representation is typically text files processed line-by-line.</p>
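<p>The same pattern can be sketched inside a single program with generators, where each filter consumes and emits one line at a time. This is only an illustration of the idea, not a replacement for shell pipelines; the function names and the sample input are assumptions made for the example.</p>
<preformat>
```python
import io


def read_lines(stream):
    """Source: yield lines one at a time instead of loading the whole input."""
    for line in stream:
        yield line.rstrip("\n")


def skip_comments(lines):
    """Filter: drop blank lines and lines starting with '#'."""
    for line in lines:
        if line and not line.startswith("#"):
            yield line


def select_column(lines, index, sep="\t"):
    """Filter: keep a single tab-separated column."""
    for line in lines:
        yield line.split(sep)[index]


# Illustrative input; in a shell this would arrive on standard input.
text = io.StringIO("# gene\tstrand\ngeneA\t+\n\ngeneB\t-\n")

# Filters are chained like a pipeline; nothing runs until lines are requested.
pipeline = select_column(skip_comments(read_lines(text)), 0)
for gene in pipeline:
    print(gene)
```
</preformat>
<p>Because each stage pulls one line at a time from the stage before it, the whole input never needs to be held in memory, which is the property that streaming provides in a shell pipeline.</p>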
</sec>
<sec>
<title>Web services</title>
<p>Web services and workflows built on them are widely deployed in biology. Building on web protocols brings many advantages from an interoperability perspective. The uniform interface of Hypertext Transfer Protocol (HTTP) connects heterogeneous clients and servers. Text representations such as XML or JSON provide platform neutrality.</p>
<p>Web services are often built alongside browser-based web interfaces and, in modern applications, rich web or desktop clients are built on top of well-defined web service APIs. This layered style of architecture supports both interactive and automated access, serving developers, point-and-click users and scripting-enabled power-users.</p>
<p>A request across the network is a relatively slow operation. The granularity of requests should be selected to keep data sizes reasonable and the number of requests small. A pattern is emerging of hosting shared data resources and running analysis on scalable cloud computing infrastructure accessible through web service APIs.</p>
<p>RESTful JSON-based APIs are becoming standard practice and the value of scripting and interconnecting such services is well known. Facebook’s OpenGraph, for example, is a platform that supports a vibrant ecosystem of third-party apps. OpenGraph is a web service API with many aspects of a plug-in architecture having the social graph as its central data structure. Similar techniques may cross over easily into the biology domain where the interactions of genes, proteins and species also form densely interconnected networks. Repositories of biological data could similarly act as platforms for integration [
<xref rid="bbs074-B33" ref-type="bibr">33</xref>
], able to plug in customized modules encapsulating analysis algorithms, visualizations and connectors to multiple data sources.</p>
<p>REST, or representational state transfer [
<xref rid="bbs074-B34" ref-type="bibr">34</xref>
], is the set of architectural principles underlying HTTP, which incorporates several interoperability strategies. HTTP is an RPC client-server messaging protocol. The client makes requests containing one of a small set of methods (GET, PUT, POST and DELETE), which are answered by responses from the server. This uniform interface is understood by all web servers. The body of an HTTP message may contain HTML or any of dozens of standard media (MIME) types, including specific formats for images, audio and video. Registration of new media types is an important extension point. Since clients cannot reasonably be expected to support an open-ended variety of media types, browsers offer plug-in mechanisms through which specialized or proprietary media formats can be supported.</p>
<p>One tenet of REST is that information be represented in transit as self-describing messages. The syntax for these documents is typically XML or JSON. Their semantic content and structure may conform to a standard (e.g. HTML) or be application specific. In the next section, we propose a highly general representation of scientific data based on a handful of simple data structures annotated with descriptive metadata.</p>
</sec>
</sec>
<sec>
<title>INTEROPERABLE DATA</title>
<p>Data representation is essential to any method of interoperability, serving as an intermediary between communicating programs with different internal representations. Like the type system for a programming language, an intermediate representation seeks to balance several qualities, among them simplicity, expressivity, universality and efficiency. Our experience with Gaggle suggests a system of interlocking data structures, free of domain specific semantics and general enough to cover a broad range of applications.</p>
<p>These shapes of data, namely lists, matrices, networks, tables and tuples (nested key/value pairs), are universal and capable of representing a variety of biological data types (
<xref ref-type="fig" rid="bbs074-F2">Figure 2</xref>
). They are redundant in the sense that it is possible to represent the same data in a number of ways, but a given biological data type usually fits naturally into one of these structures and will be represented by similar structures in the internals of many software tools. Of course, there are data types that do not fit well into these data structures. Images and sequences, for example, are well served by existing formats and these can be used by reference, as is done in HTTP. As a shared representation, this handful of fundamental data types is sufficient to achieve a surprisingly high degree of interoperability.
<fig id="bbs074-F2" position="float">
<label>Figure 2:</label>
<caption>
<p>The shapes of scientific data. A wide variety of scientific data can be represented by a handful of fundamental data structures. A list might hold protein or gene identifiers. Networks represent regulatory influence, metabolic pathways or protein interactions. Numeric data resides in matrices, for example a gene expression matrix or promoter motif PSSM. The combination of tabular data and matrices could enable ChIP-chip data, tiling array data and genome features to be plotted by location in the genome. A bicluster, a set of genes co-expressed under specific conditions, might be represented by the combination of a list of genes, a list of conditions and a gene expression matrix, tied together in a tuple (hierarchically nested key-value pairs). Tuples may also represent experiment design (metadata about media, environmental variables or patient data).</p>
</caption>
<graphic xlink:href="bbs074f2"></graphic>
</fig>
</p>
<p>Rather than matching complex biological data with equally complex, and therefore cumbersome, data standards, we instead control complexity with flexible, generic and composable data types that are purposefully underspecified, letting context fill the gap. In our experience, this strategy works remarkably well for representing biological data: it contains the costs in software complexity and remains amenable to formality where necessary without enforcing it where it is not.</p>
<sec>
<title>The shapes of data</title>
<p>A handful of data structures sufficient to represent a variety of biological data are:
<list list-type="bullet">
<list-item>
<p>
<bold>Lists</bold>
. A list of identifiers is a basic and universal data structure, which might hold gene or protein names, accession numbers, ontology terms or resource URLs.</p>
</list-item>
<list-item>
<p>
<bold>Matrices</bold>
. Libraries for manipulating matrices are fundamental to numerical computation and comprise decades of work, underpinning software like R, MATLAB, and NumPy. In Gaggle, a matrix is a 2-dimensional array of floating point values with labeled rows and columns, although it might be argued that an n-dimensional array would be a better choice. Gene expression, protein abundance and motif PSSMs can be expressed as matrices.</p>
</list-item>
<list-item>
<p>
<bold>Networks</bold>
. A network, or graph, has nodes connected by edges. Both nodes and edges can have key/value properties attached to them. Protein–protein interaction, gene regulation and metabolic pathways are commonly represented as networks.</p>
</list-item>
<list-item>
<p>
<bold>Tables</bold>
. The basic unit of relational databases is the table, a set of rows conforming to a schema. Tables differ from matrices in that each column in a table may be a distinct type of data, for example numeric, string, or boolean. Often, the first column is an identifier and other columns hold categorical or numerical data pertaining to the identified entity. For example, a gene feature table might have columns for gene name, strand, start and end position, and function.</p>
</list-item>
<list-item>
<p>
<bold>Tuples</bold>
. Sets of key/value pairs are tuples. The keys are strings that label the values. Simple values can be numeric, string or boolean. Values can also be compound objects: lists, matrices, networks, tables, or other tuples. This nesting enables composition and can be used to build up complex data structures, something that should be done with restraint because it creates a dependency on the precise structure. Like XML and JSON, tuples can represent hierarchical data and can also accommodate RDF triples.</p>
</list-item>
</list>
</p>
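<p>As a rough illustration, the five shapes listed above might be written down as follows. This is a minimal sketch under our own naming assumptions; the class and field names are hypothetical and do not reproduce Gaggle's actual data types, and the gene names and values are invented.</p>
<preformat>
```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# A list is simply a sequence of identifiers.
IdentifierList = List[str]


@dataclass
class Matrix:
    """2-dimensional array of floats with labeled rows and columns."""
    row_names: List[str]
    column_names: List[str]
    values: List[List[float]]  # values[i][j] belongs to row i, column j


@dataclass
class Network:
    """Nodes connected by edges; both may carry key/value properties."""
    nodes: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    edges: List[Tuple[str, str, Dict[str, Any]]] = field(default_factory=list)


@dataclass
class Table:
    """Rows sharing named columns; each column may hold a different type."""
    columns: List[str]
    rows: List[List[Any]]


# A tuple, in the sense used here, is a set of key/value pairs whose values
# may be simple values or any of the structures above, including other tuples.
KeyValueTuple = Dict[str, Any]

# Invented example data.
genes: IdentifierList = ["geneA", "geneB"]
expression = Matrix(genes, ["cond1", "cond2"], [[1.5, -0.2], [0.3, 2.1]])
interactions = Network(
    nodes={"geneA": {}, "geneB": {}},
    edges=[("geneA", "geneB", {"type": "regulates"})],
)
features = Table(
    columns=["name", "strand", "start", "end"],
    rows=[["geneA", "+", 100, 950], ["geneB", "-", 1200, 2100]],
)
experiment: KeyValueTuple = {"genes": genes, "expression": expression}
```
</preformat>
<p>Note that none of these definitions mentions genes, proteins or conditions; the biological meaning lives entirely in the data and in the context of the receiving application.</p>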
<p>These data structures avoid specifying biological semantics and are to be interpreted in the context of the receiving application, a concept called semantic flexibility. Semantic concerns are left pragmatically in the hands of the user. These go hand-in-hand with the design of the data analysis workflow and of the experiment itself, activities that are likely to remain largely in human hands for some time to come.</p>
</sec>
<sec>
<title>Joining data</title>
<p>Joining together corroborating lines of evidence enables robust conclusions. An important aspect of this system of data structures is that they can be readily joined together. For example, a list of genes might be used to select rows in a gene expression matrix or nodes in a protein interaction network. Properties pertaining to those genes might be stored in a table or tuple, also keyed by gene name.</p>
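<p>A minimal sketch of these joins follows, assuming plain dictionaries and lists keyed by gene name. The data and the helper function names are invented for illustration.</p>
<preformat>
```python
# Hypothetical data, keyed by gene name throughout.
selected_genes = ["geneA", "geneC"]

expression = {  # matrix rows, keyed by row label
    "geneA": [1.5, -0.2],
    "geneB": [0.3, 2.1],
    "geneC": [-1.1, 0.8],
}
edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneD")]
annotations = {
    "geneA": {"function": "kinase"},
    "geneC": {"function": "transporter"},
}


def select_rows(matrix, keys):
    """Keep only the matrix rows whose label is in keys."""
    return {key: matrix[key] for key in keys if key in matrix}


def induced_subnetwork(edge_list, keys):
    """Keep only the edges whose two endpoints are both in keys."""
    wanted = set(keys)
    return [(a, b) for a, b in edge_list if a in wanted and b in wanted]


def join_on_key(keys, matrix, properties):
    """Combine the matrix row and the properties of each gene into one record."""
    return {
        key: {"expression": matrix.get(key), **properties.get(key, {})}
        for key in keys
    }


rows = select_rows(expression, selected_genes)
subnetwork = induced_subnetwork(edges, selected_genes)
joined = join_on_key(selected_genes, expression, annotations)

print(rows)
print(subnetwork)
print(joined["geneC"])
```
</preformat>
<p>All three operations depend only on the shared key, the gene name, and not on any knowledge of what the matrix values or network edges mean.</p>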
<p>Heterogeneous data sets can be related to each other by joining or merging based on common keys. These common keys, or touch points [
<xref rid="bbs074-B35" ref-type="bibr">35</xref>
], take many forms in biological data including gene or protein identifiers, ontology terms and loci. Genome browsers, for example, render visualizations by joining data on the basis of location on the genome.</p>
<p>Inconsistent identifiers impede join operations. In spite of several tools [
<xref rid="bbs074-B18" ref-type="bibr">18</xref>
,
<xref rid="bbs074-B36" ref-type="bibr">36</xref>
,
<xref rid="bbs074-B37" ref-type="bibr">37</xref>
] for mapping between different naming systems, translating identifiers remains a common source of frustration for bioinformatics researchers. Semantic web technologies, including Life Science Identifiers [
<xref rid="bbs074-B38" ref-type="bibr">38</xref>
], seek to create a systematic and universal naming system through the use of Uniform Resource Identifiers (URIs). URIs provide a distributed hierarchical namespace thus avoiding naming conflicts, but do not entirely resolve the issues of multiple names for equivalent entities or semantic mapping across related concepts.</p>
<p>The general idea of joining on a common key might be expanded to include more sophisticated mappings. Sequence similarity links together genes across species enabling propagation of information over the phylogenetic tree. A newly sequenced genome can be essentially joined to the body of existing biological knowledge through BLAST. Likewise, functional enrichment connects sets of genes up the hierarchy to biological processes and metabolic pathways.</p>
</sec>
<sec>
<title>Interoperability example</title>
<p>Returning to our example gene expression analysis, consider the shapes of data crossing the junctions between the software tools. Aligned sequence reads in tabular format or probe intensities are processed into a gene expression matrix. The matrix is transferred to a statistical tool for clustering. Co-clustered genes, as lists of identifiers, may then be intersected with networks denoting protein–protein interactions, metabolic pathways or regulatory networks returning subnetworks. Gene lists may serve as queries to functional databases returning key-value pairs associating a gene with its function. In this light, composing software to perform successive levels of analysis is largely a matter of sharing these fundamental data structures.</p>
</sec>
</sec>
<sec>
<title>GAGGLE: EXPERIENCE AND LESSONS</title>
<p>Much of our experience putting into practice these strategies for interoperability comes from the development and application of Gaggle, a framework designed for interactive exploratory analysis of biological data. Gaggle integrates several in-house and third-party software tools and has been applied in numerous studies, for example [
<xref rid="bbs074-B5" ref-type="bibr">5–7</xref>
].</p>
<p>In terms of interoperability strategies, Gaggle is a message passing system. Shared representation takes the form of the set of data structures discussed earlier, with the omission of tables. Messages are propagated through the Boss, a simple message broker that tracks connected applications and routes messages from one to another. Connections are implemented over Java RMI, implying synchronous RPC with binary serialization of data objects. In client applications, Gaggle connectivity is often implemented through plug-in APIs, as it is in Cytoscape. In order to incorporate web applications into the system, an adapter was needed that could translate from the Java RMI protocol to web protocols, HTTP and XML-based web services. This is the basic function of Firegoose.</p>
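<p>The role of a broker of this kind can be sketched in a few lines. The sketch below is an in-process Python illustration only: Gaggle itself is implemented over Java RMI, and the class name, method names and application names here are our own assumptions rather than the Gaggle API.</p>
<preformat>
```python
class Broker:
    """Tracks connected applications and routes messages between them."""

    def __init__(self):
        self._handlers = {}

    def connect(self, name, handler):
        """Register an application under a name, with a callable that receives messages."""
        self._handlers[name] = handler

    def disconnect(self, name):
        """Forget an application; unknown names are ignored."""
        self._handlers.pop(name, None)

    def connected(self):
        """Return the names of all connected applications, sorted."""
        return sorted(self._handlers)

    def send(self, source, target, payload):
        """Deliver payload synchronously to one named application."""
        if target not in self._handlers:
            raise KeyError("no application named " + repr(target))
        self._handlers[target](source, payload)


# Hypothetical applications: one records what it receives, one ignores it.
received = []
boss = Broker()
boss.connect("matrix_viewer", lambda source, payload: received.append((source, payload)))
boss.connect("network_viewer", lambda source, payload: None)

boss.send("network_viewer", "matrix_viewer", ["geneA", "geneB"])
print(boss.connected())
print(received)
```
</preformat>
<p>The applications know only the broker and the shape of the payload, not each other, which is what keeps the coupling loose.</p>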
<p>The Gaggle framework creates a fluid environment for interactive exploratory analysis of systems biology data, integrating an extensible suite of software tools: MeV [
<xref rid="bbs074-B17" ref-type="bibr">17</xref>
], a graphical tool for clustering, classification and visualization; R and Bioconductor [
<xref rid="bbs074-B39" ref-type="bibr">39</xref>
]; Cytoscape. Several more tools were developed specifically for use within the Gaggle framework, including a data repository incorporating machine-readable descriptions of experiment design; a translator for mapping across identifier systems and a genome browser [
<xref rid="bbs074-B40" ref-type="bibr">40</xref>
]. Firegoose [
<xref rid="bbs074-B41" ref-type="bibr">41</xref>
] connects web applications to the Gaggle framework, exchanging data in either direction between desktop tools and popular web sites such as KEGG [
<xref rid="bbs074-B19" ref-type="bibr">19</xref>
], STRING [
<xref rid="bbs074-B42" ref-type="bibr">42</xref>
,
<xref rid="bbs074-B43" ref-type="bibr">43</xref>
] and DAVID [
<xref rid="bbs074-B18" ref-type="bibr">18</xref>
].</p>
<p>As we gained experience with Gaggle, we discovered aspects which worked well and those where greater flexibility and extensibility were needed. Reviewing some of these design decisions may provide guidance for future work.</p>
<sec>
<title>Tables</title>
<p>Originally, we felt that tables could be adequately represented as matrices or tuples and did not include them in Gaggle. Of particular concern was the need for communicating programs to agree on the specific contents of the table (the schema). On the other hand, the ubiquity of tables in both spreadsheet-like programs and databases argued for their inclusion. More importantly, significant utility can be had from tables without limiting applications to a prescribed schema. For example, the first column often holds an identifier field. That alone is sufficient for making selections and joining to other data structures. Applications with specific needs might have to look for expected column headers; for example, sequence, strand, start and end indicating a locus on a genome. A program encountering a table with an unfamiliar schema can ignore it, harmlessly. This demonstrates the surprising extent of what can be done without limiting flexibility.</p>
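<p>A sketch of this schema-tolerant handling follows. The dictionary layout of a table, the function names and the sample data are assumptions made for illustration; only the column headers sequence, strand, start and end come from the text above.</p>
<preformat>
```python
LOCUS_COLUMNS = ("sequence", "strand", "start", "end")


def identifiers(table):
    """By convention, treat the first column as the identifier column."""
    return [row[0] for row in table["rows"]]


def loci(table):
    """Return one locus dict per row, or None if the locus columns are absent."""
    columns = table["columns"]
    if not all(name in columns for name in LOCUS_COLUMNS):
        return None  # unfamiliar schema: ignore it harmlessly
    index = {name: columns.index(name) for name in LOCUS_COLUMNS}
    return [{name: row[i] for name, i in index.items()} for row in table["rows"]]


# Invented example tables.
gene_table = {
    "columns": ["name", "sequence", "strand", "start", "end", "function"],
    "rows": [
        ["geneA", "chromosome", "+", 100, 950, "kinase"],
        ["geneB", "chromosome", "-", 1200, 2100, "transporter"],
    ],
}
sample_table = {
    "columns": ["sample", "temperature"],
    "rows": [["s1", 37.0], ["s2", 42.0]],
}

print(identifiers(gene_table))
print(identifiers(sample_table))
print(loci(gene_table)[0])
print(loci(sample_table))
```
</preformat>
<p>Both tables support selection by identifier, while only the table that happens to carry locus columns is usable by a tool such as a genome browser.</p>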
<p>Thus convinced in favor of tables, we faced another problem. Due to the statically typed nature of RMI, Java’s binary protocol for remote method invocation, adding this new data type meant recompiling clients, many of which were developed by third parties. Searching for ways to support both older and newer clients, we prototyped a JSON protocol. Easily enabling additive changes, our experimental protocol showed benefits in generality, extensibility and platform independence at some cost of efficiency.</p>
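<p>One way a textual protocol can tolerate additive changes is for receivers to dispatch on a declared type and skip anything they do not recognize. The sketch below illustrates that idea only; the message envelope with "type" and "body" keys is our assumption and is not the format of the prototype protocol.</p>
<preformat>
```python
import json


def handle_list(body):
    """Summarize a list of identifiers."""
    return "list of %d identifiers" % len(body)


def handle_matrix(body):
    """Summarize a matrix given as a dict with a 'rows' entry."""
    return "matrix with %d rows" % len(body["rows"])


# Data types understood by this (hypothetical) older client.
HANDLERS = {"list": handle_list, "matrix": handle_matrix}


def receive(raw_message):
    """Decode a JSON message and dispatch on its declared type.

    Unknown types are skipped rather than treated as errors, so new types
    can be introduced without rebuilding older clients.
    """
    message = json.loads(raw_message)
    handler = HANDLERS.get(message.get("type"))
    if handler is None:
        return None
    return handler(message["body"])


print(receive(json.dumps({"type": "list", "body": ["geneA", "geneB"]})))
print(receive(json.dumps({"type": "table", "body": {"columns": [], "rows": []}})))
```
</preformat>
<p>The first message is handled; the second, carrying a type added after the client was written, is ignored without error.</p>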
</sec>
<sec>
<title>Composition</title>
<p>The original Gaggle included a data type for bicluster, a set of genes co-expressed under a set of experimental conditions. The need for this less general data type demonstrated a fundamental omission. A bicluster is just a pair of lists, a list of genes coupled with a list of conditions. This type of composition was made possible by allowing data objects to be nested inside tuples. A bicluster is now represented as a tuple with two keys, ‘genes’ and ‘conditions’, each associated with a list of identifiers. As an added benefit, data objects can now be annotated with metadata. Specifying identifier types or units is one of many potential uses.</p>
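<p>A bicluster composed in this way might look as follows when serialized as JSON. The keys 'genes' and 'conditions' follow the description above; the metadata keys, the identifiers and the use of JSON as the wire format are assumptions made for illustration.</p>
<preformat>
```python
import json

bicluster = {
    "genes": ["geneA", "geneB", "geneC"],        # list of identifiers
    "conditions": ["heat_shock", "low_oxygen"],  # list of identifiers
    # Optional metadata; these key names are assumptions, not a Gaggle schema.
    "metadata": {"identifier_type": "locus_tag", "species": "example organism"},
}

# Round trip through a textual representation.
message = json.dumps(bicluster, sort_keys=True)
received = json.loads(message)

# A receiver that only understands lists can still use the parts it recognizes.
genes = received.get("genes", [])
conditions = received.get("conditions", [])
print(len(genes), "genes under", len(conditions), "conditions")
```
</preformat>
<p>No dedicated bicluster type is needed; the tuple simply names two lists, and a tool that knows nothing about biclusters can still treat either list as a selection.</p>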
</sec>
<sec>
<title>Microformats</title>
<p>The original Gaggle worked well for desktop applications; however, many useful and popular bioinformatics resources are available as web applications. This motivated the creation of Firegoose, a browser extension that integrates web resources into the Gaggle framework. To make this easier, small amounts of structured data can be embedded directly into web pages, a technique called microformats or microdata in the HTML5 standard. Search engines and browser plug-ins like Firegoose can parse and act on this embedded information enabling data-aware features. One compelling use is to link together web interfaces and web services, which are often different representations of the same underlying data. Web resources can then participate in a seamless data analysis environment along with desktop tools.</p>
</sec>
<sec>
<title>Scripting</title>
<p>Gaggle is an environment for exploratory analysis, emphasizing the ability to easily move data from one interactive graphical application to another. In this way, Gaggle differs from workflow tools in that the steps of an analysis are determined interactively by the user, rather than being scripted or designed graphically. Gaggle integrates many programs which were not made with scripting in mind, making it difficult to capture and save analysis workflows. Hooks for scripting graphical applications, perhaps even a shared vocabulary of commands, would help ease the transition from interactive exploratory analysis to automated reproducible workflows.</p>
</sec>
</sec>
<sec sec-type="conclusions">
<title>CONCLUSION</title>
<p>‘A “cyberinfrastructure” is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world.’—Lincoln Stein [
<xref rid="bbs074-B44" ref-type="bibr">44</xref>
]</p>
<p>In his landmark dissertation [
<xref rid="bbs074-B34" ref-type="bibr">34</xref>
], Roy Fielding systematically described the architectural principles that enabled the success and ubiquity of the web. Can we formulate similar principles that do the same for bioinformatics? Instead of enforcing restrictive mandates on data and programming models, we might achieve the necessary flexibility and interoperability through a set of general principles and standard practices, much as the web itself has.</p>
<sec>
<title>Requirements</title>
<p>The requirements are relatively clear. The structure of biological research dictates that bioinformatics software be created through a process of distributed independent development with little central coordination. Projects at varying levels of maturity must interoperate while evolving independently at varying rates. Unpredictable new requirements are to be expected. Care should be taken to avoid constraining biological semantics or imposing data models, which may become outdated with advancing biological knowledge.</p>
<p>Bioinformatics software development should remain accessible to scientists and domain experts who are not primarily software engineers. In that spirit, preferred architectures would not restrict choice of programming environment nor require sophisticated development tools or techniques, keeping the barrier to entry as low as possible.
<italic>Ad hoc</italic>
scripts should be supported and allowed to evolve toward greater engineering rigor as warranted. The goal should be tools and practices that enable both exploratory and repeatable data analysis, cultivate collaborative development and produce open easily exchanged data. These are ideals, but taken as guiding principles, they point to engineering decisions that value flexibility and simplicity.</p>
</sec>
<sec>
<title>Principles</title>
<p>‘Rule of Composition: Design programs to be connected to other programs.’—Eric S. Raymond [
<xref rid="bbs074-B45" ref-type="bibr">45</xref>
]</p>
<p>These requirements and values serve to guide the tradeoffs inherent in building complex scientific software. The basic engineering tools for dealing with complexity are abstraction and modularity. Encapsulating specialized functionality behind well-defined interfaces leads to self-contained components. Composition of autonomous components provides the flexibility needed to adapt to unanticipated situations.</p>
<p>Loosely coupled systems are created by carefully limiting dependencies between components. Exchanging data in self-describing documents lowers dependency between components, compared with explicitly invoking another program’s functionality. For example, HTTP constrains the set of actions to a handful of operations (GET, PUT, POST and DELETE), pushing almost all variation into the message payload, which can be any of dozens of defined media types, including text, HTML, XML, images, audio and video or customized application-specific types. The result is that heterogeneous clients and servers can exchange an unlimited variety of data types. Effective extension points are difficult to craft, but once found, enable existing software to adapt gracefully to new demands.</p>
<p>Generality is a key feature of interoperable data structures. Expressing biological concepts in the data rather than in its structure enables new concepts to be incorporated and existing concepts to change. Likewise, plug-in architecture isolates application-specific semantics, which may be in flux, from universal concepts that apply broadly and consistently. In a rapidly developing field of research, semantics are a vector of change and designing to isolate that change pays off.</p>
<p>The principles of abstraction, modularity, composition, loose coupling, simplicity and generality are simply good software engineering. Applied in context, these principles suggest a set of practices that allow interoperability to proceed naturally. Several existing systems and frameworks exemplify some of the necessary ingredients: composing customized workflows of loosely coupled independently developed software components; protecting against change through generality; promoting extensible or optional standards and keeping software components, protocols and data representations as simple as possible, bearing in mind that specifications that burden developers and data providers with up-front costs are less likely to be adopted. Software tools should be designed for interoperability, anticipating their role as parts of an integrated platform collectively supporting the emergence of yet higher levels as new biology is discovered.</p>
<p>
<boxed-text id="bbs074-BOX1" position="float">
<caption>
<title>Key Points</title>
</caption>
<p>
<list list-type="bullet">
<list-item>
<p>Interoperability is a key feature for scientific software.</p>
</list-item>
<list-item>
<p>Flexible and powerful software environments for scientific data analysis depend on composition of independently developed software tools each encapsulating domain expertise from particular areas of specialty.</p>
</list-item>
<list-item>
<p>A handful of simple generic data structures—lists, matrices, networks, tables and tuples (nested key/value pairs)—are capable of representing a variety of biological data types. These represent the shapes of scientific data and provide a basis for simple and flexible interoperability.</p>
</list-item>
</list>
</p>
</boxed-text>
</p>
</sec>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>The authors would like to thank Alexander Pico, Christopher Plaisier, Hector Rovira, Mike Smoot and Wei-ju Wu for insightful discussions. We also thank the reviewers for their helpful comments and suggestions.</p>
</ack>
<bio id="d35e36">
<p>
<bold>James Christopher Bare</bold>
is a software engineer at Sage Bionetworks. He was formerly at the Institute for Systems Biology helping to develop Gaggle, Firegoose, Gaggle Genome Browser and Network Portal.</p>
</bio>
<bio id="d35e47">
<p>
<bold>Nitin S. Baliga</bold>
is the director of the Institute for Systems Biology, an interdisciplinary institute pioneering the understanding of biological complexity.</p>
</bio>
<sec>
<title>FUNDING</title>
<p>This work was supported by the
<funding-source>U.S. Department of Energy</funding-source>
[award nos
<award-id>DE-FG02-07ER64327</award-id>
and
<award-id>DG-FG02-08ER64685</award-id>
] and by the
<funding-source>National Institutes of Health</funding-source>
[award nos
<award-id>P50GM076547</award-id>
and
<award-id>1R01GM077398-01A2</award-id>
] and by the
<funding-source>University of Luxembourg</funding-source>
. The work conducted by ENIGMA was supported by the
<funding-source>Office of Science</funding-source>
,
<funding-source>Office of Biological and Environmental Research</funding-source>
of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="bbs074-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>LD</given-names>
</name>
</person-group>
<article-title>Creating a bioinformatics nation</article-title>
<source>Nature</source>
<year>2002</year>
<volume>417</volume>
<fpage>119</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="pmid">12000935</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>LD</given-names>
</name>
</person-group>
<article-title>Integrating biological databases</article-title>
<source>Nat Rev Genet</source>
<year>2003</year>
<volume>4</volume>
<fpage>337</fpage>
<lpage>45</lpage>
<pub-id pub-id-type="pmid">12728276</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hood</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Systems biology: integrating technology, biology, and computation</article-title>
<source>Mech Ageing Dev</source>
<year>2003</year>
<volume>124</volume>
<fpage>9</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="pmid">12618001</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Madhavan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jeffery</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Web-scale data integration: You can only afford to pay as you go</article-title>
<source>Proc CIDR</source>
<year>2007</year>
<fpage>342</fpage>
<lpage>50</lpage>
</element-citation>
</ref>
<ref id="bbs074-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bonneau</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Facciotti</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Reiss</surname>
<given-names>DJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A predictive model for transcriptional control of physiology in a free living cell</article-title>
<source>Cell</source>
<year>2007</year>
<volume>131</volume>
<fpage>1354</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="pmid">18160043</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koide</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Reiss</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Bare</surname>
<given-names>JC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Prevalence of transcription promoters within archaeal operons and coding sequences</article-title>
<source>Mol Syst Biol</source>
<year>2009</year>
<volume>5</volume>
<fpage>285</fpage>
<pub-id pub-id-type="pmid">19536208</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yoon</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Reiss</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Bare</surname>
<given-names>JC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Parallel evolution of transcriptome architecture during genome reorganization</article-title>
<source>Genome Res</source>
<year>2011</year>
<volume>21</volume>
<fpage>1892</fpage>
<lpage>904</lpage>
<pub-id pub-id-type="pmid">21750103</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shannon</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Markiel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ozier</surname>
<given-names>O</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Cytoscape: a software environment for integrated models of biomolecular interaction networks</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<fpage>2498</fpage>
<lpage>504</lpage>
<pub-id pub-id-type="pmid">14597658</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shannon</surname>
<given-names>PT</given-names>
</name>
<name>
<surname>Reiss</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Bonneau</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Gaggle: an open-source software system for integrating bioinformatics software and data sources</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<fpage>176</fpage>
<pub-id pub-id-type="pmid">16569235</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giardine</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Riemer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hardison</surname>
<given-names>RC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Galaxy: a platform for interactive large-scale genome analysis</article-title>
<source>Genome Res</source>
<year>2005</year>
<volume>15</volume>
<fpage>1451</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="pmid">16169926</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oinn</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Addis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ferris</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Taverna: a tool for the composition and enactment of bioinformatics workflows</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>3045</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="pmid">15201187</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hull</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wolstencroft</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Taverna: a tool for building and running workflows of services</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>W729</fpage>
<lpage>32</lpage>
<pub-id pub-id-type="pmid">16845108</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reich</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Liefeld</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Gould</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GenePattern 2.0</article-title>
<source>Nat Genet</source>
<year>2006</year>
<volume>38</volume>
<fpage>500</fpage>
<lpage>1</lpage>
<pub-id pub-id-type="pmid">16642009</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hucka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Finney</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sauro</surname>
<given-names>HM</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The ERATO Systems Biology Workbench: enabling interaction and exchange between software tools for computational biology</article-title>
<source>Pac Symp Biocomput</source>
<year>2002</year>
<fpage>450</fpage>
<lpage>61</lpage>
<pub-id pub-id-type="pmid">11928498</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilkinson</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Links</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>BioMOBY: an open source biological web services proposal</article-title>
<source>Brief Bioinformatics</source>
<year>2002</year>
<volume>3</volume>
<fpage>331</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="pmid">12511062</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ihaka</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gentleman</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>R: a language for data analysis and graphics</article-title>
<source>J Comput Graph Stat</source>
<year>1996</year>
<volume>5</volume>
<fpage>299</fpage>
<lpage>314</lpage>
</element-citation>
</ref>
<ref id="bbs074-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saeed</surname>
<given-names>AI</given-names>
</name>
<name>
<surname>Sharov</surname>
<given-names>V</given-names>
</name>
<name>
<surname>White</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TM4: a free, open-source system for microarray data management and analysis</article-title>
<source>Biotechniques</source>
<year>2003</year>
<volume>34</volume>
<fpage>374</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="pmid">12613259</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dennis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Hosack</surname>
<given-names>DA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>DAVID: database for annotation, visualization, and integrated discovery</article-title>
<source>Genome Biol</source>
<year>2003</year>
<volume>4</volume>
<fpage>R60</fpage>
</element-citation>
</ref>
<ref id="bbs074-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>KEGG: Kyoto Encyclopedia of Genes and Genomes</article-title>
<source>Nucleic Acids Res</source>
<year>2000</year>
<volume>28</volume>
<fpage>27</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="pmid">10592173</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Castrillo</surname>
<given-names>JI</given-names>
</name>
<name>
<surname>Velarde</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>334</fpage>
<pub-id pub-id-type="pmid">18687127</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuehn</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Liberzon</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reich</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Using GenePattern for gene expression analysis</article-title>
<source>Curr Protoc Bioinformatics</source>
<year>2008</year>
<comment>Chapter 7: Unit 7.12</comment>
</element-citation>
</ref>
<ref id="bbs074-B22">
<label>22</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Gamma</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Helm</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<source>Design Patterns: Elements of Reusable Object-Oriented Software</source>
<year>1995</year>
<publisher-name>Addison-Wesley Professional</publisher-name>
</element-citation>
</ref>
<ref id="bbs074-B23">
<label>23</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Taylor</surname>
<given-names>RN</given-names>
</name>
<name>
<surname>Medvidovic</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Dashofy</surname>
<given-names>EM</given-names>
</name>
</person-group>
<source>Software Architecture: Foundations, Theory, and Practice</source>
<year>2009</year>
<publisher-name>Wiley</publisher-name>
</element-citation>
</ref>
<ref id="bbs074-B24">
<label>24</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Mehta</surname>
<given-names>NR</given-names>
</name>
<name>
<surname>Medvidovic</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Phadke</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Towards a taxonomy of software connectors</article-title>
<source>
<italic>In</italic>
Proceedings of the 22nd International Conference on Software Engineering</source>
<year>2000</year>
<fpage>178</fpage>
<lpage>87</lpage>
</element-citation>
</ref>
<ref id="bbs074-B25">
<label>25</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Hohpe</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Woolf</surname>
<given-names>B</given-names>
</name>
</person-group>
<source>Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions</source>
<year>2003</year>
<publisher-name>Addison-Wesley Professional</publisher-name>
</element-citation>
</ref>
<ref id="bbs074-B26">
<label>26</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Garlan</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>M</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Ambriola</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Tortora</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>An Introduction to Software Architecture</article-title>
<source>Advances in Software Engineering and Knowledge Engineering, Vol. I</source>
<year>1993</year>
<publisher-name>World Scientific Publishing Company</publisher-name>
</element-citation>
</ref>
<ref id="bbs074-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene Ontology: tool for the unification of biology</article-title>
<source>Nat Genet</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Demir</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Cary</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Paley</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The BioPAX community standard for pathway data sharing</article-title>
<source>Nat Biotechnol</source>
<year>2010</year>
<volume>28</volume>
<fpage>935</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="pmid">20829833</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lord</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bechhofer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wilkinson</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Applying semantic web services to bioinformatics: experiences gained, lessons learnt</article-title>
<source>
<italic>Semantic Web—ISWC</italic>
2004</source>
<year>2004</year>
<volume>3298</volume>
<fpage>350</fpage>
<lpage>64</lpage>
</element-citation>
</ref>
<ref id="bbs074-B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Killcoyne</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Carter</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Cytoscape: a community-based framework for network modeling</article-title>
<source>Protein Netw Pathw Anal</source>
<year>2009</year>
<volume>563</volume>
<fpage>219</fpage>
<lpage>39</lpage>
</element-citation>
</ref>
<ref id="bbs074-B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Börner</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Plug-and-play macroscopes</article-title>
<source>Commun ACM</source>
<year>2011</year>
<volume>54</volume>
<fpage>60</fpage>
<pub-id pub-id-type="pmid">21984822</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B32">
<label>32</label>
<element-citation publication-type="other">
<comment>Technomancy - in which three programming methods are compared.
<ext-link ext-link-type="uri" xlink:href="http://technomancy.us/161">http://technomancy.us/161</ext-link>
. (26 May 2012, date last accessed)</comment>
</element-citation>
</ref>
<ref id="bbs074-B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rovira</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Killcoyne</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shmulevich</surname>
<given-names>I</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An integration architecture designed to deal with the issues of biological scope, scale and complexity</article-title>
<source>Data Integr Life Sci</source>
<year>2010</year>
<volume>6254</volume>
<fpage>179</fpage>
<lpage>91</lpage>
</element-citation>
</ref>
<ref id="bbs074-B34">
<label>34</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Fielding</surname>
<given-names>RT</given-names>
</name>
</person-group>
<source>Architectural Styles and the Design of Network-Based Software Architectures</source>
<year>2000</year>
</element-citation>
</ref>
<ref id="bbs074-B35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goble</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Stevens</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>State of the nation in data integration for bioinformatics</article-title>
<source>J Biomed Inform</source>
<year>2008</year>
<volume>41</volume>
<fpage>687</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="pmid">18358788</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Iersel</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Pico</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Kelder</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<fpage>5</fpage>
<pub-id pub-id-type="pmid">20047655</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smedley</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Haider</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ballester</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>BioMart—biological queries made easy</article-title>
<source>BMC Genomics</source>
<year>2009</year>
<volume>10</volume>
<fpage>22</fpage>
</element-citation>
</ref>
<ref id="bbs074-B38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clark</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Martin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Liefeld</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Globally distributed object identification for biological knowledgebases</article-title>
<source>Brief Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>59</fpage>
<lpage>70</lpage>
<pub-id pub-id-type="pmid">15153306</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gentleman</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Carey</surname>
<given-names>VJ</given-names>
</name>
<name>
<surname>Bates</surname>
<given-names>DM</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Bioconductor: open software development for computational biology and bioinformatics</article-title>
<source>Genome Biol</source>
<year>2004</year>
<volume>5</volume>
<fpage>R80</fpage>
<pub-id pub-id-type="pmid">15461798</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bare</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Koide</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Reiss</surname>
<given-names>DJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Integration and visualization of systems biology data in context of the genome</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<fpage>382</fpage>
<pub-id pub-id-type="pmid">20642854</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bare</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Shannon</surname>
<given-names>PT</given-names>
</name>
<name>
<surname>Schmid</surname>
<given-names>AK</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<fpage>456</fpage>
<pub-id pub-id-type="pmid">18021453</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Snel</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>STRING: known and predicted protein-protein associations, integrated and transferred across organisms</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>D433</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="pmid">15608232</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jensen</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Kuhn</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Stark</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>STRING 8—a global view on proteins and their functional interactions in 630 organisms</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<fpage>D412</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="pmid">18940858</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>LD</given-names>
</name>
</person-group>
<article-title>Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges</article-title>
<source>Nat Rev Genet</source>
<year>2008</year>
<volume>9</volume>
<fpage>678</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="pmid">18714290</pub-id>
</element-citation>
</ref>
<ref id="bbs074-B45">
<label>45</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Raymond</surname>
<given-names>ES</given-names>
</name>
</person-group>
<source>The Art of Unix Programming</source>
<year>2004</year>
<publisher-name>Addison-Wesley Professional</publisher-name>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000510 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000510 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4103535
   |texte=   Architecture for interoperable software in biology
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23235920" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 


This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024