Cyberinfrastructure exploration server

Warning: this site is under development.
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

Leveraging the national cyberinfrastructure for biomedical research

Internal identifier: 000549 (Pmc/Corpus); previous: 000548; next: 000550


Authors: Richard Leduc; Matthew Vaughn; John M. Fonner; Michael Sullivan; James G. Williams; Philip D. Blood; James Taylor; William Barnett

Source: Journal of the American Medical Informatics Association (JAMIA), 2013; 21(2):195-199

RBID : PMC:3932465

Abstract

In the USA, the national cyberinfrastructure refers to a system of research supercomputer and other IT facilities and the high speed networks that connect them. These resources have been heavily leveraged by scientists in disciplines such as high energy physics, astronomy, and climatology, but until recently they have been little used by biomedical researchers. We suggest that many of the ‘Big Data’ challenges facing the medical informatics community can be efficiently handled using national-scale cyberinfrastructure. Resources such as the Extreme Science and Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for Big Data challenges, but these resources can be difficult to approach. Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics. We provide examples of how this has been done in basic biology as an illustration for the biomedical informatics community.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3932465
DOI: 10.1136/amiajnl-2013-002059
PubMed: 23964072
PubMed Central: 3932465

Links to Exploration step

PMC:3932465

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Leveraging the national cyberinfrastructure for biomedical research</title>
<author>
<name sortKey="Leduc, Richard" sort="Leduc, Richard" uniqKey="Leduc R" first="Richard" last="Leduc">Richard Leduc</name>
<affiliation>
<nlm:aff id="af1">
<institution>National Center for Genome Analysis Support</institution>
,
<addr-line>Indiana University, Bloomington, Indiana</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vaughn, Matthew" sort="Vaughn, Matthew" uniqKey="Vaughn M" first="Matthew" last="Vaughn">Matthew Vaughn</name>
<affiliation>
<nlm:aff id="af2">
<addr-line>Life Sciences Computing</addr-line>
,
<institution>Texas Advanced Computing Center</institution>
,
<addr-line>Austin, Texas</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fonner, John M" sort="Fonner, John M" uniqKey="Fonner J" first="John M" last="Fonner">John M. Fonner</name>
<affiliation>
<nlm:aff id="af2">
<addr-line>Life Sciences Computing</addr-line>
,
<institution>Texas Advanced Computing Center</institution>
,
<addr-line>Austin, Texas</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sullivan, Michael" sort="Sullivan, Michael" uniqKey="Sullivan M" first="Michael" last="Sullivan">Michael Sullivan</name>
<affiliation>
<nlm:aff id="af3">
<addr-line>Health Sciences</addr-line>
,
<institution>Internet2</institution>
,
<addr-line>Washington, DC</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Williams, James G" sort="Williams, James G" uniqKey="Williams J" first="James G" last="Williams">James G. Williams</name>
<affiliation>
<nlm:aff id="af4">
<addr-line>International Networking</addr-line>
,
<institution>Indiana University</institution>
,
<addr-line>Bloomington, Indiana</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Blood, Philip D" sort="Blood, Philip D" uniqKey="Blood P" first="Philip D" last="Blood">Philip D. Blood</name>
<affiliation>
<nlm:aff id="af5">
<institution>Pittsburgh Supercomputing Center</institution>
,
<addr-line>Carnegie Mellon University, Pittsburgh, Pennsylvania</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Taylor, James" sort="Taylor, James" uniqKey="Taylor J" first="James" last="Taylor">James Taylor</name>
<affiliation>
<nlm:aff id="af6">
<addr-line>Department of Biology and Department of Mathematics and Computer Science</addr-line>
,
<institution>Emory University</institution>
,
<addr-line>Atlanta, Georgia</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Barnett, William" sort="Barnett, William" uniqKey="Barnett W" first="William" last="Barnett">William Barnett</name>
<affiliation>
<nlm:aff id="af7">
<institution>National Center for Genome Analysis Support, Open Science Grid, Grid Operations Center, Indiana University</institution>
,
<addr-line>Bloomington, Indiana</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23964072</idno>
<idno type="pmc">3932465</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3932465</idno>
<idno type="RBID">PMC:3932465</idno>
<idno type="doi">10.1136/amiajnl-2013-002059</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000549</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Leveraging the national cyberinfrastructure for biomedical research</title>
<author>
<name sortKey="Leduc, Richard" sort="Leduc, Richard" uniqKey="Leduc R" first="Richard" last="Leduc">Richard Leduc</name>
<affiliation>
<nlm:aff id="af1">
<institution>National Center for Genome Analysis Support</institution>
,
<addr-line>Indiana University, Bloomington, Indiana</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vaughn, Matthew" sort="Vaughn, Matthew" uniqKey="Vaughn M" first="Matthew" last="Vaughn">Matthew Vaughn</name>
<affiliation>
<nlm:aff id="af2">
<addr-line>Life Sciences Computing</addr-line>
,
<institution>Texas Advanced Computing Center</institution>
,
<addr-line>Austin, Texas</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fonner, John M" sort="Fonner, John M" uniqKey="Fonner J" first="John M" last="Fonner">John M. Fonner</name>
<affiliation>
<nlm:aff id="af2">
<addr-line>Life Sciences Computing</addr-line>
,
<institution>Texas Advanced Computing Center</institution>
,
<addr-line>Austin, Texas</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sullivan, Michael" sort="Sullivan, Michael" uniqKey="Sullivan M" first="Michael" last="Sullivan">Michael Sullivan</name>
<affiliation>
<nlm:aff id="af3">
<addr-line>Health Sciences</addr-line>
,
<institution>Internet2</institution>
,
<addr-line>Washington, DC</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Williams, James G" sort="Williams, James G" uniqKey="Williams J" first="James G" last="Williams">James G. Williams</name>
<affiliation>
<nlm:aff id="af4">
<addr-line>International Networking</addr-line>
,
<institution>Indiana University</institution>
,
<addr-line>Bloomington, Indiana</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Blood, Philip D" sort="Blood, Philip D" uniqKey="Blood P" first="Philip D" last="Blood">Philip D. Blood</name>
<affiliation>
<nlm:aff id="af5">
<institution>Pittsburgh Supercomputing Center</institution>
,
<addr-line>Carnegie Mellon University, Pittsburgh, Pennsylvania</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Taylor, James" sort="Taylor, James" uniqKey="Taylor J" first="James" last="Taylor">James Taylor</name>
<affiliation>
<nlm:aff id="af6">
<addr-line>Department of Biology and Department of Mathematics and Computer Science</addr-line>
,
<institution>Emory University</institution>
,
<addr-line>Atlanta, Georgia</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Barnett, William" sort="Barnett, William" uniqKey="Barnett W" first="William" last="Barnett">William Barnett</name>
<affiliation>
<nlm:aff id="af7">
<institution>National Center for Genome Analysis Support, Open Science Grid, Grid Operations Center, Indiana University</institution>
,
<addr-line>Bloomington, Indiana</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of the American Medical Informatics Association : JAMIA</title>
<idno type="ISSN">1067-5027</idno>
<idno type="eISSN">1527-974X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>In the USA, the national cyberinfrastructure refers to a system of research supercomputer and other IT facilities and the high speed networks that connect them. These resources have been heavily leveraged by scientists in disciplines such as high energy physics, astronomy, and climatology, but until recently they have been little used by biomedical researchers. We suggest that many of the ‘
<italic>Big Data</italic>
’ challenges facing the medical informatics community can be efficiently handled using national-scale cyberinfrastructure. Resources such as the Extreme Science and Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for
<italic>Big Data</italic>
challenges, but these resources can be difficult to approach. Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics. We provide examples of how this has been done in basic biology as an illustration for the biomedical informatics community.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Stewart, Ca" uniqKey="Stewart C">CA Stewart</name>
</author>
<author>
<name sortKey="Simms, S" uniqKey="Simms S">S Simms</name>
</author>
<author>
<name sortKey="Plale, B" uniqKey="Plale B">B Plale</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goff, Ss" uniqKey="Goff S">SS Goff</name>
</author>
<author>
<name sortKey="Vaughn, M" uniqKey="Vaughn M">M Vaughn</name>
</author>
<author>
<name sortKey="Mckay, S" uniqKey="Mckay S">S McKay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leduc, R" uniqKey="Leduc R">R LeDuc</name>
</author>
<author>
<name sortKey="Wu, L S" uniqKey="Wu L">L-S Wu</name>
</author>
<author>
<name sortKey="Ganote, C" uniqKey="Ganote C">C Ganote</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lenards, A" uniqKey="Lenards A">A Lenards</name>
</author>
<author>
<name sortKey="Merchant, N" uniqKey="Merchant N">N Merchant</name>
</author>
<author>
<name sortKey="Stanzione, D" uniqKey="Stanzione D">D Stanzione</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Skidmore, E" uniqKey="Skidmore E">E Skidmore</name>
</author>
<author>
<name sortKey="Kim, S J" uniqKey="Kim S">S-J Kim</name>
</author>
<author>
<name sortKey="Kuchimanchi, S" uniqKey="Kuchimanchi S">S Kuchimanchi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goecks, J" uniqKey="Goecks J">J Goecks</name>
</author>
<author>
<name sortKey="Nekrutenko, A" uniqKey="Nekrutenko A">A Nekrutenko</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blankenberg, D" uniqKey="Blankenberg D">D Blankenberg</name>
</author>
<author>
<name sortKey="Kuster, G" uniqKey="Kuster G">G Kuster</name>
</author>
<author>
<name sortKey="Coraor, N" uniqKey="Coraor N">N Coraor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giardine, B" uniqKey="Giardine B">B Giardine</name>
</author>
<author>
<name sortKey="Riemer, C" uniqKey="Riemer C">C Riemer</name>
</author>
<author>
<name sortKey="Hardison, R" uniqKey="Hardison R">R Hardison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nowoczynski, P" uniqKey="Nowoczynski P">P Nowoczynski</name>
</author>
<author>
<name sortKey="Sommerfield, J" uniqKey="Sommerfield J">J Sommerfield</name>
</author>
<author>
<name sortKey="Yanovich, J" uniqKey="Yanovich J">J Yanovich</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">J Am Med Inform Assoc</journal-id>
<journal-id journal-id-type="iso-abbrev">J Am Med Inform Assoc</journal-id>
<journal-id journal-id-type="hwp">amiajnl</journal-id>
<journal-id journal-id-type="publisher-id">jamia</journal-id>
<journal-title-group>
<journal-title>Journal of the American Medical Informatics Association : JAMIA</journal-title>
</journal-title-group>
<issn pub-type="ppub">1067-5027</issn>
<issn pub-type="epub">1527-974X</issn>
<publisher>
<publisher-name>BMJ Publishing Group</publisher-name>
<publisher-loc>BMA House, Tavistock Square, London, WC1H 9JR</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23964072</article-id>
<article-id pub-id-type="pmc">3932465</article-id>
<article-id pub-id-type="publisher-id">amiajnl-2013-002059</article-id>
<article-id pub-id-type="doi">10.1136/amiajnl-2013-002059</article-id>
<article-categories>
<subj-group subj-group-type="hwp-journal-coll">
<subject>1506</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Perspective</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Leveraging the national cyberinfrastructure for biomedical research</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>LeDuc</surname>
<given-names>Richard</given-names>
</name>
<xref ref-type="aff" rid="af1">1</xref>
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-6951-2923</contrib-id>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vaughn</surname>
<given-names>Matthew</given-names>
</name>
<xref ref-type="aff" rid="af2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fonner</surname>
<given-names>John M</given-names>
</name>
<xref ref-type="aff" rid="af2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sullivan</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="af3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Williams</surname>
<given-names>James G</given-names>
</name>
<xref ref-type="aff" rid="af4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Blood</surname>
<given-names>Philip D</given-names>
</name>
<xref ref-type="aff" rid="af5">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Taylor</surname>
<given-names>James</given-names>
</name>
<xref ref-type="aff" rid="af6">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Barnett</surname>
<given-names>William</given-names>
</name>
<xref ref-type="aff" rid="af7">7</xref>
</contrib>
</contrib-group>
<aff id="af1">
<label>1</label>
<institution>National Center for Genome Analysis Support</institution>
,
<addr-line>Indiana University, Bloomington, Indiana</addr-line>
,
<country>USA</country>
</aff>
<aff id="af2">
<label>2</label>
<addr-line>Life Sciences Computing</addr-line>
,
<institution>Texas Advanced Computing Center</institution>
,
<addr-line>Austin, Texas</addr-line>
,
<country>USA</country>
</aff>
<aff id="af3">
<label>3</label>
<addr-line>Health Sciences</addr-line>
,
<institution>Internet2</institution>
,
<addr-line>Washington, DC</addr-line>
,
<country>USA</country>
</aff>
<aff id="af4">
<label>4</label>
<addr-line>International Networking</addr-line>
,
<institution>Indiana University</institution>
,
<addr-line>Bloomington, Indiana</addr-line>
,
<country>USA</country>
</aff>
<aff id="af5">
<label>5</label>
<institution>Pittsburgh Supercomputing Center</institution>
,
<addr-line>Carnegie Mellon University, Pittsburgh, Pennsylvania</addr-line>
,
<country>USA</country>
</aff>
<aff id="af6">
<label>6</label>
<addr-line>Department of Biology and Department of Mathematics and Computer Science</addr-line>
,
<institution>Emory University</institution>
,
<addr-line>Atlanta, Georgia</addr-line>
,
<country>USA</country>
</aff>
<aff id="af7">
<label>7</label>
<institution>National Center for Genome Analysis Support, Open Science Grid, Grid Operations Center, Indiana University</institution>
,
<addr-line>Bloomington, Indiana</addr-line>
,
<country>USA</country>
</aff>
<author-notes>
<corresp>
<label>Correspondence to</label>
Dr Richard LeDuc, National Center for Genome Analysis Support, 2709 E. 10th Street, Bloomington, IN 47408, USA;
<email>rleduc@iu.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>3</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>8</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>20</day>
<month>8</month>
<year>2013</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>21</volume>
<issue>2</issue>
<fpage>195</fpage>
<lpage>199</lpage>
<history>
<date date-type="received">
<day>31</day>
<month>5</month>
<year>2013</year>
</date>
<date date-type="rev-recd">
<day>15</day>
<month>7</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>3</day>
<month>8</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access">
<license-p>This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0/">http://creativecommons.org/licenses/by-nc/3.0/</ext-link>
</license-p>
</license>
</permissions>
<self-uri xlink:title="pdf" xlink:type="simple" xlink:href="amiajnl-2013-002059.pdf"></self-uri>
<abstract>
<p>In the USA, the national cyberinfrastructure refers to a system of research supercomputer and other IT facilities and the high speed networks that connect them. These resources have been heavily leveraged by scientists in disciplines such as high energy physics, astronomy, and climatology, but until recently they have been little used by biomedical researchers. We suggest that many of the ‘
<italic>Big Data</italic>
’ challenges facing the medical informatics community can be efficiently handled using national-scale cyberinfrastructure. Resources such as the Extreme Science and Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for
<italic>Big Data</italic>
challenges, but these resources can be difficult to approach. Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics. We provide examples of how this has been done in basic biology as an illustration for the biomedical informatics community.</p>
</abstract>
<kwd-group>
<kwd>National Cyberinfrastructure</kwd>
<kwd>Big Data</kwd>
<kwd>Genomics</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>special-feature</meta-name>
<meta-value>unlocked</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>There are daily announcements of new meetings, journals, or commercial solutions to the ‘
<italic>Big Data</italic>
’ problems facing biomedical research, yet biomedicine is not the first discipline to face these challenges. In the USA, there is an existing system of research supercomputer centers and high speed research networks collectively known as the national cyberinfrastructure (CI).
<xref rid="R1" ref-type="bibr">1</xref>
These resources have been used by practically all scientific disciplines, but the majority of computation has come from physical and earth scientists, and more recently biologists; the CI has been little used by biomedical researchers. Yet many of the computational challenges facing medical informatics, particularly regarding ‘discovery ‘omics’ such as genomics, can be economically solved using the national CI! Further, problems related to data movement, data management, and software optimization benefit by tapping the computational expertise of the scientists and engineers associated with the national CI.</p>
<p>The national cyberinfrastructure in the USA is not a single entity. Rather, the term describes an assortment of large-scale computational resources available to the scientific community.
<xref ref-type="fig" rid="AMIAJNL2013002059F1">Figure 1</xref>
is a stylized view showing the interrelation of some components of the CI. The core of the CI is a series of independently funded supercomputer centers. The National Science Foundation supports the Extreme Science and Engineering Discovery Environment (XSEDE), a project that brings together dozens of supercomputers and other high-performance computing (HPC) resources from 13 of these supercomputing centers, to advance science and engineering research and education across the USA.
<xref rid="R2" ref-type="bibr">2</xref>
XSEDE provides a framework that supports the development of next generation
<italic>Big Data</italic>
analysis capabilities. Like XSEDE, the Open Science Grid
<xref rid="R3" ref-type="bibr">3</xref>
(OSG) uses advanced networks like Internet2 and National Lambda Rail to provide access to CI computers and clusters. Service organizations, such as iPlant
<xref rid="R4" ref-type="bibr">4</xref>
and the National Center for Genome Analysis Support (NCGAS),
<xref rid="R5" ref-type="bibr">5</xref>
use the resources of XSEDE and OSG to create unified user experiences for different research communities.</p>
<fig id="AMIAJNL2013002059F1" position="float">
<label>Figure 1</label>
<caption>
<p>A Roadmap to the Research Information Superhighway: over 200 supercomputer centers are interconnected across a series of high speed physical networks. The resources in these centers are shared across organizations such as XSEDE and OSG. Specialized centers use XSEDE and OSG to support specialized user communities.</p>
</caption>
<graphic xlink:href="amiajnl-2013-002059f01"></graphic>
</fig>
</sec>
<sec id="s2">
<title>CI support for genomics</title>
<p>Many life sciences are already harnessing the national CI, and we feel that three examples will help demonstrate what is currently possible, and perhaps help stimulate ideas for collaborations between the biomedical and medical informatics communities and the national CI.</p>
<p>The iPlant Collaborative is working to enrich plant and animal sciences through the development of cyberinfrastructure—the physical computing resources, collaborative environment, virtual machine resources, and interoperable analysis software and data services—that are essential components of modern biology.
<xref rid="R6" ref-type="bibr">6</xref>
It is a community-driven effort bringing together plant biologists, bioinformaticians, computational scientists, and HPC professionals to address grand challenges in plant and animal sciences.</p>
<p>Currently in its sixth year, iPlant has over 9000 users and has supported over 8 million core hours per year, and securely houses 500 terabytes of user data. Further, iPlant has developed and deployed an extensive set of tools for scientific research, educating future scientists, and powering other web portals. The iPlant Discovery Environment is a web gateway providing a consistent, intuitive graphical user interface to over 350 community software packages across five XSEDE HPC systems.
<xref rid="R6" ref-type="bibr">6</xref>
Atmosphere, iPlant's configurable cloud computing environment, currently supports around 100 concurrent virtual machines and serves over 2000 users.
<xref rid="R7" ref-type="bibr">7</xref>
The DNA Subway, an online suite of tools for students and educators, is designed to demonstrate the fundamentals of genome analysis through hands-on exploration, and over 615 participants have used it within iPlant workshops. iPlant has educated over 600 users through workshops and tutorials, and actively collaborates with other organizations. Researchers interested in iPlant can visit
<ext-link ext-link-type="uri" xlink:href="http://www.iplantcollaborative.org/">http://www.iplantcollaborative.org/</ext-link>
for additional information.</p>
<p>NCGAS is an NSF-funded collaboration between four computing facilities: Indiana University, the Texas Advanced Computing Center (TACC), the San Diego Supercomputer Center (SDSC), and the Pittsburgh Supercomputing Center (PSC).
<xref rid="R5" ref-type="bibr">5</xref>
Founded with a $1.5 million initial award, NCGAS represents how the supercomputing centers are evolving to meet the needs of the life sciences. The center supplies bioinformatic support for genomics projects, particularly projects requiring large memory computation. NCGAS provides consulting services for biologists, assistance in running genome analyses, hardened and optimized genome analysis software, and supercomputing. NCGAS has a simple allocations process that gives researchers access to both its large-memory cluster and computing time from its XSEDE allocation. NCGAS's large-RAM cluster, with its 8.0 terabytes of aggregate RAM and associated 5 petabytes of high speed storage on the Data Capacitor parallel disk system, employs best practices security that is consistent with the technical requirements of regulations such as the Health Insurance Portability and Accountability Act, and is designed primarily for de novo sequence assembly.</p>
<p>In its first 18 months, NCGAS supported over 37 NSF-funded genomics projects from 16 states. Being housed across four supercomputing centers, NCGAS represents a model of a virtual genomic analysis facility that operates at a national scale. Each center provides specific expertise and resources, and they are all interconnected by a shared wide area file system that enables transparent workflows across all four sites. Transferring a terabyte of Next Generation Sequencing data generated in Hawaii into the shared file system in Indiana is a trivial problem for teams accustomed to moving petabytes of data over those distances. Researchers considering NCGAS resources can visit
<ext-link ext-link-type="uri" xlink:href="http://ncgas.org">http://ncgas.org</ext-link>
for allocation information.</p>
<p>Galaxy is one of the most popular portals for genomics analysis, having seen exponential growth to approximately 35 000 users since 2005.
<xref rid="R8" ref-type="bibr">8–10</xref>
Widely available access to high-throughput sequencing has dramatically increased the interest in and the size and complexity of genomic analyses. Thus, despite ever increasing growth in registered users, the compute capacity of Galaxy has become saturated due to compute and storage constraints, resulting in unrealized potential for scientific discovery (
<xref ref-type="fig" rid="AMIAJNL2013002059F2">figure 2</xref>
). Galaxy turned to PSC for support in addressing these constraints. PSC had already developed the Data Supercell
<xref rid="R11" ref-type="bibr">11</xref>
(DSC) to help meet the challenges of large-scale, distributed data access and analysis. The DSC is a scalable, disk-based data archival system that is as cost-effective as tape archives but offers many times better bandwidth and orders of magnitude better latency than tape. The DSC uses the SLASH2 file system developed at PSC, and provides mechanisms to federate campus-level and national-level data storage systems, including inherent data replication and migration mechanisms, as well as the capability to import existing data storage systems. The DSC is currently being used to seamlessly replicate all data associated with Galaxy Main. This integration will enable analyses submitted through the Galaxy main site to run transparently on PSC's large memory systems, as well as other XSEDE systems, providing capabilities beyond what is available via commercial cloud computing providers. For example, using PSC resources, researchers recently completed the de novo assembly of over 20 primate transcriptomes to create a non-human primate reference transcriptome resource, with each transcriptome using 1.8 billion reads of RNA-sequence data. Another group completed the assembly of a soil metagenome requiring 3.5 terabytes of RAM.</p>
<fig id="AMIAJNL2013002059F2" position="float">
<label>Figure 2</label>
<caption>
<p>Users and number of compute jobs submitted through the Galaxy Main web-portal by month.</p>
</caption>
<graphic xlink:href="amiajnl-2013-002059f02"></graphic>
</fig>
<p>The tight integration of web platforms like Galaxy with PSC and XSEDE, together with complementary efforts and services provided by organizations like iPlant and NCGAS, brings these cutting-edge capabilities to a much larger group of researchers.</p>
</sec>
<sec id="s3">
<title>Cyberinfrastructure</title>
<p>XSEDE is a five-year $121 million project supported by NSF which extends the work of the previous TeraGrid program to integrate and simplify access to CI resources and services. XSEDE currently supports 15 high-performance supercomputers or computational clusters, and well over 100 petabytes of storage located at 17 partner institutions.
<xref rid="R2" ref-type="bibr">2</xref>
More than 8000 scientists use XSEDE to complete thousands of research projects and generate more than 2000 publications annually. XSEDE works to simplify allocations and cybersecurity across supported systems, while maintaining the highest level of professional service. It also provides Extended Collaborative Support Services to help users develop projects that take advantage of XSEDE resources, and supports campus champions from over 135 different institutions who help researchers use XSEDE resources. Access to XSEDE resources is through an allocation process. Small start-up allocations can be approved within 48 h and give investigators resources to determine the feasibility of larger individual projects. Full research allocations are awarded quarterly, and large initiatives, such as NCGAS and iPlant, receive allocations sizable enough to support the computational needs of the communities they serve. To determine if your institution has an XSEDE campus champion, visit
<ext-link ext-link-type="uri" xlink:href="https://www.xsede.org/campus-champions">https://www.xsede.org/campus-champions</ext-link>
; otherwise researchers can request XSEDE allocations at
<ext-link ext-link-type="uri" xlink:href="https://www.xsede.org">https://www.xsede.org</ext-link>
.</p>
<p>The OSG provides an alternative for accessing the national cyberinfrastructure. Jointly funded by the NSF and the Department of Energy, the OSG enables distributed high-throughput computing for users at all scales by working with organizations to federate their computational resources. The OSG is built around the concept of virtual organizations (VO), each of which contributes capacity to other VOs, thus increasing the overall computational capacity for all projects by scavenging cycles from otherwise idle systems. This shared architecture is particularly well suited for tasks that can run on many single processors, such as BLAST. By utilizing the OSG's middleware layers the VOs can set policies to share their resources with other VOs. The 116 sites currently on the OSG submitted between 12 and 15 million jobs per month within the last year, representing 50 million CPU hours and over 30 petabytes of data transfer per month.
<xref rid="R12" ref-type="bibr">12</xref>
Researchers interested in the OSG can visit
<ext-link ext-link-type="uri" xlink:href="https://www.opensciencegrid.org/">https://www.opensciencegrid.org/</ext-link>
.</p>
<p>The computational assets of the national CI are connected to each other by 10- to 100-Gbps networks. A 100-Gbps network connection allows files to move across the country faster than they can be loaded into RAM locally. For many applications file movement can be done without any awareness of the underlying network. Internet2 provides infrastructure across more than 220 institutions, 60 corporations, 70 governmental agencies, 38 regional and state networks, and 65 national research and educational networks from over 100 countries. The organization provides advanced network services that are tailored to the needs of researchers. They have network monitoring and diagnostic research networks, and they ensure high throughput data streaming not available on commodity internet connections.</p>
<p>Just as domestic networks have extended their reach and speed, so have international networks. The NSF funded International Research Network Connections (IRNC) program provides high-performance network connectivity from the USA to Europe, Asia, Central and South America, and a number of other locations.
<xref rid="R13" ref-type="bibr">13</xref>
Of course, not all international research and education (R/E) connectivity is provided by the USA. National research and educational networks in Asia and Central and South America provide additional network connections to the global R/E network fabric. The pan-European network GEANT provides connections within Europe, from Europe to the USA and from Europe to a number of other countries, ranging from Kyrgyzstan to Cambodia to China.</p>
<p>The network connections illustrated in
<xref ref-type="fig" rid="AMIAJNL2013002059F3">figure 3</xref>
are generally 10-Gbps connections. The next generation of international networking is set to launch in mid-2013 with the implementation of 100-Gbps connectivity between the USA and Europe. This will be followed by additional 100-Gbps connections to Asia and South and Central America within the next year.</p>
<fig id="AMIAJNL2013002059F3" position="float">
<label>Figure 3</label>
<caption>
<p>Schematic representation of current international data network connectivity available for scientific research.</p>
</caption>
<graphic xlink:href="amiajnl-2013-002059f03"></graphic>
</fig>
<p>As an illustration of the power of this international connectivity, it is possible today for a researcher to easily transfer a large dataset from Indiana University to Tsinghua University in China using NSF and China supported connectivity. The same dataset could then be transferred from Tsinghua to University College in London using China and European supplied connectivity. And again, this dataset could be transferred from University College to Cornell University in the USA using GEANT, NSF, and Internet2 supplied connectivity, making a smooth 10-G enabled trip around the globe.</p>
</sec>
<sec sec-type="conclusions" id="s4">
<title>Conclusion</title>
<p>Consider that if there were 30 000 deployed next-generation sequencers, FT-ICR mass spectrometers, and NMR instruments across the country dedicated to biomedical research, and that each of these produced 100 GB/day of data (on average), then these instruments would produce around 1096 PB of data per year. This is a large amount of data! But, it is still less than the 2000 PB/year expected to be generated from Phase 1 of the Square Kilometer Array
<xref rid="R14" ref-type="bibr">14</xref>
starting in 2013. What makes the biomedical informatics problem unique is the distributed nature of the data. It comes from many instruments, each used in highly unique ways, and the resulting data is analyzed with many different workflows—each workflow presenting its own computational challenge. To meet these challenges, tailored biomedical informatics solutions can and are being built on existing national CI, leveraging its scale to allow biomedical researchers to address ‘
<italic>Big Data</italic>
’ problems. As biomedical research increasingly pursues genomic characterization of patients and diseases, for example, resources at the scale of the national CI will be a critical component of future disease research. Further, as new paradigms emerge, for example fixed data repositories with programmable interfaces that allow workflows to move to data, the staff and scientists of the national CI have the experience to make these systems possible.</p>
</sec>
</body>
<back>
<fn-group>
<fn>
<p>
<bold>Contributors:</bold>
RL manages the National Center for Genome Analysis Support; MV manages Life Science Computing at the Texas Advanced Computing Center, and JMF is a Research Associate there; MS is the Associate Director for Health Sciences at Internet2; JGW is the Director of International Networking for Indiana University; PDB is a Senior Scientific Specialist at the Pittsburgh Supercomputing Center; JT is the Principal Investigator for Galaxy; WKB is the Director of the National Center for Genome Analysis Support and Principal Investigator for the Open Science Grid's Grid Operations Center. All authors have contributed equally to formulating the perspective presented in this manuscript.</p>
</fn>
<fn>
<p>
<bold>Funding:</bold>
This research is based on work supported by the National Science Foundation under grant no. ABI-1062432 (Craig Stewart, PI) and 1242759 PHY to Indiana University. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the National Center for Genome Analysis Support, or Indiana University. The iPlant Collaborative is funded by a grant from the National Science Foundation (#DBI-0735191).</p>
</fn>
<fn>
<p>
<bold>Competing interests:</bold>
None.</p>
</fn>
<fn>
<p>
<bold>Provenance and peer review:</bold>
Not commissioned; externally peer reviewed.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="R1">
<label>1</label>
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Stewart</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Simms</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Plale</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>What is cyberinfrastructure</article-title>
.
<conf-name>In Proceedings of the 38th annual ACM SIGUCCS fall conference (SIGUCCS ‘10)</conf-name>
<publisher-loc>New York, NY, USA</publisher-loc>
:
<publisher-name>ACM</publisher-name>
,
<fpage>37</fpage>
<lpage>44</lpage>
</mixed-citation>
</ref>
<ref id="R2">
<label>2</label>
<mixed-citation publication-type="other">
<comment>
<ext-link ext-link-type="uri" xlink:href="https://www.xsede.org/web/guest/overview">https://www.xsede.org/web/guest/overview</ext-link>
(accessed May 2013</comment>
).</mixed-citation>
</ref>
<ref id="R3">
<label>3</label>
<mixed-citation publication-type="other">
<comment>
<ext-link ext-link-type="uri" xlink:href="https://opensciencegrid.org/bin/view">https://opensciencegrid.org/bin/view</ext-link>
(accessed May 2013</comment>
).</mixed-citation>
</ref>
<ref id="R4">
<label>4</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goff</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Vaughn</surname>
<given-names>M</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The iPlant collaborative: cyberinfrastructure for plant biology</article-title>
.
<source>Front Plant Sci</source>
<year>2011</year>
;
<volume>2</volume>
:
<fpage>00034</fpage>
</mixed-citation>
</ref>
<ref id="R5">
<label>5</label>
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>LeDuc</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>L-S</given-names>
</name>
<name>
<surname>Ganote</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>National Center for Genome Analysis Support Leverages XSEDE to Support Life Science Research</article-title>
.
<conf-name>In Proceedings of the 2nd Conference of the Extreme Science and Engineering Discovery Environment, XSEDE'13</conf-name>
,
<comment>July</comment>
<year>2013</year>
,
<comment>in press</comment>
</mixed-citation>
</ref>
<ref id="R6">
<label>6</label>
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Lenards</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Merchant</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Stanzione</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Building an environment to facilitate discoveries for plant sciences</article-title>
.
<conf-name>In Proceedings of the 2011 ACM workshop on Gateway computing environments (GCE ‘11)</conf-name>
<publisher-loc>New York, NY, USA</publisher-loc>
:
<publisher-name>ACM</publisher-name>
,
<fpage>51</fpage>
<lpage>8</lpage>
</mixed-citation>
</ref>
<ref id="R7">
<label>7</label>
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Skidmore</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>S-J</given-names>
</name>
<name>
<surname>Kuchimanchi</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>iPlant atmosphere: a gateway to cloud infrastructure for the plant sciences</article-title>
.
<conf-name>In Proceedings of the 2011 ACM workshop on Gateway computing environments (GCE ‘11)</conf-name>
<publisher-loc>New York, NY, USA</publisher-loc>
:
<publisher-name>ACM</publisher-name>
,
<fpage>59</fpage>
<lpage>64</lpage>
</mixed-citation>
</ref>
<ref id="R8">
<label>8</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goecks</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nekrutenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences</article-title>
.
<source>Genome Biol</source>
<year>2010</year>
;
<volume>11</volume>
:
<fpage>R86</fpage>
<pub-id pub-id-type="pmid">20738864</pub-id>
</mixed-citation>
</ref>
<ref id="R9">
<label>9</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blankenberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kuster</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Coraor</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Galaxy: a web-based genome analysis tool for experimentalists</article-title>
.
<source>In Curr Protoc Mol Biol</source>
<year>Supplement 89, 2010</year>
;
<comment>Chapter 19: Unit 19.10.1–21</comment>
</mixed-citation>
</ref>
<ref id="R10">
<label>10</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giardine</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Riemer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hardison</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Galaxy: a platform for interactive large-scale genome analysis</article-title>
.
<source>Genome Res</source>
<year>2005</year>
;
<volume>15</volume>
:
<fpage>1451</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="pmid">16169926</pub-id>
</mixed-citation>
</ref>
<ref id="R11">
<label>11</label>
<mixed-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Nowoczynski</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sommerfield</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yanovich</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<conf-name>The Data Supercell. In Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond (XSEDE ‘12)</conf-name>
<publisher-loc>New York, NY, USA</publisher-loc>
:
<publisher-name>ACM</publisher-name>
,
<comment>Article 13,11 pages</comment>
</mixed-citation>
</ref>
<ref id="R12">
<label>12</label>
<mixed-citation publication-type="other">
<comment>
<ext-link ext-link-type="uri" xlink:href="http://display.grid.iu.edu/">http://display.grid.iu.edu/</ext-link>
(accessed May 2013</comment>
).</mixed-citation>
</ref>
<ref id="R13">
<label>13</label>
<mixed-citation publication-type="other">
<comment>
<ext-link ext-link-type="uri" xlink:href="http://irnclinks.net/">http://irnclinks.net/</ext-link>
(accessed May 2013</comment>
).</mixed-citation>
</ref>
<ref id="R14">
<label>14</label>
<mixed-citation publication-type="other">
<comment>
<ext-link ext-link-type="uri" xlink:href="http://www.skatelescope.org/uploaded/21705_130_Memo_Dewdney.pdf">http://www.skatelescope.org/uploaded/21705_130_Memo_Dewdney.pdf</ext-link>
(accessed May 2013</comment>
).</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000549 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000549 | SxmlIndent | more
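For a quick look at the record without paging through the full XML, the output of HfdSelect can also be filtered with standard Unix tools. The command below is only an illustrative sketch (it assumes a grep that supports the -o option); it extracts the <idno> identifier elements (PMID, PMC, DOI, RBID) from record 000549.

# Sketch: list only the identifier elements of record 000549
HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000549 \
       | grep -o '<idno type="[^"]*">[^<]*</idno>'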

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3932465
   |texte=   Leveraging the national cyberinfrastructure for biomedical research
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23964072" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 
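The generated wiki markup can also be saved to a local file for review before publication. This variant is only a sketch: it assumes that NlmPubMed2Wicri writes its result to standard output, and the file name 000549.wiki is arbitrary.

# Sketch: keep a local copy of the generated wiki page
HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23964072" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 > 000549.wiki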

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024