
Hybrid cloud and cluster computing paradigms for life science applications

Internal identifier: 000484 (Pmc/Corpus)

Authors: Judy Qiu; Jaliya Ekanayake; Thilina Gunarathne; Jong Youl Choi; Seung-Hee Bae; Hui Li; Bingjing Zhang; Tak-Lon Wu; Yang Ruan; Saliya Ekanayake; Adam Hughes; Geoffrey Fox

Source:

BMC Bioinformatics [1471-2105]; 2010.

RBID : PMC:3040529

Abstract

Background

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability in some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open-source iterative MapReduce system.
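The iterative structure mentioned above can be illustrated with K-means clustering, a common data-mining kernel. The Python sketch below is a generic illustration of the iterative MapReduce pattern, not Twister's actual API; the names `map_partition`, `reduce_partials`, and `iterative_kmeans` are hypothetical, the points are assumed to be 2-D, and a sequential loop stands in for parallel map tasks. Each iteration maps over the data partitions and reduces the partial sums. A plain MapReduce runtime typically runs each iteration as a separate job that rereads the static data, whereas an iterative runtime can keep the partitions cached and pass only the small centroid list between iterations.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Partial = Dict[int, Tuple[float, float, int]]  # cluster -> (sum_x, sum_y, count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: (point[0] - centroids[k][0]) ** 2 + (point[1] - centroids[k][1]) ** 2,
    )


def map_partition(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Hypothetical map task: per-cluster partial sums for one cached partition."""
    partial: Partial = {}
    for p in partition:
        k = nearest(p, centroids)
        sx, sy, n = partial.get(k, (0.0, 0.0, 0))
        partial[k] = (sx + p[0], sy + p[1], n + 1)
    return partial


def reduce_partials(partials: List[Partial], old: Sequence[Point]) -> List[Point]:
    """Hypothetical reduce task: combine partial sums into new centroids."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for partial in partials:
        for k, (sx, sy, n) in partial.items():
            totals[k][0] += sx
            totals[k][1] += sy
            totals[k][2] += n
    new: List[Point] = []
    for k, c in enumerate(old):
        sx, sy, n = totals[k]
        # An empty cluster keeps its previous centroid.
        new.append((sx / n, sy / n) if n else c)
    return new


def iterative_kmeans(partitions: List[List[Point]], centroids: List[Point],
                     max_iters: int = 100, tol: float = 1e-9) -> Tuple[List[Point], int]:
    """Driver loop: static partitions are reused; only centroids change per iteration."""
    for it in range(1, max_iters + 1):
        partials = [map_partition(part, centroids) for part in partitions]
        new = reduce_partials(partials, centroids)
        # Largest per-centroid movement (sum of absolute coordinate changes).
        shift = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(new, centroids))
        centroids = new
        if shift <= tol:
            return centroids, it
    return centroids, max_iters
```

For example, with partitions `[(0.0, 0.0), (0.0, 1.0)]` and `[(10.0, 10.0), (10.0, 11.0)]` and initial centroids `[(0.0, 0.0), (10.0, 10.0)]`, the loop converges to `[(0.0, 0.5), (10.0, 10.5)]` after two iterations.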

Results

Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability results in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open-source Twister iterative MapReduce system and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.

Conclusions

The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications.

Methods

We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040529

DOI: 10.1186/1471-2105-11-S12-S3
PubMed: 21210982
PubMed Central: 3040529

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Hybrid cloud and cluster computing paradigms for life science applications</title>
<author><name sortKey="Qiu, Judy" sort="Qiu, Judy" uniqKey="Qiu J" first="Judy" last="Qiu">Judy Qiu</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ekanayake, Jaliya" sort="Ekanayake, Jaliya" uniqKey="Ekanayake J" first="Jaliya" last="Ekanayake">Jaliya Ekanayake</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Gunarathne, Thilina" sort="Gunarathne, Thilina" uniqKey="Gunarathne T" first="Thilina" last="Gunarathne">Thilina Gunarathne</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Choi, Jong Youl" sort="Choi, Jong Youl" uniqKey="Choi J" first="Jong Youl" last="Choi">Jong Youl Choi</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Bae, Seung Hee" sort="Bae, Seung Hee" uniqKey="Bae S" first="Seung-Hee" last="Bae">Seung-Hee Bae</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Li, Hui" sort="Li, Hui" uniqKey="Li H" first="Hui" last="Li">Hui Li</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zhang, Bingjing" sort="Zhang, Bingjing" uniqKey="Zhang B" first="Bingjing" last="Zhang">Bingjing Zhang</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Wu, Tak Lon" sort="Wu, Tak Lon" uniqKey="Wu T" first="Tak-Lon" last="Wu">Tak-Lon Wu</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ruan, Yang" sort="Ruan, Yang" uniqKey="Ruan Y" first="Yang" last="Ruan">Yang Ruan</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ekanayake, Saliya" sort="Ekanayake, Saliya" uniqKey="Ekanayake S" first="Saliya" last="Ekanayake">Saliya Ekanayake</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hughes, Adam" sort="Hughes, Adam" uniqKey="Hughes A" first="Adam" last="Hughes">Adam Hughes</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Fox, Geoffrey" sort="Fox, Geoffrey" uniqKey="Fox G" first="Geoffrey" last="Fox">Geoffrey Fox</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">21210982</idno>
<idno type="pmc">3040529</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040529</idno>
<idno type="RBID">PMC:3040529</idno>
<idno type="doi">10.1186/1471-2105-11-S12-S3</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000484</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Hybrid cloud and cluster computing paradigms for life science applications</title>
<author><name sortKey="Qiu, Judy" sort="Qiu, Judy" uniqKey="Qiu J" first="Judy" last="Qiu">Judy Qiu</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ekanayake, Jaliya" sort="Ekanayake, Jaliya" uniqKey="Ekanayake J" first="Jaliya" last="Ekanayake">Jaliya Ekanayake</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Gunarathne, Thilina" sort="Gunarathne, Thilina" uniqKey="Gunarathne T" first="Thilina" last="Gunarathne">Thilina Gunarathne</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Choi, Jong Youl" sort="Choi, Jong Youl" uniqKey="Choi J" first="Jong Youl" last="Choi">Jong Youl Choi</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Bae, Seung Hee" sort="Bae, Seung Hee" uniqKey="Bae S" first="Seung-Hee" last="Bae">Seung-Hee Bae</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Li, Hui" sort="Li, Hui" uniqKey="Li H" first="Hui" last="Li">Hui Li</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zhang, Bingjing" sort="Zhang, Bingjing" uniqKey="Zhang B" first="Bingjing" last="Zhang">Bingjing Zhang</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Wu, Tak Lon" sort="Wu, Tak Lon" uniqKey="Wu T" first="Tak-Lon" last="Wu">Tak-Lon Wu</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ruan, Yang" sort="Ruan, Yang" uniqKey="Ruan Y" first="Yang" last="Ruan">Yang Ruan</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Ekanayake, Saliya" sort="Ekanayake, Saliya" uniqKey="Ekanayake S" first="Saliya" last="Ekanayake">Saliya Ekanayake</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hughes, Adam" sort="Hughes, Adam" uniqKey="Hughes A" first="Adam" last="Hughes">Adam Hughes</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Fox, Geoffrey" sort="Fox, Geoffrey" uniqKey="Fox G" first="Geoffrey" last="Fox">Geoffrey Fox</name>
<affiliation><nlm:aff id="I1">School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="I2">Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability in some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open-source iterative MapReduce system.</p>
</sec>
<sec><title>Results</title>
<p>Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability results in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open-source Twister iterative MapReduce system and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.</p>
</sec>
<sec><title>Conclusions</title>
<p>The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications.</p>
</sec>
<sec><title>Methods</title>
<p>We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Armbrust, M" uniqKey="Armbrust M">M Armbrust</name>
</author>
<author><name sortKey="Fox, A" uniqKey="Fox A">A Fox</name>
</author>
<author><name sortKey="Griffith, R" uniqKey="Griffith R">R Griffith</name>
</author>
<author><name sortKey="Joseph, Ad" uniqKey="Joseph A">AD Joseph</name>
</author>
<author><name sortKey="Katz, R" uniqKey="Katz R">R Katz</name>
</author>
<author><name sortKey="Konwinski, A" uniqKey="Konwinski A">A Konwinski</name>
</author>
<author><name sortKey="Lee, G" uniqKey="Lee G">G Lee</name>
</author>
<author><name sortKey="Patterson, D" uniqKey="Patterson D">D Patterson</name>
</author>
<author><name sortKey="Rabkin, A" uniqKey="Rabkin A">A Rabkin</name>
</author>
<author><name sortKey="Stoica, I" uniqKey="Stoica I">I Stoica</name>
</author>
<author><name sortKey="Zaharia, M" uniqKey="Zaharia M">M Zaharia</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Walker, E" uniqKey="Walker E">E Walker</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ekanayake, J" uniqKey="Ekanayake J">J Ekanayake</name>
</author>
<author><name sortKey="Qiu, Xh" uniqKey="Qiu X">XH Qiu</name>
</author>
<author><name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author><name sortKey="Beason, S" uniqKey="Beason S">S Beason</name>
</author>
<author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Evangelinos, C" uniqKey="Evangelinos C">C Evangelinos</name>
</author>
<author><name sortKey="Hill, Cn" uniqKey="Hill C">CN Hill</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lange, J" uniqKey="Lange J">J Lange</name>
</author>
<author><name sortKey="Pedretti, K" uniqKey="Pedretti K">K Pedretti</name>
</author>
<author><name sortKey="Hudson, T" uniqKey="Hudson T">T Hudson</name>
</author>
<author><name sortKey="Dinda, P" uniqKey="Dinda P">P Dinda</name>
</author>
<author><name sortKey="Cui, Z" uniqKey="Cui Z">Z Cui</name>
</author>
<author><name sortKey="Xia, L" uniqKey="Xia L">L Xia</name>
</author>
<author><name sortKey="Bridges, P" uniqKey="Bridges P">P Bridges</name>
</author>
<author><name sortKey="Gocke, A" uniqKey="Gocke A">A Gocke</name>
</author>
<author><name sortKey="Jaconette, S" uniqKey="Jaconette S">S Jaconette</name>
</author>
<author><name sortKey="Levenhagen, M" uniqKey="Levenhagen M">M Levenhagen</name>
</author>
<author><name sortKey="Brightwell, R" uniqKey="Brightwell R">R Brightwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dean, J" uniqKey="Dean J">J Dean</name>
</author>
<author><name sortKey="Ghemawat, S" uniqKey="Ghemawat S">S Ghemawat</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Ekanayake, J" uniqKey="Ekanayake J">J Ekanayake</name>
</author>
<author><name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author><name sortKey="Qiu, J" uniqKey="Qiu J">J Qiu</name>
</author>
<author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
<author><name sortKey="Beason, S" uniqKey="Beason S">S Beason</name>
</author>
<author><name sortKey="Choi, Jy" uniqKey="Choi J">JY Choi</name>
</author>
<author><name sortKey="Ruan, Y" uniqKey="Ruan Y">Y Ruan</name>
</author>
<author><name sortKey="Bae, Sh" uniqKey="Bae S">SH Bae</name>
</author>
<author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ekanayake, J" uniqKey="Ekanayake J">J Ekanayake</name>
</author>
<author><name sortKey="Balkir, A" uniqKey="Balkir A">A Balkir</name>
</author>
<author><name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
<author><name sortKey="Poulain, C" uniqKey="Poulain C">C Poulain</name>
</author>
<author><name sortKey="Araujo, N" uniqKey="Araujo N">N Araujo</name>
</author>
<author><name sortKey="Barga, R" uniqKey="Barga R">R Barga</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Fox, Gc" uniqKey="Fox G">GC Fox</name>
</author>
<author><name sortKey="Qiu, Xh" uniqKey="Qiu X">XH Qiu</name>
</author>
<author><name sortKey="Beason, S" uniqKey="Beason S">S Beason</name>
</author>
<author><name sortKey="Choi, Jy" uniqKey="Choi J">JY Choi</name>
</author>
<author><name sortKey="Rho, M" uniqKey="Rho M">M Rho</name>
</author>
<author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author><name sortKey="Devadasan, N" uniqKey="Devadasan N">N Devadasan</name>
</author>
<author><name sortKey="Liu, G" uniqKey="Liu G">G Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bae, Sh" uniqKey="Bae S">SH Bae</name>
</author>
<author><name sortKey="Choi, Jy" uniqKey="Choi J">JY Choi</name>
</author>
<author><name sortKey="Qiu, J" uniqKey="Qiu J">J Qiu</name>
</author>
<author><name sortKey="Fox, Gc" uniqKey="Fox G">GC Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sammon, Jw" uniqKey="Sammon J">JW Sammon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ekanayake, J" uniqKey="Ekanayake J">J Ekanayake</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Fox, Gc" uniqKey="Fox G">GC Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Fox, Gc" uniqKey="Fox G">GC Fox</name>
</author>
<author><name sortKey="Williams, Rd" uniqKey="Williams R">RD Williams</name>
</author>
<author><name sortKey="Messina, Pc" uniqKey="Messina P">PC Messina</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chu, Ct" uniqKey="Chu C">CT Chu</name>
</author>
<author><name sortKey="Kim, Sk" uniqKey="Kim S">SK Kim</name>
</author>
<author><name sortKey="Lin, Ya" uniqKey="Lin Y">YA Lin</name>
</author>
<author><name sortKey="Yu, Y" uniqKey="Yu Y">Y Yu</name>
</author>
<author><name sortKey="Bradski, G" uniqKey="Bradski G">G Bradski</name>
</author>
<author><name sortKey="Ng, A" uniqKey="Ng A">A Ng</name>
</author>
<author><name sortKey="Olukotun, K" uniqKey="Olukotun K">K Olukotun</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Qiu, X" uniqKey="Qiu X">X Qiu</name>
</author>
<author><name sortKey="Ekanayake, J" uniqKey="Ekanayake J">J Ekanayake</name>
</author>
<author><name sortKey="Beason, S" uniqKey="Beason S">S Beason</name>
</author>
<author><name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
<author><name sortKey="Barga, R" uniqKey="Barga R">R Barga</name>
</author>
<author><name sortKey="Gannon, D" uniqKey="Gannon D">D Gannon</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Ekanayake, J" uniqKey="Ekanayake J">J Ekanayake</name>
</author>
<author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author><name sortKey="Zhang, B" uniqKey="Zhang B">B Zhang</name>
</author>
<author><name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author><name sortKey="Bae, Sh" uniqKey="Bae S">SH Bae</name>
</author>
<author><name sortKey="Qiu, J" uniqKey="Qiu J">J Qiu</name>
</author>
<author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author><name sortKey="Wu, Tl" uniqKey="Wu T">TL Wu</name>
</author>
<author><name sortKey="Qiu, J" uniqKey="Qiu J">J Qiu</name>
</author>
<author><name sortKey="Fox, G" uniqKey="Fox G">G Fox</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title-group><journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">21210982</article-id>
<article-id pub-id-type="pmc">3040529</article-id>
<article-id pub-id-type="publisher-id">1471-2105-11-S12-S3</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-11-S12-S3</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Proceedings</subject>
</subj-group>
</article-categories>
<title-group><article-title>Hybrid cloud and cluster computing paradigms for life science applications</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" corresp="yes" id="A1"><name><surname>Qiu</surname>
<given-names>Judy</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>xqiu@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A2"><name><surname>Ekanayake</surname>
<given-names>Jaliya</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>jekanaya@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A3"><name><surname>Gunarathne</surname>
<given-names>Thilina</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>tgunarat@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A4"><name><surname>Choi</surname>
<given-names>Jong Youl</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>jychoi@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A5"><name><surname>Bae</surname>
<given-names>Seung-Hee</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>sebae@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A6"><name><surname>Li</surname>
<given-names>Hui</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>lihui@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A7"><name><surname>Zhang</surname>
<given-names>Bingjing</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>zhangbj@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A8"><name><surname>Wu</surname>
<given-names>Tak-Lon</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>taklwu@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A9"><name><surname>Ruan</surname>
<given-names>Yang</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>yangruan@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A10"><name><surname>Ekanayake</surname>
<given-names>Saliya</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>sekanaya@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A11"><name><surname>Hughes</surname>
<given-names>Adam</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>adalhugh@indiana.edu</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A12"><name><surname>Fox</surname>
<given-names>Geoffrey</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>gcf@indiana.edu</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA</aff>
<aff id="I2"><label>2</label>
Pervasive Technology Institute, Indiana University, Bloomington, IN 47408, USA</aff>
<pub-date pub-type="collection"><year>2010</year>
</pub-date>
<pub-date pub-type="epub"><day>21</day>
<month>12</month>
<year>2010</year>
</pub-date>
<volume>11</volume>
<issue>Suppl 12</issue>
<supplement><named-content content-type="supplement-title">Proceedings of the 11th Annual Bioinformatics Open Source Conference (BOSC) 2010</named-content>
<named-content content-type="supplement-editor">Kam D Dahlquist</named-content>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/content/pdf/1471-2105-11-S12-info.pdf">http://www.biomedcentral.com/content/pdf/1471-2105-11-S12-info.pdf</ext-link>
</supplement>
<fpage>S3</fpage>
<lpage>S3</lpage>
<permissions><copyright-statement>Copyright ©2010 Qiu et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2010</copyright-year>
<copyright-holder>Qiu et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/11/S12/S3"></self-uri>
<abstract><sec><title>Background</title>
<p>Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability in some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open-source iterative MapReduce system.</p>
</sec>
<sec><title>Results</title>
<p>Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability results in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open-source Twister iterative MapReduce system and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.</p>
</sec>
<sec><title>Conclusions</title>
<p>The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications.</p>
</sec>
<sec><title>Methods</title>
<p>We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.</p>
</sec>
</abstract>
<conference><conf-date>9–10 July 2010</conf-date>
<conf-name>The 11th Annual Bioinformatics Open Source Conference (BOSC) 2010</conf-name>
<conf-loc>Boston, MA, USA</conf-loc>
</conference>
</article-meta>
</front>
<body><sec><title>Background</title>
<p>Cloud computing [<xref ref-type="bibr" rid="B1">1</xref>
] is at the peak of the Gartner technology hype curve [<xref ref-type="bibr" rid="B2">2</xref>
], but there are good reasons to believe that it is here to stay and will be important for large-scale scientific computing:</p>
<p>1) Clouds are the largest-scale computer centers ever constructed, so they have the capacity to be important to large-scale science problems as well as small-scale ones.</p>
<p>2) Clouds exploit the economies of this scale and so can be expected to be a cost-effective approach to computing. Their architecture explicitly addresses the important issue of fault tolerance.</p>
<p>3) Clouds are commercially supported, so one can expect reasonably robust software without the sustainability difficulties seen in the academic software systems critical to much current cyberinfrastructure.</p>
<p>4) There are 3 major vendors of clouds (Amazon, Google, and Microsoft) and many other infrastructure and software cloud technology vendors including Eucalyptus Systems, which spun off from UC Santa Barbara HPC research. This competition should ensure that clouds develop in a healthy, innovative fashion. Further attention is already being given to cloud standards [<xref ref-type="bibr" rid="B3">3</xref>
].</p>
<p>5) There are many cloud research efforts, conferences, and other activities including Nimbus [<xref ref-type="bibr" rid="B4">4</xref>
], OpenNebula [<xref ref-type="bibr" rid="B5">5</xref>
], Sector/Sphere [<xref ref-type="bibr" rid="B6">6</xref>
], and Eucalyptus [<xref ref-type="bibr" rid="B7">7</xref>
].</p>
<p>6) There is a growing number of academic and science cloud systems supporting users through NSF programs for the Google/IBM and Microsoft Azure systems. In NSF OCI, FutureGrid [<xref ref-type="bibr" rid="B8">8</xref>
] offers a cloud testbed, and Magellan [<xref ref-type="bibr" rid="B9">9</xref>
] is a major DoE experimental cloud system. The EU Framework 7 project VENUS-C [<xref ref-type="bibr" rid="B10">10</xref>
] is just starting, with an emphasis on Azure.</p>
<p>7) Clouds offer attractive "on-demand" elastic and interactive computing.</p>
<p>Much scientific computing can be performed on clouds [<xref ref-type="bibr" rid="B11">11</xref>
], but there are some well-documented problems with using clouds, including:</p>
<p>1) The centralized computing model for clouds runs counter to the principle of "bringing the computing to the data", and bringing the "data to a commercial cloud facility" may be slow and expensive.</p>
<p>2) There are many security, legal, and privacy issues [<xref ref-type="bibr" rid="B12">12</xref>
] that often mimic those of the Internet and are especially problematic in areas such as health informatics.</p>
<p>3) The virtualized networking currently used in the virtual machines (VMs) of today's commercial clouds, together with jitter from complex operating system functions, increases synchronization/communication costs. This is especially serious in large-scale parallel computing and leads to significant overheads in many MPI applications [<xref ref-type="bibr" rid="B13">13</xref>
-<xref ref-type="bibr" rid="B15">15</xref>
]. Indeed, the usual (and attractive) fault tolerance model for clouds runs counter to the tight synchronization needed in most MPI applications. Specialized VMs and operating systems can give excellent MPI performance [<xref ref-type="bibr" rid="B16">16</xref>
], but we consider commodity approaches here. Amazon has just announced Cluster Compute instances in this area.</p>
<p>4) Private clouds do not currently offer the rich platform features seen on commercial clouds [<xref ref-type="bibr" rid="B17">17</xref>
].</p>
<p>Some of these issues can be addressed with customized (private) clouds and enhanced bandwidth from research systems like TeraGrid to commercial cloud networks. However, it seems likely that clouds will not supplant traditional approaches for very large-scale parallel (MPI) jobs in the near future. Thus we consider a hybrid model with jobs running on classic HPC systems, clouds, or both, as workflows can link HPC and cloud systems. Commercial clouds support "massively parallel" or "many-task" applications, but only those that are loosely coupled and so insensitive to higher synchronization costs. We focus on the MapReduce programming model [<xref ref-type="bibr" rid="B18">18</xref>
], which can be implemented on any cluster using the open source Hadoop [<xref ref-type="bibr" rid="B19">19</xref>
] software for Linux or the Microsoft Dryad system [<xref ref-type="bibr" rid="B20">20</xref>
,<xref ref-type="bibr" rid="B21">21</xref>
] for Windows. MapReduce is currently available on Amazon systems, and we have developed a prototype MapReduce for Azure.</p>
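<p>The MapReduce model referred to above can be illustrated with a minimal single-process sketch. A map function emits (key, value) pairs from each record independently, the pairs are grouped by key (the shuffle), and a reduce function combines each group. The function names and the k-mer counting example are hypothetical illustrations chosen for this sketch. This is not the API of Hadoop, Dryad, or the Azure prototype, all of which add distribution, data partitioning, and fault tolerance on top of this basic pattern.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[K, V]]],
    reducer: Callable[[K, List[V]], R],
) -> Dict[K, R]:
    """Run one MapReduce pass in a single process (no distribution)."""
    # Map phase: every record is processed independently, which is what
    # lets a real runtime spread records over many workers.
    groups: Dict[K, List[V]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: each key's value list is combined independently.
    return {key: reducer(key, values) for key, values in groups.items()}


def kmer_mapper(sequence: str, k: int = 3) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (k-mer, 1) for every k-mer in a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i : i + k], 1


def sum_reducer(key: str, counts: List[int]) -> int:
    """Hypothetical reducer: total occurrences of one k-mer."""
    return sum(counts)


if __name__ == "__main__":
    print(map_reduce(["ACGTAC", "ACG"], kmer_mapper, sum_reducer))
```
</preformat>
<p>Running the module prints {'ACG': 2, 'CGT': 1, 'GTA': 1, 'TAC': 1}. This is a single pass; an algorithm that needs many passes would have to launch a new job per iteration, which is the limitation of basic MapReduce for iterative applications.</p>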
<sec><title>Results</title>
<sec><title>Metagenomics - a data intensive application vignette</title>
<p>The study of microbial genomes is complicated by the fact that only a small number of species can be isolated successfully, and the current way forward is metagenomic studies of culture-independent, collective sets of genomes in their natural environments. This requires identification of as many as millions of genes and thousands of species from individual samples. New sequencing technology can provide the required data samples with a throughput of 1 trillion base pairs per day, and this rate will increase. A typical observation and data pipeline [<xref ref-type="bibr" rid="B22">22</xref>
] is shown in Figure <xref ref-type="fig" rid="F1">1</xref>
, with sequencers producing DNA samples that are assembled and subjected to further analysis, including BLAST-like comparison with existing datasets as well as clustering and visualization to identify new gene families. Figure <xref ref-type="fig" rid="F2">2</xref>
 shows initial results from the analysis of 30,000 sequences, with clusters identified and visualized using dimension reduction to three dimensions with multidimensional scaling (MDS) [<xref ref-type="bibr" rid="B23">23</xref>
]. The initial parts of the pipeline fit the MapReduce or many-task cloud model, but the latter stages involve parallel linear algebra.</p>
<fig id="F1" position="float"><label>Figure 1</label>
<caption><p>Pipeline for analysis of metagenomics data</p>
</caption>
<graphic xlink:href="1471-2105-11-S12-S3-1"></graphic>
</fig>
<fig id="F2" position="float"><label>Figure 2</label>
<caption><p>Results of 17 clusters for full sample using Sammon’s version of MDS for visualization [<xref ref-type="bibr" rid="B24">24</xref>
]</p>
</caption>
<graphic xlink:href="1471-2105-11-S12-S3-2"></graphic>
</fig>
<p>State-of-the-art MDS and clustering algorithms scale like O(N<sup>2</sup>) for N sequences; the total runtime for MDS and clustering is about 2 hours each on a 768-core commodity cluster, obtaining a speedup of about 500 using a hybrid MPI-threading implementation on 24-core nodes. The initial steps can be run on clouds and include the calculation of a distance matrix of N(N-1)/2 independent elements. Million-sequence problems of this type will challenge the largest clouds and the largest TeraGrid resources. Figure <xref ref-type="fig" rid="F3">3</xref>
 looks at a related sequence assembly problem and compares the performance of MapReduce (Hadoop, DryadLINQ), with and without virtual machines, and of the basic Amazon and Microsoft clouds. The execution times are similar (they vary within a range of about 30%), showing that this class of algorithm can be run effectively on many different infrastructures, so it makes sense to consider the intrinsic advantages of clouds described above. In recent work we have looked at hierarchical methods to reduce the O(N<sup>2</sup>) execution time to O(N log N) or O(N) and to allow a loosely coupled cloud implementation, with initial results on interpolation methods presented in [<xref ref-type="bibr" rid="B23">23</xref>
].</p>
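<p>Because the N(N-1)/2 elements of the distance matrix are independent, they can be split into blocks that become independent map tasks. The sketch below shows one such decomposition, assuming square blocks of a chosen size over the upper triangle. The block layout, the function names, and the toy Hamming distance (a stand-in for an alignment score such as Smith-Waterman-Gotoh) are illustrative assumptions, not the decomposition used in the cited work.</p>
<preformat preformat-type="code">
```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def num_pairs(n: int) -> int:
    """Number of independent elements in a symmetric n x n distance matrix."""
    return n * (n - 1) // 2


def upper_blocks(n: int, block: int) -> Iterator[Tuple[int, int]]:
    """Yield (row_block, col_block) ids covering the upper triangle."""
    nb = (n + block - 1) // block  # blocks per side (ceiling division)
    for bi in range(nb):
        for bj in range(bi, nb):
            yield bi, bj


def block_pairs(n: int, block: int, bi: int, bj: int) -> List[Tuple[int, int]]:
    """All (i, j) pairs with j > i that fall inside block (bi, bj)."""
    rows = range(bi * block, min((bi + 1) * block, n))
    cols = range(bj * block, min((bj + 1) * block, n))
    return [(i, j) for i in rows for j in cols if j > i]


def hamming(a: str, b: str) -> int:
    """Toy stand-in for an alignment-based distance (assumes equal lengths)."""
    return sum(x != y for x, y in zip(a, b))


def map_block(
    seqs: Sequence[str],
    block: int,
    bi: int,
    bj: int,
    dist: Callable[[str, str], int] = hamming,
) -> Dict[Tuple[int, int], int]:
    """One hypothetical map task: distances for every pair in one block."""
    return {
        (i, j): dist(seqs[i], seqs[j])
        for i, j in block_pairs(len(seqs), block, bi, bj)
    }
```
</preformat>
<p>Each (bi, bj) block can be handed to a different worker with no communication between tasks. For the 30,000-sequence sample of Figure 2, num_pairs(30000) gives 449,985,000 independent distance computations, which illustrates the O(N<sup>2</sup>) growth discussed above.</p>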
<fig id="F3" position="float"><label>Figure 3</label>
<caption><p>Time to process a single biology sequence file (458 reads) per core with different frameworks [<xref ref-type="bibr" rid="B24">24</xref>
]</p>
</caption>
<graphic xlink:href="1471-2105-11-S12-S3-3"></graphic>
</fig>
<p>The studies in [<xref ref-type="bibr" rid="B22">22</xref>
,<xref ref-type="bibr" rid="B25">25</xref>
,<xref ref-type="bibr" rid="B26">26</xref>
] examine which applications run well on MapReduce and relate this to an old classification of Fox [<xref ref-type="bibr" rid="B27">27</xref>
]. They find that Pleasingly Parallel applications and a subset of what was called “Loosely Synchronous” applications run on MapReduce. However, current MapReduce addresses problems with only a single (or a “few”) MapReduce iterations, whereas there is a large set of data parallel applications that involve many iterations and are not suitable for basic MapReduce. Such iterative algorithms include linear algebra and many data mining algorithms [<xref ref-type="bibr" rid="B28">28</xref>
], and here we introduce the open source Twister to address these problems. Twister [<xref ref-type="bibr" rid="B25">25</xref>
,<xref ref-type="bibr" rid="B29">29</xref>
] supports applications needing either a few iterations or many iterations, using a subset of MPI (reduction and broadcast operations) and not the latency-sensitive MPI point-to-point operations.</p>
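<p>The iterative pattern described above, in which the same map and reduce steps run many times and a small result is broadcast back to all tasks each round, can be sketched with K-means clustering, a standard data mining example. This is a single-process illustration under stated assumptions: the function names are hypothetical and are not Twister's Java API; the list of partitions stands in for static input data that stays with the map tasks between iterations; the reduce step sums per-partition partial results; and passing the new centroids into the next round plays the role of the broadcast.</p>
<preformat preformat-type="code">
```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Partial = List[List[float]]  # per-centroid [sum_x, sum_y, count]


def kmeans_map(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Map task: assign local points to the nearest centroid, emit partial sums."""
    partial = [[0.0, 0.0, 0.0] for _ in centroids]
    for x, y in partition:
        nearest = min(
            range(len(centroids)),
            key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
        )
        partial[nearest][0] += x
        partial[nearest][1] += y
        partial[nearest][2] += 1
    return partial


def kmeans_reduce(partials: Sequence[Partial], old: Sequence[Point]) -> List[Point]:
    """Reduce: combine partial sums from all map tasks into new centroids."""
    new: List[Point] = []
    for c, (ox, oy) in enumerate(old):
        sx = sum(p[c][0] for p in partials)
        sy = sum(p[c][1] for p in partials)
        n = sum(p[c][2] for p in partials)
        # An empty cluster keeps its previous centroid.
        new.append((sx / n, sy / n) if n else (ox, oy))
    return new


def iterative_kmeans(
    partitions: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    max_iter: int = 50,
    tol: float = 1e-9,
) -> Tuple[List[Point], int]:
    """Driver loop: map over the static partitions, reduce, broadcast, repeat."""
    if 1 > max_iter:
        raise ValueError("max_iter must be positive")
    current = list(centroids)
    for iteration in range(1, max_iter + 1):
        partials = [kmeans_map(part, current) for part in partitions]
        updated = kmeans_reduce(partials, current)
        shift = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(updated, current))
        current = updated  # "broadcast" the new centroids to the next round
        if tol > shift:
            break
    return current, iteration
```
</preformat>
<p>With two well-separated groups split across two partitions, iterative_kmeans([[(0, 0), (0, 1)], [(10, 10), (10, 11)]], [(0, 0), (10, 10)]) converges in two iterations to the centroids (0.0, 0.5) and (10.0, 10.5). Only the small centroid list moves between rounds, while the point data stays put; in basic MapReduce each round would typically be a separate job.</p>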
<p>Twister [<xref ref-type="bibr" rid="B29">29</xref>
] supports iterative computations of the type needed in clustering and MDS [<xref ref-type="bibr" rid="B23">23</xref>
]. This programming paradigm is attractive because Twister supports all phases of the pipeline in Figure <xref ref-type="fig" rid="F1">1</xref>
, with performance that is better than or comparable to basic MapReduce and, on large enough problems, similar to MPI for the iterative cases where basic MapReduce is inadequate. The current Twister system is just a prototype, and further research will focus on scalability and fault tolerance. The key idea is to combine the fault tolerance and flexibility of MapReduce with the performance of MPI.</p>
<p>The current Twister, shown in Figure <xref ref-type="fig" rid="F4">4</xref>
, is a distributed in-memory MapReduce runtime optimized for iterative MapReduce computations. It reads data from local disks of the worker nodes and handles the intermediate data in distributed memory of the worker nodes. All communication and data transfers are handled via a Publish/Subscribe messaging infrastructure. Twister comprises three main entities: (i) Twister Driver or Client that drives the entire MapReduce computation, (ii) Twister Daemon running on every worker node, and (iii) the broker network. We present two representative results of our initial analysis of Twister [<xref ref-type="bibr" rid="B25">25</xref>
,<xref ref-type="bibr" rid="B29">29</xref>
] in Figures <xref ref-type="fig" rid="F5">5</xref>
 and <xref ref-type="fig" rid="F6">6</xref>
.</p>
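<p>The three roles described above (a driver, one daemon per worker node, and a broker network carrying all messages) can be illustrated with a toy, in-process publish/subscribe model. Everything here is a hypothetical simplification: the Broker, Daemon, and Driver classes and the "tasks"/"results" topic names are invented for this sketch, delivery is synchronous and local rather than over a networked broker, and there is no fault handling. What it shows is that each daemon holds its data partition in memory and that the driver communicates with daemons only through published messages.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Sequence

Handler = Callable[[Any], None]


class Broker:
    """Toy topic-based publish/subscribe broker with synchronous delivery."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver to every handler currently subscribed to the topic.
        for handler in list(self._subscribers[topic]):
            handler(message)


class Daemon:
    """Stand-in for a worker daemon that keeps its partition in memory."""

    def __init__(self, broker: Broker, daemon_id: int, partition: Sequence[float]) -> None:
        self.broker = broker
        self.daemon_id = daemon_id
        self.partition = list(partition)  # loaded once, reused for every task
        broker.subscribe("tasks", self.on_task)

    def on_task(self, task: Callable[[Sequence[float]], Any]) -> None:
        # Apply the broadcast task to local data and publish the result.
        self.broker.publish("results", (self.daemon_id, task(self.partition)))


class Driver:
    """Stand-in for the client that drives the computation."""

    def __init__(self, broker: Broker, n_daemons: int) -> None:
        self.broker = broker
        self.n_daemons = n_daemons
        self.results: Dict[int, Any] = {}
        broker.subscribe("results", self.on_result)

    def on_result(self, message: Any) -> None:
        daemon_id, value = message
        self.results[daemon_id] = value

    def run(self, task: Callable[[Sequence[float]], Any]) -> Dict[int, Any]:
        self.results = {}
        self.broker.publish("tasks", task)  # broadcast the task to all daemons
        if len(self.results) != self.n_daemons:
            raise RuntimeError("missing results from some daemons")
        return dict(self.results)
```
</preformat>
<p>For example, with three daemons holding [0, 1], [1, 2] and [2, 3], driver.run(sum) returns {0: 1, 1: 3, 2: 5}. A second call such as driver.run(max) reuses the same in-memory partitions without reloading them.</p>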
<fig id="F4" position="float"><label>Figure 4</label>
<caption><p>Current Twister Prototype</p>
</caption>
<graphic xlink:href="1471-2105-11-S12-S3-4"></graphic>
</fig>
<fig id="F5" position="float"><label>Figure 5</label>
<caption><p>Parallel efficiency of the different parallel runtimes for the Smith-Waterman-Gotoh algorithm for distance computation</p>
</caption>
<graphic xlink:href="1471-2105-11-S12-S3-5"></graphic>
</fig>
<fig id="F6" position="float"><label>Figure 6</label>
<caption><p>Total running time for 20 iterations of PageRank algorithm on ClueWeb data with Twister and Hadoop on 256 cores</p>
</caption>
<graphic xlink:href="1471-2105-11-S12-S3-6"></graphic>
</fig>
<p>We have shown that “doubly data parallel” (all-pairs) applications such as pairwise distance calculation using the Smith-Waterman-Gotoh algorithm can be implemented with Hadoop, Dryad, and MPI [<xref ref-type="bibr" rid="B30">30</xref>
]. Further, Figure <xref ref-type="fig" rid="F5">5</xref>
 shows a classic MapReduce application already studied in Figure <xref ref-type="fig" rid="F2">2</xref>
 and demonstrates that Twister performs well in this limit, although its iterative extensions are not needed. We use the conventional parallel efficiency defined as T(1)/(pT(p)), where T(p) is the runtime on p cores. The results shown in Figure <xref ref-type="fig" rid="F5">5</xref>
 were obtained using 744 cores (31 nodes with 24 cores each). Twister outperforms Hadoop because of its faster data communication mechanism and the lower overhead of its static task scheduling. Moreover, in Hadoop each map/reduce task is executed as a separate process, whereas Twister uses a hybrid approach in which the map/reduce tasks assigned to a given daemon are executed within one Java Virtual Machine (JVM). The lower efficiency of DryadLINQ shown in Figure <xref ref-type="fig" rid="F5">5</xref>
 was mainly due to an inefficient task scheduling mechanism used in the initial academic release [<xref ref-type="bibr" rid="B21">21</xref>
]. We also investigated Twister PageRank performance using a ClueWeb data set [<xref ref-type="bibr" rid="B31">31</xref>
] collected in January 2009. We built the adjacency matrix from this data set and tested the PageRank application using 32 nodes with 8 cores each. Figure <xref ref-type="fig" rid="F6">6</xref>
 shows that Twister performs much better than Hadoop on this algorithm [<xref ref-type="bibr" rid="B32">32</xref>
], which has the iterative structure for which Twister was designed.</p>
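<p>The efficiency definition above, E(p) = T(1)/(pT(p)), equals the speedup S(p) = T(1)/T(p) divided by p. As a worked example using numbers from earlier in this section, the speedup of about 500 reported for MDS and clustering on 768 cores corresponds to an efficiency of about 500/768 ≈ 0.65. The helper below is a minimal sketch; the timings in its usage line are hypothetical values chosen only to reproduce that speedup.</p>
<preformat preformat-type="code">
```python
def speedup(t1: float, tp: float) -> float:
    """S(p) = T(1) / T(p)."""
    if not (t1 > 0 and tp > 0):
        raise ValueError("runtimes must be positive")
    return t1 / tp


def parallel_efficiency(t1: float, p: int, tp: float) -> float:
    """E(p) = T(1) / (p * T(p)) = S(p) / p; 1.0 means perfect scaling."""
    if not p > 0:
        raise ValueError("p must be a positive core count")
    return speedup(t1, tp) / p


if __name__ == "__main__":
    # Hypothetical timings: T(1) = 1000 h and T(768) = 2 h give a speedup of 500.
    print(round(parallel_efficiency(1000.0, 768, 2.0), 3))
```
</preformat>
<p>Running the module prints 0.651. Efficiency makes runs on different core counts, such as the 744-core runs of Figure 5, directly comparable, because it normalizes the speedup by the number of cores used.</p>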
</sec>
</sec>
</sec>
<sec><title>Conclusions</title>
<p>We have shown that MapReduce gives good performance for several applications and that, owing to its high-level support for parallelism, it is easier to use [<xref ref-type="bibr" rid="B33">33</xref>
] than conventional master-worker approaches while giving comparable performance; in Azure, such master-worker approaches are automated through its concept of roles. However, many data mining steps cannot efficiently use MapReduce, and we propose a hybrid cloud-cluster architecture to link MPI and MapReduce components. We introduced the MapReduce extension Twister [<xref ref-type="bibr" rid="B25">25</xref>
,<xref ref-type="bibr" rid="B29">29</xref>
] to allow a uniform programming paradigm across all processing steps in a pipeline typified by Figure <xref ref-type="fig" rid="F1">1</xref>
.</p>
</sec>
<sec sec-type="methods"><title>Methods</title>
<p>We used three major computational infrastructures: Azure, Amazon and FutureGrid. FutureGrid offers a flexible environment for our rigorous benchmarking of virtual machine and "bare-metal" (non-VM) based approaches, and an early prototype of FutureGrid software was used in our initial work. We used four distinct parallel computing paradigms: the master-worker model, MPI, MapReduce and Twister.</p>
</sec>
<sec><title>List of abbreviations</title>
<p>MPI: Message Passing Interface; NSF: National Science Foundation; UC Santa Barbara HPC Research: University of California Santa Barbara High Performance Computing Research; OCI: Office of Cyberinfrastructure; DOE: Department of Energy; EU: European Union; VM: Virtual Machine; HPC: High Performance Computing; DNA: Deoxyribonucleic Acid; BLAST: Basic Local Alignment Search Tool; MDS: Multidimensional Scaling; JVM: Java Virtual Machine</p>
</sec>
<sec><title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec><title>Authors' contributions</title>
<p>JQ participated in the study of the hybrid cloud and cluster computing pipeline model. JE, TG, JQ, HL, BZ and TLW carried out Smith-Waterman-Gotoh sequence alignment using Twister, DryadLINQ, Hadoop, and MPI. JYC, SHB and JE contributed to parallel MDS and GTM using MPI and Twister. YR, ES and AH studied workflow and job scheduling on clusters. GF participated in the study of cloud and parallel computing research issues.</p>
</sec>
</body>
<back><sec><title>Acknowledgements</title>
<p>We thank Microsoft for its technical support. This work was made possible using the computing use grant provided by Amazon Web Services titled "Proof of concepts linking FutureGrid users to AWS". This work is partially funded by the Microsoft "CRMC" grant and NIH Grant Number RC2HG005806-02. This document was developed with support from the National Science Foundation (NSF) under Grant No. 0910812 to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.</p>
<p>This article has been published as part of <italic>BMC Bioinformatics</italic>
 Volume 11 Supplement 12, 2010: Proceedings of the 11th Annual Bioinformatics Open Source Conference (BOSC) 2010. The full contents of the supplement are available online at <ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/11?issue=S12">http://www.biomedcentral.com/1471-2105/11?issue=S12</ext-link>
.</p>
</sec>
<ref-list><ref id="B1"><mixed-citation publication-type="other"><name><surname>Armbrust</surname>
<given-names>M</given-names>
</name>
<name><surname>Fox</surname>
<given-names>A</given-names>
</name>
<name><surname>Griffith</surname>
<given-names>R</given-names>
</name>
<name><surname>Joseph</surname>
<given-names>AD</given-names>
</name>
<name><surname>Katz</surname>
<given-names>R</given-names>
</name>
<name><surname>Konwinski</surname>
<given-names>A</given-names>
</name>
<name><surname>Lee</surname>
<given-names>G</given-names>
</name>
<name><surname>Patterson</surname>
<given-names>D</given-names>
</name>
<name><surname>Rabkin</surname>
<given-names>A</given-names>
</name>
<name><surname>Stoica</surname>
<given-names>I</given-names>
</name>
<name><surname>Zaharia</surname>
<given-names>M</given-names>
</name>
<article-title>Above the Clouds: A Berkeley View of Cloud Computing</article-title>
<source>Technical report</source>
<ext-link ext-link-type="uri" xlink:href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf">http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B2"><mixed-citation publication-type="other"><collab>Press Release</collab>
<article-title>Gartner's 2009 Hype Cycle Special Report Evaluates Maturity of 1,650 Technologies</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.gartner.com/it/page.jsp?id=1124212">http://www.gartner.com/it/page.jsp?id=1124212</ext-link>
</mixed-citation>
</ref>
<ref id="B3"><mixed-citation publication-type="book"><source>Cloud Computing Forum & Workshop</source>
<year>2010</year>
<publisher-name>NIST Information Technology Laboratory, Washington DC</publisher-name>
<ext-link ext-link-type="uri" xlink:href="http://www.nist.gov/itl/cloud.cfm">http://www.nist.gov/itl/cloud.cfm</ext-link>
</mixed-citation>
</ref>
<ref id="B4"><mixed-citation publication-type="other"><article-title>Nimbus Cloud Computing for Science</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.nimbusproject.org/">http://www.nimbusproject.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B5"><mixed-citation publication-type="other"><article-title>OpenNebula Open Source Toolkit for Cloud Computing</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.opennebula.org/">http://www.opennebula.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B6"><mixed-citation publication-type="other"><article-title>Sector and Sphere Data Intensive Cloud Computing Platform</article-title>
<ext-link ext-link-type="uri" xlink:href="http://sector.sourceforge.net/doc.html">http://sector.sourceforge.net/doc.html</ext-link>
</mixed-citation>
</ref>
<ref id="B7"><mixed-citation publication-type="other"><article-title>Eucalyptus Open Source Cloud Software</article-title>
<ext-link ext-link-type="uri" xlink:href="http://open.eucalyptus.com/">http://open.eucalyptus.com/</ext-link>
</mixed-citation>
</ref>
<ref id="B8"><mixed-citation publication-type="other"><article-title>FutureGrid Grid Testbed</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.futuregrid.org">http://www.futuregrid.org</ext-link>
</mixed-citation>
</ref>
<ref id="B9"><mixed-citation publication-type="other"><article-title>Magellan Cloud for Science</article-title>
<ext-link ext-link-type="uri" xlink:href="http://magellan.alcf.anl.gov/,http://www.nersc.gov/nusers/systems/magellan/">http://magellan.alcf.anl.gov/, http://www.nersc.gov/nusers/systems/magellan/</ext-link>
</mixed-citation>
</ref>
<ref id="B10"><mixed-citation publication-type="other"><article-title>European Framework 7 project starting June 1 2010 VENUS-C Virtual multidisciplinary EnviroNments USing Cloud infrastructure</article-title>
</mixed-citation>
</ref>
<ref id="B11"><mixed-citation publication-type="book"><source>Recordings of Presentations Cloud Futures 2010</source>
<year>2010</year>
<publisher-name>Redmond WA</publisher-name>
<ext-link ext-link-type="uri" xlink:href="http://research.microsoft.com/en-us/events/cloudfutures2010/videos.aspx">http://research.microsoft.com/en-us/events/cloudfutures2010/videos.aspx</ext-link>
</mixed-citation>
</ref>
<ref id="B12"><mixed-citation publication-type="other"><article-title>Lockheed Martin Cyber Security Alliance: Cloud Computing Whitepaper</article-title>
<year>2010</year>
<ext-link ext-link-type="uri" xlink:href="http://www.lockheedmartin.com/data/assets/isgs/documents/CloudComputingWhitePaper.pdf">http://www.lockheedmartin.com/data/assets/isgs/documents/CloudComputingWhitePaper.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B13"><mixed-citation publication-type="journal"><name><surname>Walker</surname>
<given-names>E</given-names>
</name>
<article-title>Benchmarking Amazon EC2 for High Performance Scientific Computing</article-title>
<source>;login: The USENIX Magazine</source>
<year>2008</year>
<volume>33</volume>
<issue>5</issue>
<ext-link ext-link-type="uri" xlink:href="http://www.usenix.org/publications/login/2008-10/openpdfs/walker.pdf">http://www.usenix.org/publications/login/2008-10/openpdfs/walker.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B14"><mixed-citation publication-type="book"><name><surname>Ekanayake</surname>
<given-names>J</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>XH</given-names>
</name>
<name><surname>Gunarathne</surname>
<given-names>T</given-names>
</name>
<name><surname>Beason</surname>
<given-names>S</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<article-title>High Performance Parallel Computing with Clouds and Cloud Technologies</article-title>
<source>Book chapter to Cloud Computing and Software Services: Theory and Techniques</source>
<year>2010</year>
<publisher-name>CRC Press (Taylor and Francis)</publisher-name>
<ext-link ext-link-type="uri" xlink:href="http://grids.ucs.indiana.edu/ptliupages/publications/cloud_handbook_final-with-diagrams.pdf">http://grids.ucs.indiana.edu/ptliupages/publications/cloud_handbook_final-with-diagrams.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B15"><mixed-citation publication-type="book"><name><surname>Evangelinos</surname>
<given-names>C</given-names>
</name>
<name><surname>Hill</surname>
<given-names>CN</given-names>
</name>
<article-title>Cloud Computing for parallel Scientific HPC Applications: Feasibility of running Coupled Atmosphere-Ocean Climate Models on Amazon’s EC2</article-title>
<source>CCA08: Cloud Computing and its Applications</source>
<year>2008</year>
<publisher-name>Chicago, IL, USA</publisher-name>
</mixed-citation>
</ref>
<ref id="B16"><mixed-citation publication-type="book"><name><surname>Lange</surname>
<given-names>J</given-names>
</name>
<name><surname>Pedretti</surname>
<given-names>K</given-names>
</name>
<name><surname>Hudson</surname>
<given-names>T</given-names>
</name>
<name><surname>Dinda</surname>
<given-names>P</given-names>
</name>
<name><surname>Cui</surname>
<given-names>Z</given-names>
</name>
<name><surname>Xia</surname>
<given-names>L</given-names>
</name>
<name><surname>Bridges</surname>
<given-names>P</given-names>
</name>
<name><surname>Gocke</surname>
<given-names>A</given-names>
</name>
<name><surname>Jaconette</surname>
<given-names>S</given-names>
</name>
<name><surname>Levenhagen</surname>
<given-names>M</given-names>
</name>
<name><surname>Brightwell</surname>
<given-names>R</given-names>
</name>
<article-title>Palacios and Kitten: New High Performance Operating Systems For Scalable Virtualized and Native Supercomputing</article-title>
<source>24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010)</source>
<year>2010</year>
<publisher-name>Atlanta, GA, USA</publisher-name>
</mixed-citation>
</ref>
<ref id="B17"><mixed-citation publication-type="other"><name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<article-title>White Paper: FutureGrid Platform FGPlatform: Rationale and Possible Directions</article-title>
<year>2010</year>
<ext-link ext-link-type="uri" xlink:href="http://grids.ucs.indiana.edu/ptliupages/publications/FGPlatform.docx">http://grids.ucs.indiana.edu/ptliupages/publications/FGPlatform.docx</ext-link>
</mixed-citation>
</ref>
<ref id="B18"><mixed-citation publication-type="book"><name><surname>Dean</surname>
<given-names>J</given-names>
</name>
<name><surname>Ghemawat</surname>
<given-names>S</given-names>
</name>
<article-title>MapReduce: simplified data processing on large clusters</article-title>
<source>Commun ACM</source>
<year>2008</year>
<volume>51</volume>
<issue>1</issue>
<publisher-name>ACM</publisher-name>
<fpage>107</fpage>
<lpage>113</lpage>
</mixed-citation>
</ref>
<ref id="B19"><mixed-citation publication-type="other"><article-title>Open source MapReduce Apache Hadoop</article-title>
<ext-link ext-link-type="uri" xlink:href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</ext-link>
</mixed-citation>
</ref>
<ref id="B20"><mixed-citation publication-type="other"><name><surname>Ekanayake</surname>
<given-names>J</given-names>
</name>
<name><surname>Gunarathne</surname>
<given-names>T</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>J</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<name><surname>Beason</surname>
<given-names>S</given-names>
</name>
<name><surname>Choi</surname>
<given-names>JY</given-names>
</name>
<name><surname>Ruan</surname>
<given-names>Y</given-names>
</name>
<name><surname>Bae</surname>
<given-names>SH</given-names>
</name>
<name><surname>Li</surname>
<given-names>H</given-names>
</name>
<article-title>Technical Report: Applicability of DryadLINQ to Scientific Applications</article-title>
<year>2010</year>
<ext-link ext-link-type="uri" xlink:href="http://grids.ucs.indiana.edu/ptliupages/publications/DryadReport.pdf">http://grids.ucs.indiana.edu/ptliupages/publications/DryadReport.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B21"><mixed-citation publication-type="book"><name><surname>Ekanayake</surname>
<given-names>J</given-names>
</name>
<name><surname>Balkir</surname>
<given-names>A</given-names>
</name>
<name><surname>Gunarathne</surname>
<given-names>T</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<name><surname>Poulain</surname>
<given-names>C</given-names>
</name>
<name><surname>Araujo</surname>
<given-names>N</given-names>
</name>
<name><surname>Barga</surname>
<given-names>R</given-names>
</name>
<article-title>DryadLINQ for Scientific Analyses</article-title>
<source>5th IEEE International Conference on e-Science</source>
<year>2009</year>
<publisher-name>Oxford UK</publisher-name>
</mixed-citation>
</ref>
<ref id="B22"><mixed-citation publication-type="book"><name><surname>Fox</surname>
<given-names>GC</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>XH</given-names>
</name>
<name><surname>Beason</surname>
<given-names>S</given-names>
</name>
<name><surname>Choi</surname>
<given-names>JY</given-names>
</name>
<name><surname>Rho</surname>
<given-names>M</given-names>
</name>
<name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<name><surname>Devadasan</surname>
<given-names>N</given-names>
</name>
<name><surname>Liu</surname>
<given-names>G</given-names>
</name>
<person-group person-group-type="editor">Jaatun M, Zhao G, Rong C</person-group>
<article-title>Biomedical Case Studies in Data Intensive Computing</article-title>
<source>Keynote talk at The 1st International Conference on Cloud Computing (CloudCom 2009) at Beijing Jiaotong University, China December 1-4, 2009</source>
<year>2009</year>
<publisher-name>Springer Verlag, LNCS 5931 "Cloud Computing"</publisher-name>
<fpage>2</fpage>
<lpage>18</lpage>
</mixed-citation>
</ref>
<ref id="B23"><mixed-citation publication-type="book"><name><surname>Bae</surname>
<given-names>SH</given-names>
</name>
<name><surname>Choi</surname>
<given-names>JY</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>J</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<article-title>Dimension Reduction and Visualization of Large High-dimensional Data via Interpolation</article-title>
<source>Proceedings of ACM HPDC 2010 conference</source>
<year>2010</year>
<publisher-name>Chicago, Illinois</publisher-name>
</mixed-citation>
</ref>
<ref id="B24"><mixed-citation publication-type="journal"><name><surname>Sammon</surname>
<given-names>JW</given-names>
</name>
<article-title>A nonlinear mapping for data structure analysis</article-title>
<source>IEEE Trans. Computers</source>
<year>1969</year>
<volume>C-18</volume>
<fpage>401</fpage>
<lpage>409</lpage>
<pub-id pub-id-type="doi">10.1109/T-C.1969.222678</pub-id>
</mixed-citation>
</ref>
<ref id="B25"><mixed-citation publication-type="book"><name><surname>Ekanayake</surname>
<given-names>J</given-names>
</name>
<collab>Computer Science PhD</collab>
<source>Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing</source>
<year>2010</year>
<publisher-name>Bloomington, Indiana</publisher-name>
</mixed-citation>
</ref>
<ref id="B26"><mixed-citation publication-type="other"><name><surname>Fox</surname>
<given-names>GC</given-names>
</name>
<article-title>Algorithms and Application for Grids and Clouds</article-title>
<source>22nd ACM Symposium on Parallelism in Algorithms and Architectures</source>
<year>2010</year>
<ext-link ext-link-type="uri" xlink:href="http://grids.ucs.indiana.edu/ptliupages/presentations/SPAAJune14-10.pptx">http://grids.ucs.indiana.edu/ptliupages/presentations/SPAAJune14-10.pptx</ext-link>
</mixed-citation>
</ref>
<ref id="B27"><mixed-citation publication-type="book"><name><surname>Fox</surname>
<given-names>GC</given-names>
</name>
<name><surname>Williams</surname>
<given-names>RD</given-names>
</name>
<name><surname>Messina</surname>
<given-names>PC</given-names>
</name>
<source>Parallel computing works!</source>
<year>1994</year>
<publisher-name>Morgan Kaufmann Publishers</publisher-name>
<ext-link ext-link-type="uri" xlink:href="http://www.old-npac.org/copywrite/pcw/node278.html#SECTION001440000000000000000">http://www.old-npac.org/copywrite/pcw/node278.html#SECTION001440000000000000000</ext-link>
</mixed-citation>
</ref>
<ref id="B28"><mixed-citation publication-type="book"><name><surname>Chu</surname>
<given-names>CT</given-names>
</name>
<name><surname>Kim</surname>
<given-names>SK</given-names>
</name>
<name><surname>Lin</surname>
<given-names>YA</given-names>
</name>
<name><surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Bradski</surname>
<given-names>G</given-names>
</name>
<name><surname>Ng</surname>
<given-names>AY</given-names>
</name>
<name><surname>Olukotun</surname>
<given-names>K</given-names>
</name>
<article-title>Map-Reduce for Machine Learning on Multicore</article-title>
<source>NIPS</source>
<year>2006</year>
<publisher-name>MIT Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B29"><mixed-citation publication-type="other"><article-title>Twister Home page</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.iterativemapreduce.org/">http://www.iterativemapreduce.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B30"><mixed-citation publication-type="book"><name><surname>Qiu</surname>
<given-names>X</given-names>
</name>
<name><surname>Ekanayake</surname>
<given-names>J</given-names>
</name>
<name><surname>Beason</surname>
<given-names>S</given-names>
</name>
<name><surname>Gunarathne</surname>
<given-names>T</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<name><surname>Barga</surname>
<given-names>R</given-names>
</name>
<name><surname>Gannon</surname>
<given-names>D</given-names>
</name>
<article-title>Cloud Technologies for Bioinformatics Applications</article-title>
<source>Proceedings of the 2nd ACM Workshop on Many-Task Computing on Grids and Supercomputers (SC09)</source>
<year>2009</year>
<publisher-name>Portland, Oregon</publisher-name>
<ext-link ext-link-type="uri" xlink:href="http://grids.ucs.indiana.edu/ptliupages/publications/MTAGSOct22-09A.pdf">http://grids.ucs.indiana.edu/ptliupages/publications/MTAGSOct22-09A.pdf</ext-link>
</mixed-citation>
</ref>
<ref id="B31"><mixed-citation publication-type="other"><article-title>The ClueWeb09 Dataset</article-title>
<ext-link ext-link-type="uri" xlink:href="http://boston.lti.cs.cmu.edu/Data/clueweb09/">http://boston.lti.cs.cmu.edu/Data/clueweb09/</ext-link>
</mixed-citation>
</ref>
<ref id="B32"><mixed-citation publication-type="book"><name><surname>Ekanayake</surname>
<given-names>J</given-names>
</name>
<name><surname>Li</surname>
<given-names>H</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>B</given-names>
</name>
<name><surname>Gunarathne</surname>
<given-names>T</given-names>
</name>
<name><surname>Bae</surname>
<given-names>SH</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>J</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<article-title>Twister: A Runtime for Iterative MapReduce</article-title>
<source>Proceedings of the First International Workshop on MapReduce and its Applications of ACM HPDC 2010 conference</source>
<year>2010</year>
<publisher-name>Chicago, Illinois</publisher-name>
</mixed-citation>
</ref>
<ref id="B33"><mixed-citation publication-type="book"><name><surname>Gunarathne</surname>
<given-names>T</given-names>
</name>
<name><surname>Wu</surname>
<given-names>TL</given-names>
</name>
<name><surname>Qiu</surname>
<given-names>J</given-names>
</name>
<name><surname>Fox</surname>
<given-names>G</given-names>
</name>
<article-title>Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications</article-title>
<source>Proceedings of Emerging Computational Methods for the Life Sciences Workshop of ACM HPDC 2010 conference</source>
<year>2010</year>
<publisher-name>Chicago, Illinois</publisher-name>
<fpage>20</fpage>
<lpage>25</lpage>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000484 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000484 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3040529
   |texte=   Hybrid cloud and cluster computing paradigms for life science applications
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:21210982" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024

	Cyberinfrastructure exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.