Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

Internal identifier: 000505 (Pmc/Curation); previous: 000504; next: 000506

Authors: Anjani Ragothaman [United States]; Sairam Chowdary Boddu [United States]; Nayong Kim [United States]; Wei Feinstein [United States]; Michal Brylinski [United States]; Shantenu Jha [United States]; Joohyun Kim [United States]

RBID: PMC:4066679

Abstract

While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, which has prohibited the use of threading as a genome-wide structural systems biology tool. To exploit the utility of threading, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variations in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based on task requirements. We present a runtime analysis that characterizes the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, being particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
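A small simulation can illustrate two ideas from the abstract: pilot-based task-level parallelism and the trade-off between time-to-solution and cost-to-solution. This is a sketch under stated assumptions, not the authors' SAGA-Pilot implementation.

- `Pilot` and `schedule` are hypothetical names.
- The task runtimes and hourly prices are made up.
- Tasks are assumed to be independent.
- Billing is assumed to charge each VM for at least one hour, with partial hours rounded up.

Each task is bound at run time to whichever pilot slot becomes free first. This is a longest-first greedy list schedule, used here as a simple stand-in for pilot agents pulling work from a shared queue.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pilot:
    """Hypothetical pilot: one VM offering `slots` concurrent task slots."""
    name: str
    slots: int
    price_per_hour: float  # illustrative price, not a real EC2 quote


def schedule(task_minutes, pilots):
    """Bind independent tasks to pilot slots at run time.

    Tasks are taken longest first and each goes to whichever slot frees up
    earliest, mimicking agents pulling work from a shared queue.
    Returns (time_to_solution_min, cost_to_solution, assignment).
    """
    # Min-heap of (time slot becomes free, pilot index, slot index).
    free_at = [(0.0, p, s) for p, pilot in enumerate(pilots)
               for s in range(pilot.slots)]
    heapq.heapify(free_at)
    busy_until = [0.0] * len(pilots)  # finish time of each VM's last task
    assignment = []  # (task id, pilot name, start, end)
    order = sorted(range(len(task_minutes)), key=lambda i: -task_minutes[i])
    for task_id in order:
        start, p, s = heapq.heappop(free_at)
        end = start + task_minutes[task_id]
        assignment.append((task_id, pilots[p].name, start, end))
        busy_until[p] = max(busy_until[p], end)
        heapq.heappush(free_at, (end, p, s))
    time_to_solution = max(busy_until)
    # Assumed billing: every launched VM pays for at least one hour and
    # partial hours are rounded up.
    cost = sum(max(1, math.ceil(t / 60)) * pilot.price_per_hour
               for t, pilot in zip(busy_until, pilots))
    return time_to_solution, cost, assignment


if __name__ == "__main__":
    runtimes = [30, 30, 20, 20, 10, 10]  # hypothetical minutes per task
    one_vm = [Pilot("vm-a", 2, 0.24)]
    two_vms = [Pilot("vm-a", 2, 0.24), Pilot("vm-b", 2, 0.24)]
    for label, pilots in (("1 VM", one_vm), ("2 VMs", two_vms)):
        tts, cost, _ = schedule(runtimes, pilots)
        print(f"{label}: time-to-solution {tts:.0f} min, cost ${cost:.2f}")
```

With these made-up numbers, adding a second VM halves time-to-solution (60 to 30 min) but doubles cost ($0.24 to $0.48), because each VM is billed for a full hour. This is the kind of trade-off that a runtime analysis across instance types is meant to expose.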


DOI: 10.1155/2014/348725
PubMed: 24995285
PubMed Central: 4066679

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics</title>
<author>
<name sortKey="Ragothaman, Anjani" sort="Ragothaman, Anjani" uniqKey="Ragothaman A" first="Anjani" last="Ragothaman">Anjani Ragothaman</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Boddu, Sairam Chowdary" sort="Boddu, Sairam Chowdary" uniqKey="Boddu S" first="Sairam Chowdary" last="Boddu">Sairam Chowdary Boddu</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kim, Nayong" sort="Kim, Nayong" uniqKey="Kim N" first="Nayong" last="Kim">Nayong Kim</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Feinstein, Wei" sort="Feinstein, Wei" uniqKey="Feinstein W" first="Wei" last="Feinstein">Wei Feinstein</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Brylinski, Michal" sort="Brylinski, Michal" uniqKey="Brylinski M" first="Michal" last="Brylinski">Michal Brylinski</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I3">Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Jha, Shantenu" sort="Jha, Shantenu" uniqKey="Jha S" first="Shantenu" last="Jha">Shantenu Jha</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kim, Joohyun" sort="Kim, Joohyun" uniqKey="Kim J" first="Joohyun" last="Kim">Joohyun Kim</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24995285</idno>
<idno type="pmc">4066679</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066679</idno>
<idno type="RBID">PMC:4066679</idno>
<idno type="doi">10.1155/2014/348725</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000505</idno>
<idno type="wicri:Area/Pmc/Curation">000505</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics</title>
<author>
<name sortKey="Ragothaman, Anjani" sort="Ragothaman, Anjani" uniqKey="Ragothaman A" first="Anjani" last="Ragothaman">Anjani Ragothaman</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Boddu, Sairam Chowdary" sort="Boddu, Sairam Chowdary" uniqKey="Boddu S" first="Sairam Chowdary" last="Boddu">Sairam Chowdary Boddu</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kim, Nayong" sort="Kim, Nayong" uniqKey="Kim N" first="Nayong" last="Kim">Nayong Kim</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Feinstein, Wei" sort="Feinstein, Wei" uniqKey="Feinstein W" first="Wei" last="Feinstein">Wei Feinstein</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Brylinski, Michal" sort="Brylinski, Michal" uniqKey="Brylinski M" first="Michal" last="Brylinski">Michal Brylinski</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I3">Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Jha, Shantenu" sort="Jha, Shantenu" uniqKey="Jha S" first="Shantenu" last="Jha">Shantenu Jha</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kim, Joohyun" sort="Kim, Joohyun" uniqKey="Kim J" first="Joohyun" last="Kim">Joohyun Kim</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BioMed Research International</title>
<idno type="ISSN">2314-6133</idno>
<idno type="eISSN">2314-6141</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, which has prohibited the use of threading as a genome-wide structural systems biology tool. To exploit the utility of threading, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variations in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based on task requirements. We present a runtime analysis that characterizes the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, being particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feng, X" uniqKey="Feng X">X Feng</name>
</author>
<author>
<name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
<author>
<name sortKey="Luo, Q" uniqKey="Luo Q">Q Luo</name>
</author>
<author>
<name sortKey="Liu, B F" uniqKey="Liu B">B-F Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Furey, Ts" uniqKey="Furey T">TS Furey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
<author>
<name sortKey="Aiden, El" uniqKey="Aiden E">EL Aiden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcpherson, Jd" uniqKey="Mcpherson J">JD McPherson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Juncker, As" uniqKey="Juncker A">AS Juncker</name>
</author>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Pierleoni, A" uniqKey="Pierleoni A">A Pierleoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loewenstein, Y" uniqKey="Loewenstein Y">Y Loewenstein</name>
</author>
<author>
<name sortKey="Raimondo, D" uniqKey="Raimondo D">D Raimondo</name>
</author>
<author>
<name sortKey="Redfern, Oc" uniqKey="Redfern O">OC Redfern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Skolnick, J" uniqKey="Skolnick J">J Skolnick</name>
</author>
<author>
<name sortKey="Fetrow, Js" uniqKey="Fetrow J">JS Fetrow</name>
</author>
<author>
<name sortKey="Kolinski, A" uniqKey="Kolinski A">A Kolinski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schnoes, Am" uniqKey="Schnoes A">AM Schnoes</name>
</author>
<author>
<name sortKey="Brown, Sd" uniqKey="Brown S">SD Brown</name>
</author>
<author>
<name sortKey="Dodevski, I" uniqKey="Dodevski I">I Dodevski</name>
</author>
<author>
<name sortKey="Babbitt, Pc" uniqKey="Babbitt P">PC Babbitt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Skolnick, J" uniqKey="Skolnick J">J Skolnick</name>
</author>
<author>
<name sortKey="Brylinski, M" uniqKey="Brylinski M">M Brylinski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Capra, Ja" uniqKey="Capra J">JA Capra</name>
</author>
<author>
<name sortKey="Laskowski, Ra" uniqKey="Laskowski R">RA Laskowski</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
<author>
<name sortKey="Singh, M" uniqKey="Singh M">M Singh</name>
</author>
<author>
<name sortKey="Funkhouser, Ta" uniqKey="Funkhouser T">TA Funkhouser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Glaser, F" uniqKey="Glaser F">F Glaser</name>
</author>
<author>
<name sortKey="Rosenberg, Y" uniqKey="Rosenberg Y">Y Rosenberg</name>
</author>
<author>
<name sortKey="Kessel, A" uniqKey="Kessel A">A Kessel</name>
</author>
<author>
<name sortKey="Pupko, T" uniqKey="Pupko T">T Pupko</name>
</author>
<author>
<name sortKey="Ben Tal, N" uniqKey="Ben Tal N">N Ben-Tal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brylinski, M" uniqKey="Brylinski M">M Brylinski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gunarathne, T" uniqKey="Gunarathne T">T Gunarathne</name>
</author>
<author>
<name sortKey="Wu, T L" uniqKey="Wu T">T-L Wu</name>
</author>
<author>
<name sortKey="Choi, Jy" uniqKey="Choi J">JY Choi</name>
</author>
<author>
<name sortKey="Bae, S" uniqKey="Bae S">S Bae</name>
</author>
<author>
<name sortKey="Qiu, J" uniqKey="Qiu J">J Qiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
<author>
<name sortKey="Katz, Ds" uniqKey="Katz D">DS Katz</name>
</author>
<author>
<name sortKey="Luckow, A" uniqKey="Luckow A">A Luckow</name>
</author>
<author>
<name sortKey="Merzky, A" uniqKey="Merzky A">A Merzky</name>
</author>
<author>
<name sortKey="Stamou, K" uniqKey="Stamou K">K Stamou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Taylor, Rc" uniqKey="Taylor R">RC Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gurtowski, J" uniqKey="Gurtowski J">J Gurtowski</name>
</author>
<author>
<name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baker, M" uniqKey="Baker M">M Baker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
<author>
<name sortKey="Maddineni, S" uniqKey="Maddineni S">S Maddineni</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mantha, Pk" uniqKey="Mantha P">PK Mantha</name>
</author>
<author>
<name sortKey="Kim, N" uniqKey="Kim N">N Kim</name>
</author>
<author>
<name sortKey="Luckow, A" uniqKey="Luckow A">A Luckow</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brylinski, M" uniqKey="Brylinski M">M Brylinski</name>
</author>
<author>
<name sortKey="Lingam, D" uniqKey="Lingam D">D Lingam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brylinski, M" uniqKey="Brylinski M">M Brylinski</name>
</author>
<author>
<name sortKey="Feinstein, Wp" uniqKey="Feinstein W">WP Feinstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luckow, A" uniqKey="Luckow A">A Luckow</name>
</author>
<author>
<name sortKey="Santcroos, M" uniqKey="Santcroos M">M Santcroos</name>
</author>
<author>
<name sortKey="Merzky, A" uniqKey="Merzky A">A Merzky</name>
</author>
<author>
<name sortKey="Weidner, O" uniqKey="Weidner O">O Weidner</name>
</author>
<author>
<name sortKey="Mantha, P" uniqKey="Mantha P">P Mantha</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luckow, A" uniqKey="Luckow A">A Luckow</name>
</author>
<author>
<name sortKey="Lacinski, L" uniqKey="Lacinski L">L Lacinski</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luckow, A" uniqKey="Luckow A">A Luckow</name>
</author>
<author>
<name sortKey="Santcroos, M" uniqKey="Santcroos M">M Santcroos</name>
</author>
<author>
<name sortKey="Weidner, O" uniqKey="Weidner O">O Weidner</name>
</author>
<author>
<name sortKey="Zebrowski, A" uniqKey="Zebrowski A">A Zebrowski</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maddineni, S" uniqKey="Maddineni S">S Maddineni</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
<author>
<name sortKey="El Khamra, Y" uniqKey="El Khamra Y">Y El-Khamra</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Biomed Res Int</journal-id>
<journal-id journal-id-type="iso-abbrev">Biomed Res Int</journal-id>
<journal-id journal-id-type="publisher-id">BMRI</journal-id>
<journal-title-group>
<journal-title>BioMed Research International</journal-title>
</journal-title-group>
<issn pub-type="ppub">2314-6133</issn>
<issn pub-type="epub">2314-6141</issn>
<publisher>
<publisher-name>Hindawi Publishing Corporation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24995285</article-id>
<article-id pub-id-type="pmc">4066679</article-id>
<article-id pub-id-type="doi">10.1155/2014/348725</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ragothaman</surname>
<given-names>Anjani</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0003-0990-3321</contrib-id>
<name>
<surname>Boddu</surname>
<given-names>Sairam Chowdary</given-names>
</name>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0003-3412-9956</contrib-id>
<name>
<surname>Kim</surname>
<given-names>Nayong</given-names>
</name>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Feinstein</surname>
<given-names>Wei</given-names>
</name>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-6204-2869</contrib-id>
<name>
<surname>Brylinski</surname>
<given-names>Michal</given-names>
</name>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="I3">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jha</surname>
<given-names>Shantenu</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor2">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kim</surname>
<given-names>Joohyun</given-names>
</name>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="cor3">*</xref>
</contrib>
</contrib-group>
<aff id="I1">
<sup>1</sup>
RADICAL, ECE, Rutgers University, New Brunswick, NJ 08901, USA</aff>
<aff id="I2">
<sup>2</sup>
Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA</aff>
<aff id="I3">
<sup>3</sup>
Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA</aff>
<author-notes>
<corresp id="cor1">*Michal Brylinski:
<email>mbrylinski@lsu.edu</email>
</corresp>
<corresp id="cor2">*Shantenu Jha:
<email>shantenu.jha@rutgers.edu</email>
</corresp>
<corresp id="cor3">*Joohyun Kim:
<email>jhkim@cct.lsu.edu</email>
</corresp>
<fn fn-type="other">
<p>Academic Editor: Daniele D'Agostino</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>9</day>
<month>6</month>
<year>2014</year>
</pub-date>
<volume>2014</volume>
<elocation-id>348725</elocation-id>
<history>
<date date-type="received">
<day>6</day>
<month>3</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>8</day>
<month>5</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2014 Anjani Ragothaman et al.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, which has prohibited the use of threading as a genome-wide structural systems biology tool. To exploit the utility of threading, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variations in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based on task requirements. We present a runtime analysis that characterizes the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, being particularly amenable to small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.</p>
</abstract>
<funding-group>
<award-group>
<funding-source>http://dx.doi.org/10.13039/100000002 National Institutes of Health</funding-source>
<award-id>P20 GM103458-10</award-id>
</award-group>
<award-group>
<funding-source>Louisiana Board of Regents</funding-source>
<award-id>LEQSF (2012-15)-RD-A-05 to MB</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<floats-group>
<fig id="fig1" orientation="portrait" position="float">
<label>Figure 1</label>
<caption>
<p>Schematics of the pilot-based eThread pipeline on EC2. The eThread pipeline can accept as input a massive number of sequences, for example those identified by genome-wide sequencing methods such as RNA-Seq, and carry out metathreading-based structural bioinformatics analysis, including structure modeling. SAGA-Pilot makes its execution in the Amazon EC2 cloud environment efficient by facilitating data- and task-level parallelization.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.001"></graphic>
</fig>
<fig id="fig2" orientation="portrait" position="float">
<label>Figure 2</label>
<caption>
<p>Overall workflow of the pilot-based eThread pipeline on EC2.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.002"></graphic>
</fig>
<fig id="fig3" orientation="portrait" position="float">
<label>Figure 3</label>
<caption>
<p>Total execution time (a) and corresponding cost (b) of pfTools for 110 sequences across different EC2 instance types. CPU utilization is shown in (c).</p>
</caption>
<graphic xlink:href="BMRI2014-348725.003"></graphic>
</fig>
<fig id="fig4" orientation="portrait" position="float">
<label>Figure 4</label>
<caption>
<p>Pilot-based profiling of tools on different EC2 instances. Time-to-solution is compared for the 10 threading tools, the two standalone tools BLAST and PSIPRED, and the meta-analysis step, using 20 sequences on m1.large (red) and hi1.4xlarge (blue). Note that THREADER on m1.large takes 2897 min, which is not fully shown.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.004"></graphic>
</fig>
<fig id="fig5" orientation="portrait" position="float">
<label>Figure 5</label>
<caption>
<p>Time-to-solution of each elementary step in the pipeline using 2 VMs. Results are obtained for 20 sequences with pfTools. Times for VM launch (black), threading against the chain library (red), threading against the domain library (blue), postprocessing for chains (green), and postprocessing for domains (yellow) are shown together.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.005"></graphic>
</fig>
<fig id="fig6" orientation="portrait" position="float">
<label>Figure 6</label>
<caption>
<p>Time-to-solution of each elementary step in the pipeline using 2 heterogeneous VMs (a), with single-VM results presented for comparison (b). Results are obtained for 20 sequences with pfTools.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.006"></graphic>
</fig>
<fig id="alg1" orientation="portrait" position="float">
<label>Algorithm 1</label>
<caption>
<p>Serial algorithm for eThread.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.alg.001"></graphic>
</fig>
<fig id="alg2" orientation="portrait" position="float">
<label>Algorithm 2</label>
<caption>
<p>Task-level parallel algorithm for eThread.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.alg.002"></graphic>
</fig>
<fig id="alg3" orientation="portrait" position="float">
<label>Algorithm 3</label>
<caption>
<p>Proposed algorithm combining task-level parallelism and dynamic scheduling for eThread on EC2.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.alg.003"></graphic>
</fig>
<fig id="alg4" orientation="portrait" position="float">
<label>Algorithm 4</label>
<caption>
<p>Simple dynamic scheduling implementation for eThread on EC2.</p>
</caption>
<graphic xlink:href="BMRI2014-348725.alg.004"></graphic>
</fig>
<table-wrap id="tab1" orientation="portrait" position="float">
<label>Table 1</label>
<caption>
<p>Threading tools incorporated in eThread and their workflow structures. For the categorization of computational load and memory requirements, see the text.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Program name (version)</th>
<th align="center" rowspan="1" colspan="1">Number of subtasks</th>
<th align="center" rowspan="1" colspan="1">Prerequisite</th>
<th align="center" rowspan="1" colspan="1">Computational load</th>
<th align="center" rowspan="1" colspan="1">Memory requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">THREADER (3.5)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">PSIPRED (3.2.1), BLAST (2.2.5)</td>
<td align="center" rowspan="1" colspan="1">Highest</td>
<td align="center" rowspan="1" colspan="1">Low</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SAM-T2K (3.5)</td>
<td align="center" rowspan="1" colspan="1">9</td>
<td align="center" rowspan="1" colspan="1">BLAST</td>
<td align="center" rowspan="1" colspan="1">High</td>
<td align="center" rowspan="1" colspan="1">High</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">HHpred (2.0)</td>
<td align="center" rowspan="1" colspan="1">7</td>
<td align="center" rowspan="1" colspan="1">BLAST</td>
<td align="center" rowspan="1" colspan="1">High</td>
<td align="center" rowspan="1" colspan="1">Medium</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CS/CSI-BLAST (2.1.0)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1"></td>
<td align="center" rowspan="1" colspan="1">Low</td>
<td align="center" rowspan="1" colspan="1">Low</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">COMPASS (3.1)</td>
<td align="center" rowspan="1" colspan="1">7</td>
<td align="center" rowspan="1" colspan="1">BLAST</td>
<td align="center" rowspan="1" colspan="1">High</td>
<td align="center" rowspan="1" colspan="1">High</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">pfTools (2.3.4)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1"></td>
<td align="center" rowspan="1" colspan="1">Medium</td>
<td align="center" rowspan="1" colspan="1">Low</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">pGenTHREADER (8.9)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">BLAST</td>
<td align="center" rowspan="1" colspan="1">High</td>
<td align="center" rowspan="1" colspan="1">Low</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">HMMER (3.1.b1)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1"></td>
<td align="center" rowspan="1" colspan="1">Low</td>
<td align="center" rowspan="1" colspan="1">Low</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SPARKS (20050315)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">BLAST</td>
<td align="center" rowspan="1" colspan="1">High</td>
<td align="center" rowspan="1" colspan="1">Medium</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SP3 (20050315)</td>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">BLAST</td>
<td align="center" rowspan="1" colspan="1">High</td>
<td align="center" rowspan="1" colspan="1">Medium</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab2" orientation="portrait" position="float">
<label>Table 2</label>
<caption>
<p>Summary of the EC2 instance types used in this study. In the Type column, E stands for economical, G for general purpose, M for memory-optimized, C for compute-optimized, and S for storage-optimized, following Amazon's descriptions. Unsupported threading tools were identified from the profiling results of previous work [
<xref rid="B28" ref-type="bibr">28</xref>
]. Cost information was obtained from the AWS site at the time of writing; relative costs are expressed in units of $0.02 per hour, the price of t1.micro.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Instance</th>
<th align="center" rowspan="1" colspan="1">Type</th>
<th align="center" rowspan="1" colspan="1">Number of cores</th>
<th align="center" rowspan="1" colspan="1">Memory (GB)</th>
<th align="center" rowspan="1" colspan="1">Unsupported threading tools</th>
<th align="center" rowspan="1" colspan="1">Relative cost</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">t1.micro</td>
<td align="center" rowspan="1" colspan="1">E</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">0.613</td>
<td align="center" rowspan="1" colspan="1">HHpred, COMPASS, SAM-T2K, pGenTHREADER, SPARKS, SP3</td>
<td align="center" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.small</td>
<td align="center" rowspan="1" colspan="1">G</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">1.7</td>
<td align="center" rowspan="1" colspan="1">COMPASS, SAM-T2K, pGenTHREADER</td>
<td align="center" rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.medium</td>
<td align="center" rowspan="1" colspan="1">G</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">3.7</td>
<td align="center" rowspan="1" colspan="1">SAM-T2K</td>
<td align="center" rowspan="1" colspan="1">6</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.large</td>
<td align="center" rowspan="1" colspan="1">M</td>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">7.5</td>
<td align="center" rowspan="1" colspan="1">None</td>
<td align="center" rowspan="1" colspan="1">12</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">c1.medium</td>
<td align="center" rowspan="1" colspan="1">C</td>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">1.7</td>
<td align="center" rowspan="1" colspan="1">COMPASS, SAM-T2K, pGenTHREADER</td>
<td align="center" rowspan="1" colspan="1">7.25</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">c1.xlarge</td>
<td align="center" rowspan="1" colspan="1">C</td>
<td align="center" rowspan="1" colspan="1">8</td>
<td align="center" rowspan="1" colspan="1">7</td>
<td align="center" rowspan="1" colspan="1">None</td>
<td align="center" rowspan="1" colspan="1">29</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">hi1.4xlarge</td>
<td align="center" rowspan="1" colspan="1">S</td>
<td align="center" rowspan="1" colspan="1">16</td>
<td align="center" rowspan="1" colspan="1">60.5</td>
<td align="center" rowspan="1" colspan="1">None</td>
<td align="center" rowspan="1" colspan="1">155</td>
</tr>
</tbody>
</table>
</table-wrap>
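The relative costs and unsupported-tool lists in Table 2 are enough to pick, for each threading tool, the instance type with the lowest hourly rate that can run it. The following is a minimal Python sketch of that lookup, assuming on-demand per-hour pricing of $0.02 times the relative cost. The names `INSTANCES`, `UNIT_PRICE`, and `cheapest_instance` are hypothetical; this is an illustration, not the pipeline's scheduler described in Algorithms 3 and 4.

```python
# Rows transcribed from Table 2: instance -> (unsupported threading tools,
# relative cost in units of the t1.micro price, $0.02 per hour).
INSTANCES = {
    "t1.micro": ({"HHpred", "COMPASS", "SAM-T2K", "pGenTHREADER", "SPARKS", "SP3"}, 1),
    "m1.small": ({"COMPASS", "SAM-T2K", "pGenTHREADER"}, 3),
    "m1.medium": ({"SAM-T2K"}, 6),
    "m1.large": (set(), 12),
    "c1.medium": ({"COMPASS", "SAM-T2K", "pGenTHREADER"}, 7.25),
    "c1.xlarge": (set(), 29),
    "hi1.4xlarge": (set(), 155),
}
UNIT_PRICE = 0.02  # USD per hour (t1.micro)


def cheapest_instance(tool):
    """Return (instance, hourly price in USD) for the instance type with the
    lowest hourly rate that is not flagged as unable to run `tool`."""
    supported = [name for name, (unsupported, _) in INSTANCES.items()
                 if tool not in unsupported]
    best = min(supported, key=lambda name: INSTANCES[name][1])
    return best, INSTANCES[best][1] * UNIT_PRICE


for tool in ["HMMER", "SP3", "COMPASS", "SAM-T2K"]:
    name, price = cheapest_instance(tool)
    print(f"{tool:8s} on {name:10s} ${price:.2f}/h")
```

The lowest hourly rate does not imply the lowest cost-to-solution: for SP3, Table 6 reports $1.31 on m1.small but $0.86 on the faster c1.medium, so a cost-aware scheduler would also need runtime estimates.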
<table-wrap id="tab3" orientation="portrait" position="float">
<label>Table 3</label>
<caption>
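<p>Benchmark data sets: number of sequences in each length range (aa, amino acids) for the 110-sequence and 20-sequence sets.</p>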
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Length range (aa)</th>
<th align="center" rowspan="1" colspan="1">110 sequences</th>
<th align="center" rowspan="1" colspan="1">20 sequences</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">51–100</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">101–150</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">151–200</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">201–250</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">251–300</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">301–350</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">351–400</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">401–450</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">451–500</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">501–550</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">551–600</td>
<td align="center" rowspan="1" colspan="1">10</td>
<td align="center" rowspan="1" colspan="1">2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab4" orientation="portrait" position="float">
<label>Table 4</label>
<caption>
<p>Breakdown of the time-to-solution of the main processing step into subtasks. The four subtasks, corresponding to the chain and domain libraries and their postprocessing, are measured along with VM launch time. Results are for pfTools; units are minutes.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">VM launch</th>
<th align="center" rowspan="1" colspan="1">Chain</th>
<th align="center" rowspan="1" colspan="1">Domain</th>
<th align="center" rowspan="1" colspan="1">Chain postprocessing</th>
<th align="center" rowspan="1" colspan="1">Domain postprocessing</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" colspan="5" rowspan="1">t1.micro</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.9</td>
<td align="center" rowspan="1" colspan="1">316.5</td>
<td align="center" rowspan="1" colspan="1">331.6</td>
<td align="center" rowspan="1" colspan="1">33.4</td>
<td align="center" rowspan="1" colspan="1">21.5</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">m1.small</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.3</td>
<td align="center" rowspan="1" colspan="1">137.1</td>
<td align="center" rowspan="1" colspan="1">90.1</td>
<td align="center" rowspan="1" colspan="1">9.7</td>
<td align="center" rowspan="1" colspan="1">7.9</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">m1.medium</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.3</td>
<td align="center" rowspan="1" colspan="1">62.2</td>
<td align="center" rowspan="1" colspan="1">42.9</td>
<td align="center" rowspan="1" colspan="1">6.1</td>
<td align="center" rowspan="1" colspan="1">4.4</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">m1.large</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.2</td>
<td align="center" rowspan="1" colspan="1">31.1</td>
<td align="center" rowspan="1" colspan="1">21.5</td>
<td align="center" rowspan="1" colspan="1">3.4</td>
<td align="center" rowspan="1" colspan="1">2.7</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">c1.medium</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.3</td>
<td align="center" rowspan="1" colspan="1">32.8</td>
<td align="center" rowspan="1" colspan="1">22.5</td>
<td align="center" rowspan="1" colspan="1">3.9</td>
<td align="center" rowspan="1" colspan="1">3.1</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">c1.xlarge</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.3</td>
<td align="center" rowspan="1" colspan="1">9.5</td>
<td align="center" rowspan="1" colspan="1">6.5</td>
<td align="center" rowspan="1" colspan="1">1.1</td>
<td align="center" rowspan="1" colspan="1">1.2</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="5" rowspan="1">hi1.4xlarge</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.5</td>
<td align="center" rowspan="1" colspan="1">7.7</td>
<td align="center" rowspan="1" colspan="1">5.3</td>
<td align="center" rowspan="1" colspan="1">1.3</td>
<td align="center" rowspan="1" colspan="1">1.2</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab5" orientation="portrait" position="float">
<label>Table 5</label>
<caption>
<p>Instance launch time. Values are means over 6 repeated experiments, with standard deviations in parentheses.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Instance</th>
<th align="center" rowspan="1" colspan="1">Launching time (min)
<break></break>
(standard deviation)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">t1.micro</td>
<td align="center" rowspan="1" colspan="1">1.99 (0.2)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.small</td>
<td align="center" rowspan="1" colspan="1">1.86 (0.08)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.medium</td>
<td align="center" rowspan="1" colspan="1">1.80 (0.15)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.large</td>
<td align="center" rowspan="1" colspan="1">1.70 (0.08)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">c1.medium</td>
<td align="center" rowspan="1" colspan="1">1.68 (0.17)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">c1.xlarge</td>
<td align="center" rowspan="1" colspan="1">1.69 (0.08)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">hi1.4xlarge</td>
<td align="center" rowspan="1" colspan="1">2.01 (0.16)</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab6" orientation="portrait" position="float">
<label>Table 6</label>
<caption>
<p>Summary of benchmark results for time-to-solution (TTS) and cost-to-solution (CTS) on the 20-sequence data set. Benchmarks were run for all threading tools; for brevity, results for three of them are shown here. TTS is in minutes and CTS in US dollars, based on pricing at the time of writing. N/A: SP3 is not supported on t1.micro (see Table 2).</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="2" colspan="1">VM type</th>
<th align="center" colspan="2" rowspan="1">HMMER</th>
<th align="center" colspan="2" rowspan="1">SP3</th>
<th align="center" colspan="2" rowspan="1">THREADER</th>
</tr>
<tr>
<th align="center" rowspan="1" colspan="1">TTS (min)</th>
<th align="center" rowspan="1" colspan="1">CTS ($)</th>
<th align="center" rowspan="1" colspan="1">TTS (min)</th>
<th align="center" rowspan="1" colspan="1">CTS ($)</th>
<th align="center" rowspan="1" colspan="1">TTS (min)</th>
<th align="center" rowspan="1" colspan="1">CTS ($)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">t1.micro</td>
<td align="center" rowspan="1" colspan="1">33.1</td>
<td align="center" rowspan="1" colspan="1">0.01</td>
<td align="center" rowspan="1" colspan="1">N/A</td>
<td align="center" rowspan="1" colspan="1">N/A</td>
<td align="center" rowspan="1" colspan="1">96905.8</td>
<td align="center" rowspan="1" colspan="1">32.30</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.small</td>
<td align="center" rowspan="1" colspan="1">29.0</td>
<td align="center" rowspan="1" colspan="1">0.03</td>
<td align="center" rowspan="1" colspan="1">1312.3</td>
<td align="center" rowspan="1" colspan="1">1.31</td>
<td align="center" rowspan="1" colspan="1">27842.2</td>
<td align="center" rowspan="1" colspan="1">27.84</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.medium</td>
<td align="center" rowspan="1" colspan="1">19.6</td>
<td align="center" rowspan="1" colspan="1">0.04</td>
<td align="center" rowspan="1" colspan="1">670.7</td>
<td align="center" rowspan="1" colspan="1">1.34</td>
<td align="center" rowspan="1" colspan="1">11551.2</td>
<td align="center" rowspan="1" colspan="1">23.10</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">m1.large</td>
<td align="center" rowspan="1" colspan="1">9.8</td>
<td align="center" rowspan="1" colspan="1">0.04</td>
<td align="center" rowspan="1" colspan="1">458.0</td>
<td align="center" rowspan="1" colspan="1">1.83</td>
<td align="center" rowspan="1" colspan="1">2897.2</td>
<td align="center" rowspan="1" colspan="1">11.59</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">c1.medium</td>
<td align="center" rowspan="1" colspan="1">10.6</td>
<td align="center" rowspan="1" colspan="1">0.03</td>
<td align="center" rowspan="1" colspan="1">356.7</td>
<td align="center" rowspan="1" colspan="1">0.86</td>
<td align="center" rowspan="1" colspan="1">6833.8</td>
<td align="center" rowspan="1" colspan="1">16.52</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">c1.xlarge</td>
<td align="center" rowspan="1" colspan="1">6.1</td>
<td align="center" rowspan="1" colspan="1">0.06</td>
<td align="center" rowspan="1" colspan="1">118.6</td>
<td align="center" rowspan="1" colspan="1">1.15</td>
<td align="center" rowspan="1" colspan="1">2019.3</td>
<td align="center" rowspan="1" colspan="1">19.52</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">hi1.4xlarge</td>
<td align="center" rowspan="1" colspan="1">5.8</td>
<td align="center" rowspan="1" colspan="1">0.30</td>
<td align="center" rowspan="1" colspan="1">105.7</td>
<td align="center" rowspan="1" colspan="1">5.46</td>
<td align="center" rowspan="1" colspan="1">1552.2</td>
<td align="center" rowspan="1" colspan="1">80.20</td>
</tr>
</tbody>
</table>
</table-wrap>
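The CTS values in Table 6 are consistent with billing each TTS proportionally at the instance's hourly rate from Table 2, that is, CTS = (TTS / 60) × relative cost × $0.02, without rounding up to whole hours. The short Python check below reproduces several entries under that assumption; `RELATIVE_COST`, `UNIT_PRICE`, and `cost_to_solution` are hypothetical names introduced here.

```python
# Relative costs from Table 2, in units of the t1.micro price ($0.02 per hour).
RELATIVE_COST = {
    "t1.micro": 1, "m1.small": 3, "m1.medium": 6, "m1.large": 12,
    "c1.medium": 7.25, "c1.xlarge": 29, "hi1.4xlarge": 155,
}
UNIT_PRICE = 0.02  # USD per hour


def cost_to_solution(instance, tts_minutes):
    """USD cost of occupying one instance for `tts_minutes`, billed
    proportionally (assumption: no rounding up to whole hours)."""
    hourly_rate = RELATIVE_COST[instance] * UNIT_PRICE
    return tts_minutes / 60 * hourly_rate


# (instance, TTS in minutes, CTS reported in Table 6) for THREADER and SP3.
CHECKS = [
    ("t1.micro", 96905.8, 32.30),
    ("m1.small", 27842.2, 27.84),
    ("c1.medium", 6833.8, 16.52),
    ("hi1.4xlarge", 1552.2, 80.20),
    ("c1.xlarge", 118.6, 1.15),
]
for instance, tts, reported in CHECKS:
    print(f"{instance:12s} computed {cost_to_solution(instance, tts):6.2f}  reported {reported:6.2f}")
```

Under this assumption each computed value agrees with the reported CTS to two decimal places. It also shows why hi1.4xlarge, although the fastest, is the costliest option for THREADER.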
<table-wrap id="tab7" orientation="portrait" position="float">
<label>Table 7</label>
<caption>
<p>Comparison of pipeline time-to-solution with ideal limits. Ideal limits are the 20-sequence benchmark results divided by the number of cores in the instance. Units are minutes.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="2" colspan="1">Tools</th>
<th align="center" colspan="2" rowspan="1">m1.small</th>
<th align="center" colspan="2" rowspan="1">c1.xlarge</th>
<th align="center" colspan="2" rowspan="1">hi1.4xlarge</th>
</tr>
<tr>
<th align="center" rowspan="1" colspan="1">Pipeline</th>
<th align="center" rowspan="1" colspan="1">Ideal limit</th>
<th align="center" rowspan="1" colspan="1">Pipeline</th>
<th align="center" rowspan="1" colspan="1">Ideal limit</th>
<th align="center" rowspan="1" colspan="1">Pipeline</th>
<th align="center" rowspan="1" colspan="1">Ideal limit</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">SAM-T2K</td>
<td align="center" rowspan="1" colspan="1">1271.0</td>
<td align="center" rowspan="1" colspan="1">1055.7</td>
<td align="center" rowspan="1" colspan="1">224.5</td>
<td align="center" rowspan="1" colspan="1">65.6</td>
<td align="center" rowspan="1" colspan="1">168.3</td>
<td align="center" rowspan="1" colspan="1">35.7</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SP3</td>
<td align="center" rowspan="1" colspan="1">1312.2</td>
<td align="center" rowspan="1" colspan="1">1124.4</td>
<td align="center" rowspan="1" colspan="1">118.6</td>
<td align="center" rowspan="1" colspan="1">68.1</td>
<td align="center" rowspan="1" colspan="1">105.7</td>
<td align="center" rowspan="1" colspan="1">33.0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CSBLAST</td>
<td align="center" rowspan="1" colspan="1">25.2</td>
<td align="center" rowspan="1" colspan="1">15.4</td>
<td align="center" rowspan="1" colspan="1">6.0</td>
<td align="center" rowspan="1" colspan="1">1.23</td>
<td align="center" rowspan="1" colspan="1">4.4</td>
<td align="center" rowspan="1" colspan="1">0.47</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">HMMER</td>
<td align="center" rowspan="1" colspan="1">29.0</td>
<td align="center" rowspan="1" colspan="1">16.0</td>
<td align="center" rowspan="1" colspan="1">6.1</td>
<td align="center" rowspan="1" colspan="1">1.0</td>
<td align="center" rowspan="1" colspan="1">5.8</td>
<td align="center" rowspan="1" colspan="1">0.6</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">pfTools</td>
<td align="center" rowspan="1" colspan="1">244.8</td>
<td align="center" rowspan="1" colspan="1">226.0</td>
<td align="center" rowspan="1" colspan="1">18.3</td>
<td align="center" rowspan="1" colspan="1">12.8</td>
<td align="center" rowspan="1" colspan="1">15.5</td>
<td align="center" rowspan="1" colspan="1">9.2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">THREADER</td>
<td align="center" rowspan="1" colspan="1">27842.2</td>
<td align="center" rowspan="1" colspan="1">23744.0</td>
<td align="center" rowspan="1" colspan="1">2019.3</td>
<td align="center" rowspan="1" colspan="1">1488.0</td>
<td align="center" rowspan="1" colspan="1">1552.2</td>
<td align="center" rowspan="1" colspan="1">1090.4</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SPARKS</td>
<td align="center" rowspan="1" colspan="1">1021.8</td>
<td align="center" rowspan="1" colspan="1">1037.7</td>
<td align="center" rowspan="1" colspan="1">80.0</td>
<td align="center" rowspan="1" colspan="1">54.3</td>
<td align="center" rowspan="1" colspan="1">73.3</td>
<td align="center" rowspan="1" colspan="1">41.8</td>
</tr>
</tbody>
</table>
</table-wrap>
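Because the ideal limit assumes perfect division of the benchmark work across cores, the ratio ideal/pipeline can be read as a rough parallel efficiency of the pipeline. The following is a minimal sketch using the THREADER row of Table 7; the `efficiency` function and the dictionary names are hypothetical, and this ratio is our illustration rather than a metric defined in the paper.

```python
# Cores per instance type (Table 2) and THREADER times from Table 7, in minutes.
CORES = {"m1.small": 1, "c1.xlarge": 8, "hi1.4xlarge": 16}
THREADER = {  # instance: (pipeline time-to-solution, ideal limit)
    "m1.small": (27842.2, 23744.0),
    "c1.xlarge": (2019.3, 1488.0),
    "hi1.4xlarge": (1552.2, 1090.4),
}


def efficiency(pipeline_minutes, ideal_minutes):
    """Fraction of the ideal (perfectly parallel) limit achieved by the pipeline."""
    return ideal_minutes / pipeline_minutes


for instance, (pipeline, ideal) in THREADER.items():
    print(f"{instance:12s} cores={CORES[instance]:2d} "
          f"efficiency={efficiency(pipeline, ideal):.2f}")
```

By this measure the pipeline reaches roughly 85% of the ideal limit on the single-core m1.small and about 70 to 74% on the multicore instances. For SPARKS on m1.small the ratio slightly exceeds 1 (1037.7/1021.8), so the ideal limit is presumably best read as an estimate rather than a strict bound.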
</floats-group>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000505 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000505 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4066679
   |texte=   Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:24995285" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024