H2N2 Exploration Server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Comparative phyloinformatics of virus genes at micro and macro levels in a distributed computing environment

Internal identifier: 000110 (Pmc/Corpus); previous: 000109; next: 000111

Authors: Dadabhai T. Singh; Rahul Trehan; Bertil Schmidt; Timo Bretschneider

RBID : PMC:2259424

Abstract

Background

Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any newly emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, using the full functionality of available analysis packages for large-scale phyloinformatics studies requires a team of computer scientists, biostatisticians and virologists, a requirement that cannot be met in many cases. Furthermore, the time complexities of many of the algorithms involved lead to prohibitive runtimes on sequential computer platforms. These obstacles have so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area.

Results

This paper presents the graphical workflow design system Quascade and its efficient use for comparative phyloinformatics. In particular, we focus on how this task can be performed effectively in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, the paper demonstrates the usefulness of a graphical user interface system for designing and executing complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase at different levels of complexity provides valuable insights into this virus's tendency toward geography-based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution.

Conclusion

The current study demonstrates the efficiency and utility of workflow systems that provide a biologist-friendly approach to the analysis of complex biological datasets using high-performance computing. In particular, it shows the utility of the Quascade platform for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms. In addition, the analysis of the H5N1 neuraminidase datasets at macro and micro levels clearly indicates that the H5N1 viral isolates cluster spatially, according to geographical distribution, rather than by time or host range.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-9-S1-S23) contains supplementary material, which is available to authorized users.


DOI: 10.1186/1471-2105-9-S1-S23
PubMed: 18315855
PubMed Central: 2259424

Links to Exploration step

PMC:2259424

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Comparative phyloinformatics of virus genes at micro and macro levels in a distributed computing environment</title>
<author>
<name sortKey="Singh, Dadabhai T" sort="Singh, Dadabhai T" uniqKey="Singh D" first="Dadabhai T" last="Singh">Dadabhai T. Singh</name>
<affiliation>
<nlm:aff id="Aff1">Genvea Biosciences, 53 Craig Road, #04-01, 089691 Singapore</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Trehan, Rahul" sort="Trehan, Rahul" uniqKey="Trehan R" first="Rahul" last="Trehan">Rahul Trehan</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.59025.3b</institution-id>
<institution-id institution-id-type="ISNI">0000000122240361</institution-id>
<institution>Nanyang Technological University,</institution>
</institution-wrap>
Nanyang Avenue N4-02a-32, 639798 Singapore</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schmidt, Bertil" sort="Schmidt, Bertil" uniqKey="Schmidt B" first="Bertil" last="Schmidt">Bertil Schmidt</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.1008.9</institution-id>
<institution-id institution-id-type="ISNI">000000012179088X</institution-id>
<institution>NICTA VRL,</institution>
<institution>University of Melbourne,</institution>
</institution-wrap>
Parkville, 3010 Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bretschneider, Timo" sort="Bretschneider, Timo" uniqKey="Bretschneider T" first="Timo" last="Bretschneider">Timo Bretschneider</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.59025.3b</institution-id>
<institution-id institution-id-type="ISNI">0000000122240361</institution-id>
<institution>Nanyang Technological University,</institution>
</institution-wrap>
Nanyang Avenue N4-02a-32, 639798 Singapore</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">18315855</idno>
<idno type="pmc">2259424</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2259424</idno>
<idno type="RBID">PMC:2259424</idno>
<idno type="doi">10.1186/1471-2105-9-S1-S23</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000110</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000110</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Comparative phyloinformatics of virus genes at micro and macro levels in a distributed computing environment</title>
<author>
<name sortKey="Singh, Dadabhai T" sort="Singh, Dadabhai T" uniqKey="Singh D" first="Dadabhai T" last="Singh">Dadabhai T. Singh</name>
<affiliation>
<nlm:aff id="Aff1">Genvea Biosciences, 53 Craig Road, #04-01, 089691 Singapore</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Trehan, Rahul" sort="Trehan, Rahul" uniqKey="Trehan R" first="Rahul" last="Trehan">Rahul Trehan</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.59025.3b</institution-id>
<institution-id institution-id-type="ISNI">0000000122240361</institution-id>
<institution>Nanyang Technological University,</institution>
</institution-wrap>
Nanyang Avenue N4-02a-32, 639798 Singapore</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schmidt, Bertil" sort="Schmidt, Bertil" uniqKey="Schmidt B" first="Bertil" last="Schmidt">Bertil Schmidt</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.1008.9</institution-id>
<institution-id institution-id-type="ISNI">000000012179088X</institution-id>
<institution>NICTA VRL,</institution>
<institution>University of Melbourne,</institution>
</institution-wrap>
Parkville, 3010 Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bretschneider, Timo" sort="Bretschneider, Timo" uniqKey="Bretschneider T" first="Timo" last="Bretschneider">Timo Bretschneider</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.59025.3b</institution-id>
<institution-id institution-id-type="ISNI">0000000122240361</institution-id>
<institution>Nanyang Technological University,</institution>
</institution-wrap>
Nanyang Avenue N4-02a-32, 639798 Singapore</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any newly emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, using the full functionality of available analysis packages for large-scale phyloinformatics studies requires a team of computer scientists, biostatisticians and virologists, a requirement that cannot be met in many cases. Furthermore, the time complexities of many of the algorithms involved lead to prohibitive runtimes on sequential computer platforms. These obstacles have so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area.</p>
</sec>
<sec>
<title>Results</title>
<p>This paper presents the graphical workflow design system
<italic>Quascade</italic>
and its efficient use for comparative phyloinformatics. In particular, we focus on how this task can be performed effectively in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, the paper demonstrates the usefulness of a graphical user interface system for designing and executing complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase at different levels of complexity provides valuable insights into this virus's tendency toward geography-based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The current study demonstrates the efficiency and utility of workflow systems that provide a biologist-friendly approach to the analysis of complex biological datasets using high-performance computing. In particular, it shows the utility of the Quascade platform for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms. In addition, the analysis of the H5N1 neuraminidase datasets at macro and micro levels clearly indicates that the H5N1 viral isolates cluster spatially, according to geographical distribution, rather than by time or host range.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/1471-2105-9-S1-S23) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Zambon, Mc" uniqKey="Zambon M">MC Zambon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Subbarao, K" uniqKey="Subbarao K">K Subbarao</name>
</author>
<author>
<name sortKey="Shaw, Mw" uniqKey="Shaw M">MW Shaw</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hatta, M" uniqKey="Hatta M">M Hatta</name>
</author>
<author>
<name sortKey="Kawaoka, Y" uniqKey="Kawaoka Y">Y Kawaoka</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saitou, N" uniqKey="Saitou N">N Saitou</name>
</author>
<author>
<name sortKey="Nei, M" uniqKey="Nei M">M Nei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holder, M" uniqKey="Holder M">M Holder</name>
</author>
<author>
<name sortKey="Lewis, Po" uniqKey="Lewis P">PO Lewis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F Ronquist</name>
</author>
<author>
<name sortKey="Huelsenbeck, Jp" uniqKey="Huelsenbeck J">JP Huelsenbeck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Puthavathana, P" uniqKey="Puthavathana P">P Puthavathana</name>
</author>
<author>
<name sortKey="Auewarakul, P" uniqKey="Auewarakul P">P Auewarakul</name>
</author>
<author>
<name sortKey="Charoenying, Pc" uniqKey="Charoenying P">PC Charoenying</name>
</author>
<author>
<name sortKey="Sangsiriwut, K" uniqKey="Sangsiriwut K">K Sangsiriwut</name>
</author>
<author>
<name sortKey="Pooruk, P" uniqKey="Pooruk P">P Pooruk</name>
</author>
<author>
<name sortKey="Boonnak, K" uniqKey="Boonnak K">K Boonnak</name>
</author>
<author>
<name sortKey="Khanyok, R" uniqKey="Khanyok R">R Khanyok</name>
</author>
<author>
<name sortKey="Thawachsupa, P" uniqKey="Thawachsupa P">P Thawachsupa</name>
</author>
<author>
<name sortKey="Kijphati, R" uniqKey="Kijphati R">R Kijphati</name>
</author>
<author>
<name sortKey="Sawanpanyalert, P" uniqKey="Sawanpanyalert P">P Sawanpanyalert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tran, Th" uniqKey="Tran T">TH Tran</name>
</author>
<author>
<name sortKey="Nguyen, Tl" uniqKey="Nguyen T">TL Nguyen</name>
</author>
<author>
<name sortKey="Nguyen, Td" uniqKey="Nguyen T">TD Nguyen</name>
</author>
<author>
<name sortKey="Luong, Ts" uniqKey="Luong T">TS Luong</name>
</author>
<author>
<name sortKey="Pham, Pm" uniqKey="Pham P">PM Pham</name>
</author>
<author>
<name sortKey="Nguyen, Vc" uniqKey="Nguyen V">VC Nguyen</name>
</author>
<author>
<name sortKey="Pham, Ts" uniqKey="Pham T">TS Pham</name>
</author>
<author>
<name sortKey="Vo, Cd" uniqKey="Vo C">CD Vo</name>
</author>
<author>
<name sortKey="Le, Tqm" uniqKey="Le T">TQM Le</name>
</author>
<author>
<name sortKey="Ngo, Tt" uniqKey="Ngo T">TT Ngo</name>
</author>
<author>
<name sortKey="Dao, Bk" uniqKey="Dao B">BK Dao</name>
</author>
<author>
<name sortKey="Le, Pp" uniqKey="Le P">PP Le</name>
</author>
<author>
<name sortKey="Nguyen, Tt" uniqKey="Nguyen T">TT Nguyen</name>
</author>
<author>
<name sortKey="Hoang, Tl" uniqKey="Hoang T">TL Hoang</name>
</author>
<author>
<name sortKey="Cao, Vt" uniqKey="Cao V">VT Cao</name>
</author>
<author>
<name sortKey="Le, Tg" uniqKey="Le T">TG Le</name>
</author>
<author>
<name sortKey="Nguyen, Dt" uniqKey="Nguyen D">DT Nguyen</name>
</author>
<author>
<name sortKey="Le, Hn" uniqKey="Le H">HN Le</name>
</author>
<author>
<name sortKey="Nguyen, Tkt" uniqKey="Nguyen T">TKT Nguyen</name>
</author>
<author>
<name sortKey="Le, Hs" uniqKey="Le H">HS Le</name>
</author>
<author>
<name sortKey="Le, Vt" uniqKey="Le V">VT Le</name>
</author>
<author>
<name sortKey="Dolecek, C" uniqKey="Dolecek C">C Dolecek</name>
</author>
<author>
<name sortKey="Tran, Tt" uniqKey="Tran T">TT Tran</name>
</author>
<author>
<name sortKey="De Jong, M" uniqKey="De Jong M">M de Jong</name>
</author>
<author>
<name sortKey="Schultsz, C" uniqKey="Schultsz C">C Schultsz</name>
</author>
<author>
<name sortKey="Cheng, P" uniqKey="Cheng P">P Cheng</name>
</author>
<author>
<name sortKey="Lim, W" uniqKey="Lim W">W Lim</name>
</author>
<author>
<name sortKey="Horby, P" uniqKey="Horby P">P Horby</name>
</author>
<author>
<name sortKey="Farrar, J" uniqKey="Farrar J">J Farrar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thompson, Jd" uniqKey="Thompson J">JD Thompson</name>
</author>
<author>
<name sortKey="Higgins, Dg" uniqKey="Higgins D">DG Higgins</name>
</author>
<author>
<name sortKey="Gibson, Tj" uniqKey="Gibson T">TJ Gibson</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoon, S" uniqKey="Hoon S">S Hoon</name>
</author>
<author>
<name sortKey="Ratnapu, Kk" uniqKey="Ratnapu K">KK Ratnapu</name>
</author>
<author>
<name sortKey="Chia, J M" uniqKey="Chia J">J-M Chia</name>
</author>
<author>
<name sortKey="Kumarasamy, B" uniqKey="Kumarasamy B">B Kumarasamy</name>
</author>
<author>
<name sortKey="Xiao, J" uniqKey="Xiao J">J Xiao</name>
</author>
<author>
<name sortKey="Clamp, M" uniqKey="Clamp M">M Clamp</name>
</author>
<author>
<name sortKey="Stabenau, A" uniqKey="Stabenau A">A Stabenau</name>
</author>
<author>
<name sortKey="Potter, A" uniqKey="Potter A">A Potter</name>
</author>
<author>
<name sortKey="Clarke, L" uniqKey="Clarke L">L Clarke</name>
</author>
<author>
<name sortKey="Stupka, E" uniqKey="Stupka E">E Stupka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leo, P" uniqKey="Leo P">P Leo</name>
</author>
<author>
<name sortKey="Marinelli, C" uniqKey="Marinelli C">C Marinelli</name>
</author>
<author>
<name sortKey="Pappada, G" uniqKey="Pappada G">G Pappadà</name>
</author>
<author>
<name sortKey="Scioscia, G" uniqKey="Scioscia G">G Scioscia</name>
</author>
<author>
<name sortKey="Zanchetta, L" uniqKey="Zanchetta L">L Zanchetta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oinn, T" uniqKey="Oinn T">T Oinn</name>
</author>
<author>
<name sortKey="Addis, M" uniqKey="Addis M">M Addis</name>
</author>
<author>
<name sortKey="Ferris, J" uniqKey="Ferris J">J Ferris</name>
</author>
<author>
<name sortKey="Marvin, D" uniqKey="Marvin D">D Marvin</name>
</author>
<author>
<name sortKey="Senger, M" uniqKey="Senger M">M Senger</name>
</author>
<author>
<name sortKey="Greenwood, M" uniqKey="Greenwood M">M Greenwood</name>
</author>
<author>
<name sortKey="Carver, T" uniqKey="Carver T">T Carver</name>
</author>
<author>
<name sortKey="Glover, K" uniqKey="Glover K">K Glover</name>
</author>
<author>
<name sortKey="Pocock, Mr" uniqKey="Pocock M">MR Pocock</name>
</author>
<author>
<name sortKey="Wipat, A" uniqKey="Wipat A">A Wipat</name>
</author>
<author>
<name sortKey="Li, P" uniqKey="Li P">P Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, F" uniqKey="Tang F">F Tang</name>
</author>
<author>
<name sortKey="Chua, C" uniqKey="Chua C">C Chua</name>
</author>
<author>
<name sortKey="Ho, L" uniqKey="Ho L">L Ho</name>
</author>
<author>
<name sortKey="Lim, Y" uniqKey="Lim Y">Y Lim</name>
</author>
<author>
<name sortKey="Issac, P" uniqKey="Issac P">P Issac</name>
</author>
<author>
<name sortKey="Krishnan, A" uniqKey="Krishnan A">A Krishnan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lua, Q" uniqKey="Lua Q">Q Lua</name>
</author>
<author>
<name sortKey="Haob, P" uniqKey="Haob P">P Haob</name>
</author>
<author>
<name sortKey="Curcinc, V" uniqKey="Curcinc V">V Curcinc</name>
</author>
<author>
<name sortKey="Heb, W" uniqKey="Heb W">W Heb</name>
</author>
<author>
<name sortKey="Lib, Y Y" uniqKey="Lib Y">Y-Y Lib</name>
</author>
<author>
<name sortKey="Luoa, Q M" uniqKey="Luoa Q">Q-M Luoa</name>
</author>
<author>
<name sortKey="Guoc, Y K" uniqKey="Guoc Y">Y-K Guoc</name>
</author>
<author>
<name sortKey="Lib, Y X" uniqKey="Lib Y">Y-X Lib</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
<author>
<name sortKey="Lin, F" uniqKey="Lin F">F Lin</name>
</author>
<author>
<name sortKey="Laud, A" uniqKey="Laud A">A Laud</name>
</author>
<author>
<name sortKey="Santoso, Y" uniqKey="Santoso Y">Y Santoso</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Romano, P" uniqKey="Romano P">P Romano</name>
</author>
<author>
<name sortKey="Bartocci, E" uniqKey="Bartocci E">E Bartocci</name>
</author>
<author>
<name sortKey="Bertolini, G" uniqKey="Bertolini G">G Bertolini</name>
</author>
<author>
<name sortKey="De Paoli, F" uniqKey="De Paoli F">F De Paoli</name>
</author>
<author>
<name sortKey="Marra, D" uniqKey="Marra D">D Marra</name>
</author>
<author>
<name sortKey="Mauri, G" uniqKey="Mauri G">G Mauri</name>
</author>
<author>
<name sortKey="Merelli, E" uniqKey="Merelli E">E Merelli</name>
</author>
<author>
<name sortKey="Milanesi, L" uniqKey="Milanesi L">L Milanesi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bartocci, E" uniqKey="Bartocci E">E Bartocci</name>
</author>
<author>
<name sortKey="Corradini, F" uniqKey="Corradini F">F Corradini</name>
</author>
<author>
<name sortKey="Merelli, E" uniqKey="Merelli E">E Merelli</name>
</author>
<author>
<name sortKey="Scortichini, L" uniqKey="Scortichini L">L Scortichini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Singh, Dt" uniqKey="Singh D">DT Singh</name>
</author>
<author>
<name sortKey="Trehan, R" uniqKey="Trehan R">R Trehan</name>
</author>
<author>
<name sortKey="Ray, P" uniqKey="Ray P">P Ray</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="case-report">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">18315855</article-id>
<article-id pub-id-type="pmc">2259424</article-id>
<article-id pub-id-type="publisher-id">2567</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-9-S1-S23</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Proceedings</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Comparative phyloinformatics of virus genes at micro and macro levels in a distributed computing environment</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Singh</surname>
<given-names>Dadabhai T</given-names>
</name>
<address>
<email>dtsingh@genvea.com</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Trehan</surname>
<given-names>Rahul</given-names>
</name>
<address>
<email>rahul@pmail.ntu.edu.sg</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Schmidt</surname>
<given-names>Bertil</given-names>
</name>
<address>
<email>bertil.schmidt@computer.org</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bretschneider</surname>
<given-names>Timo</given-names>
</name>
<address>
<email>astimo@ntu.edu.sg</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
Genvea Biosciences, 53 Craig Road, #04-01, 089691 Singapore</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="GRID">grid.59025.3b</institution-id>
<institution-id institution-id-type="ISNI">0000000122240361</institution-id>
<institution>Nanyang Technological University,</institution>
</institution-wrap>
Nanyang Avenue N4-02a-32, 639798 Singapore</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="GRID">grid.1008.9</institution-id>
<institution-id institution-id-type="ISNI">000000012179088X</institution-id>
<institution>NICTA VRL,</institution>
<institution>University of Melbourne,</institution>
</institution-wrap>
Parkville, 3010 Australia</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>13</day>
<month>2</month>
<year>2008</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>13</day>
<month>2</month>
<year>2008</year>
</pub-date>
<pub-date pub-type="collection">
<year>2008</year>
</pub-date>
<volume>9</volume>
<issue>Suppl 1</issue>
<elocation-id>S23</elocation-id>
<permissions>
<copyright-statement>© Singh et al; licensee BioMed Central Ltd. 2008</copyright-statement>
<license>
<license-p>This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any newly emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, using the full functionality of available analysis packages for large-scale phyloinformatics studies requires a team of computer scientists, biostatisticians and virologists, a requirement that cannot be met in many cases. Furthermore, the time complexities of many of the algorithms involved lead to prohibitive runtimes on sequential computer platforms. These obstacles have so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area.</p>
</sec>
<sec>
<title>Results</title>
<p>This paper presents the graphical workflow design system
<italic>Quascade</italic>
and its efficient use for comparative phyloinformatics. In particular, we focus on how this task can be performed effectively in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, the paper demonstrates the usefulness of a graphical user interface system for designing and executing complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase at different levels of complexity provides valuable insights into this virus's tendency toward geography-based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The current study demonstrates the efficiency and utility of workflow systems that provide a biologist-friendly approach to the analysis of complex biological datasets using high-performance computing. In particular, it shows the utility of the Quascade platform for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms. In addition, the analysis of the H5N1 neuraminidase datasets at macro and micro levels clearly indicates that the H5N1 viral isolates cluster spatially, according to geographical distribution, rather than by time or host range.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/1471-2105-9-S1-S23) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Influenza</kwd>
<kwd>Influenza Virus</kwd>
<kwd>Sialic Acid</kwd>
<kwd>Severe Acute Respiratory Syndrome</kwd>
</kwd-group>
<conference xlink:href="http://incob.apbionet.org/">
<conf-name>Sixth International Conference on Bioinformatics (InCoB2007)</conf-name>
<conf-loc>Hong Kong</conf-loc>
<conf-date>27–30 August 2007</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2008</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Recent occurrences of pandemics like the
<italic>Severe Acute Respiratory Syndrome</italic>
(SARS) or
<italic>Avian Influenza</italic>
clearly underscore the threat and seriousness of global diseases. Steadily growing globalization makes it difficult to contain pandemics within a single region. Therefore, pandemic control is of the highest importance to human health. Unfortunately, the segmented nature of the influenza virus genome is highly conducive to genetic shift, and the rapid spread of these viruses across various genera augments genetic drift. For example, the human influenza pandemics of 1957 and 1968 are thought to have been caused by reassorted influenza viruses [
<xref ref-type="bibr" rid="CR1">1</xref>
]. Moreover, the 1997 H5N1 outbreak in Hong Kong convincingly demonstrated that an avian virus can pass directly from birds to humans without going through a perceived "permissible host". In particular, the H5N1 virus is believed to have acquired the
<italic>hemagglutinin</italic>
(HA) gene from A/goose/Guangdong/1/96 H5N1 and A/teal/Hong Kong/W312/97 H6N1, while the internal genes were received from A/quail/Hong Kong/G1/97 H2N2 or A/teal/Hong Kong/W312/97 H6N1, respectively [
<xref ref-type="bibr" rid="CR2">2</xref>
], [
<xref ref-type="bibr" rid="CR3">3</xref>
]. Even though this particular strain was eliminated by culling millions of chickens, its ancestors remain in circulation among aquatic birds.</p>
<p>This paper proposes a new approach to pandemic control: constantly monitoring molecular evolution at both the macro level (within the group of viruses) and the micro level (within the group of strains) using comparative phyloinformatics. This can help predict how these viruses are evolving in terms of spatial, temporal, and host dimensions, and therefore allows faster response to, and diagnosis of, new outbreaks. However, the corresponding phylogenetic tree construction algorithms suffer from long runtimes because of their high computational complexity and the large datasets involved. It is therefore necessary to develop informatics-based solutions that use suitable algorithms and take advantage of distributed computing technologies to make such studies feasible in a reasonable amount of time. Furthermore, these solutions need to be integrated into a biologist-friendly framework for pandemic control. As a result, effective vaccines and antivirals can be designed more easily and within a shorter response time, since specific local strains can be targeted directly.</p>
<p>In this paper it is demonstrated how such a system can be developed using a new distributed workflow design system called
<italic>Quascade</italic>
. As a proof of concept, results for an actual systematic phyloinformatics analysis of
<italic>neuraminidase</italic>
(NA) genes of H5N1 isolates and influenza viruses are presented. Various phylogenetic algorithms, i.e. UPGMA, Neighbor-Joining, Maximum Parsimony, Maximum Likelihood and MrBayes [
<xref ref-type="bibr" rid="CR4">4</xref>
<xref ref-type="bibr" rid="CR7">7</xref>
], are compared with respect to their efficiency and accuracy in deriving biologically meaningful results. In particular, the presented study illustrates how these algorithms can be integrated in a unique user-friendly workflow in order to enhance the efficiency of a comparative phyloinformatic analysis.</p>
<p>The receptor-destroying enzyme NA was selected for this study because of the following properties. NA belongs to the glycosyl hydrolase family of proteins and catalyzes the cleavage of sialic acid residues from viral and cellular glycoproteins. Its most important function is to remove the terminal sialic acids from HA in order to enable virus budding. Moreover, it helps the virus spread through the system by also removing sialic acids from cell surfaces. Since variations in the NA gene can lead to the evolution of more potent strains, studying the molecular evolution of this gene using the latest algorithms and computing power is essential.</p>
<p>Several isolated studies have been conducted since the H5N1 outbreak in late 2003 to monitor the molecular evolution at gene and genome level [
<xref ref-type="bibr" rid="CR8">8</xref>
], [
<xref ref-type="bibr" rid="CR9">9</xref>
]. All these studies involve a phylogenetic analysis at one level or another. However, no systematic effort has been undertaken so far to evaluate how suitable the phylogenetic algorithms used are for yielding biologically meaningful results in the analysis of HA and NA gene products of H5N1 viruses in particular and influenza viruses in general.</p>
</sec>
<sec id="Sec2">
<title>Results</title>
<p>We have used the new Quascade system to implement distributed workflows for comparative phyloinformatics using different algorithms. Figures
<xref rid="Fig1" ref-type="fig">1</xref>
,
<xref rid="Fig2" ref-type="fig">2</xref>
, and
<xref rid="Fig3" ref-type="fig">3</xref>
show these workflows for distance-based algorithms, maximum parsimony algorithms and maximum likelihood algorithms, respectively. The workflows use ClustalW [
<xref ref-type="bibr" rid="CR10">10</xref>
] to compute multiple sequence alignments and different phylogenetic algorithms from the PHYLIP package [
<xref ref-type="bibr" rid="CR4">4</xref>
].
<fig id="Fig1">
<label>Figure 1</label>
<caption>
<p>
<bold>Distributed Distance Workflow</bold>
. Distributed distance based workflow.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig1_HTML" id="d29e371"></graphic>
</fig>
<fig id="Fig2">
<label>Figure 2</label>
<caption>
<p>
<bold>Distributed Parsimony Workflow</bold>
. Distributed parsimony based workflow.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig2_HTML" id="d29e382"></graphic>
</fig>
<fig id="Fig3">
<label>Figure 3</label>
<caption>
<p>
<bold>Distributed ML Workflow</bold>
. Distributed ML based workflow.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig3_HTML" id="d29e393"></graphic>
</fig>
</p>
<p>The most compute-intensive parts of each workflow (i.e. ProtDist in Figure
<xref rid="Fig1" ref-type="fig">1</xref>
, ProtPars in Figure
<xref rid="Fig2" ref-type="fig">2</xref>
, and ProtML in Figure
<xref rid="Fig3" ref-type="fig">3</xref>
) are executed in a distributed computing environment by simply multiplexing the data and using several instances of the respective programs. Figures
<xref rid="Fig4" ref-type="fig">4</xref>
and
<xref rid="Fig5" ref-type="fig">5</xref>
show execution times of the workflows for varying numbers of processors and sequences. In summary it can be seen that the system can achieve linear speedups.
<fig id="Fig4">
<label>Figure 4</label>
<caption>
<p>
<bold>Workflow Scalability</bold>
. Execution time vs. number of processors for 42 sequences and 96 data sets for distance and parsimony based workflows.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig4_HTML" id="d29e422"></graphic>
</fig>
<fig id="Fig5">
<label>Figure 5</label>
<caption>
<p>
<bold>Workflow Scalability for large data sets</bold>
. Execution time of the NA bird flu protein data sets consisting of 909 and 581 sequences respectively using 1 processor and 25 processors for (a) the distance-based workflow and (b) the parsimony workflow.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig5_HTML" id="d29e433"></graphic>
</fig>
</p>
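<p>As a minimal sketch of why multiplexing independent data sets can yield near-linear speedups, the following Python snippet computes an idealized speedup when a number of equally expensive data sets is spread over a number of processors. It assumes identical cost per data set and no communication overhead, neither of which is measured in this paper; the function name ideal_speedup is hypothetical and not part of Quascade.</p>

```python
import math


def ideal_speedup(num_datasets: int, num_processors: int) -> float:
    """Idealized speedup for independent, equally expensive data sets.

    Assumption (not taken from the paper): every data set costs one time
    unit and distribution overhead is zero, so the parallel runtime equals
    the number of rounds needed to hand all data sets to the processors.
    """
    if not (num_datasets > 0 and num_processors > 0):
        raise ValueError("both arguments must be positive")
    rounds = math.ceil(num_datasets / num_processors)
    return num_datasets / rounds


if __name__ == "__main__":
    # 96 is the number of data sets quoted in the caption of Figure 4.
    for processors in (1, 2, 4, 8, 16, 25, 32):
        print(processors, ideal_speedup(96, processors))
```

<p>Under these assumptions the speedup is exactly linear whenever the processor count divides the number of data sets. For 96 data sets on 25 processors it is 24 rather than 25, because the last of the four rounds is only partly filled.</p>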
<p>We have used the designed workflows for the phyloinformatics analysis of NA in different populations of H5N1 in particular and in influenza viruses in general. The reason for choosing a protein-based phyloinformatics approach instead of a gene-based approach is to gain a better understanding of the molecular evolution of the gene product that enables this deadly virus to spread across different hosts. To perform this study, three protein data sets have been collected as follows.
<list list-type="order">
<list-item>
<p>A manual search of Swissprot [
<xref ref-type="bibr" rid="CR11">11</xref>
] for NA and H5N1 has revealed only four entries: Q9WAA1 (A/Chicken/Hong Kong/220/1997 H5N1), Q710U6 (A/Chicken/Scotland/1959 H5N1), Q9Q0U7 (A/Goose/Guangdong/1/1996 H5N1), and Q9W7Y7 (A/Hong Kong/156/1997 H5N1).</p>
</list-item>
<list-item>
<p>A subsequent combined and refined search of Swissprot with TrEMBL (Translated European Molecular Biology Laboratory) has resulted in 18 more protein sequences. The resulting group of 22 NA sequences is used as the
<italic>core group</italic>
.</p>
</list-item>
<list-item>
<p>A
<italic>medium sized dataset</italic>
has been obtained comprising 581 entries pertaining to H5N1 NA that were mined from the Uniprot database [
<xref ref-type="bibr" rid="CR11">11</xref>
].</p>
</list-item>
<list-item>
<p>A
<italic>macro dataset</italic>
of 909 entries of NA from all Influenza A viruses has been obtained from Uniprot.</p>
</list-item>
</list>
</p>
<p>These three datasets are shown in the additional files
<xref rid="MOESM1" ref-type="media">1</xref>
,
<xref rid="MOESM2" ref-type="media">2</xref>
, and
<xref rid="MOESM3" ref-type="media">3</xref>
.</p>
<p>Phylograms obtained for the core group from the character based algorithms ProtPars [
<xref ref-type="bibr" rid="CR4">4</xref>
] and ProtML [
<xref ref-type="bibr" rid="CR4">4</xref>
] are shown in the additional files
<xref rid="MOESM4" ref-type="media">4</xref>
and
<xref rid="MOESM5" ref-type="media">5</xref>
, while phylograms obtained from the two distance-based algorithms are shown in the additional files
<xref rid="MOESM6" ref-type="media">6</xref>
and
<xref rid="MOESM7" ref-type="media">7</xref>
. The phylogram obtained by Mr. Bayes [
<xref ref-type="bibr" rid="CR7">7</xref>
] is displayed in Figure
<xref rid="Fig6" ref-type="fig">6</xref>
. P18269Sial, a trypanosomal sialidase, and Q05JH9H9N2, an NA of the distantly related H9N2 virus, have been used as members of an outlier group for the phyloinformatics analysis.
<fig id="Fig6">
<label>Figure 6</label>
<caption>
<p>
<bold>Phylogram obtained from Mr. Bayes</bold>
. Phylograms obtained from Mr. Bayes for the dataset H5N1_NA_24.txt.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig6_HTML" id="d29e524"></graphic>
</fig>
</p>
<p>Our phyloinformatics analysis with the core set has revealed a clear pattern: spatial clustering of the strains based on the particular geographical region rather than temporal clustering based on time scale or according to host range. The obtained trees for the medium-size data set are shown in the additional files
<xref rid="MOESM8" ref-type="media">8</xref>
,
<xref rid="MOESM9" ref-type="media">9</xref>
, and
<xref rid="MOESM10" ref-type="media">10</xref>
. The algorithms ProtPars, NJ, and UPGMA were used. All these algorithms have been distributed using the above Quascade workflows and deployed on a cluster of PCs. The utilized cluster consists of 16 nodes comprising 32 CPUs. Its detailed specification and architecture are shown in Table
<xref rid="Tab1" ref-type="table">1</xref>
and Figure
<xref rid="Fig7" ref-type="fig">7</xref>
, respectively. However, the algorithms ML and Mr. Bayes could not be run on the existing system with the medium dataset since it became apparent that these algorithms require further optimization in terms of distribution and/or more processing resources.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Specification of PCs in the cluster.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">OS Platform</th>
<th align="center">CPU Type</th>
<th align="center">#CPUs</th>
<th align="center">RAM</th>
<th align="center">#PCs</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Rocks 3.2.0 (2.4.21-15 ELsmp)</td>
<td align="center">Intel Xeon 3.06 GHz</td>
<td align="center">2</td>
<td align="center">1 GB</td>
<td align="center">16</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="Fig7">
<label>Figure 7</label>
<caption>
<p>
<bold>PC cluster architecture</bold>
. The architecture of the utilized PC cluster.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig7_HTML" id="d29e600"></graphic>
</fig>
</p>
<p>Finally, in order to provide a global perspective of the molecular evolution of H5N1_NA, the dataset of 909 sequences of Influenza A viruses rather than H5N1 alone has been utilized. The objective of this macro level analysis is to demonstrate that this approach may identify the nearest neighbors of some of the lethal clades, which may then be kept under observation for further evolution. Again, the large dataset was analyzed using the three algorithms ProtPars, NJ and UPGMA. The results obtained are shown in the additional files
<xref rid="MOESM11" ref-type="media">11</xref>
,
<xref rid="MOESM12" ref-type="media">12</xref>
, and
<xref rid="MOESM13" ref-type="media">13</xref>
. As with the core and medium-sized datasets, the macro dataset also shows clustering of H5N1_NA along spatial lines rather than temporal or host range lines.</p>
<p>Keeping the Thai clade isolates in focus while analyzing the large dataset, the results obtained with ProtPars agree with the trend obtained so far. All twelve core sequences are mapped to the Thai clade. Besides these twelve, ten more sequences were found in this clade, whereby eight of them are fragments. The remaining two are actually complete sequences from Thailand (chicken isolates). However, they were annotated as
<italic>Neuramindase</italic>
instead of Neuraminidase [
<xref ref-type="bibr" rid="CR12">12</xref>
], [
<xref ref-type="bibr" rid="CR13">13</xref>
] which explains why these two entries are missing in the core set. However, querying of Uniprot with "Influenza virus" and "NA" as query terms has returned these two entries which are clearly captured and validated in our phyloinformatics analysis. Zooming out of the Thai clade has shown the Vietnam clade as its nearest neighbor, followed further by the Indian clade with Vietnam strains as their neighbors. On the next higher level Indonesian, Malaysian and Chinese clades are indicated as neighbors. It would be interesting to further study the Indian clade and verify how closely the Indian strains and Vietnamese strains are related with respect to other genes and also at genome level. Such an analysis may lead to the identification of the origin for Bird Flu in India. Further analysis of the large dataset indicates H9N2 as the nearest clade to H5N1 and non-structural protein as nearest to NA within the H5N1_NA group. The overall tree includes isolates from H3N2 to H15N9. A detailed analysis of each clade and sub-clade within this tree and with reference to the other two trees could lead to an understanding of H5N1's molecular evolution at global level.</p>
</sec>
<sec id="Sec3">
<title>Discussion</title>
<p>The core set includes samples isolated over a time frame from 1959 to 2006 and represents globally distributed localities from Thailand to Bavaria, Germany. All the phylograms have essentially yielded geography-based clustering rather than time-based clustering. The seven distinct clades that were clearly demarcated are: Thailand clade I (Q2L700; Q2LDC8; Q2LDC0; Q6PUP7; Q307U7; Q45ZM8), Thailand clade II (Q5SDA6; Q307V5; Q6PUP6; Q5MD56), Thailand clade III (Q6B518; Q4PKD4), Indian clade (Q0PEF9; Q0PEG0), Scotland clade (Q0A2H3; Q710U6), Bavarian clade (A1EHP1; A1EHP3), and China clade (Q6DTU0; Q9WAA1; Q9Q0U7; Q9W7Y7). It is interesting to note that the hosts across these clades range from chicken to human, comprising 16 avian hosts and six mammalian hosts.</p>
<p>The clustering obtained using the various algorithms has not shown any bias towards host-based clustering. In fact, Thailand clade II consists of isolates from a chicken, a cat, a tiger and a human. Multiple sequence alignments computed with ClustalW and subsequently rendered in BioEdit [
<xref ref-type="bibr" rid="CR24">24</xref>
] have revealed the interesting and intriguing fact that the sequences of H5N1_NA from the same strain, which have infected all these hosts, are identical (see the additional file
<xref rid="MOESM14" ref-type="media">14</xref>
). This observation indicates that even though NA is critical in spreading the virus to different hosts by means of its receptor destroying capacity, it may not be the sole factor for deciding the divergent host range. Accordingly a study was undertaken to monitor the phylogenetics of another critical protein, HA, and will be reported in a separate communication.</p>
<p>Multiple alignments of all 22 sequences have revealed another interesting fact. As can be seen from additional file
<xref rid="MOESM15" ref-type="media">15</xref>
, the earliest isolates considered (in terms of the time scale) belong to the Scotland clade, which includes two chicken isolates from 1959. Both share an identical sequence. More importantly, this clade contains a 20 amino acid stretch from position 48 to 68 that is deleted in strains that evolved later. This stretch is absent in the isolates collected from regions outside China and Hong Kong from 1996 onward. Another important observation is that the Goosander isolate from Guangdong province of China isolated in 1997 (Q9Q0U7/97) retained the same stretch of 20 amino acids as the Scotland clade, but with more than 20 point mutations spread across the entire length. The deleted amino acid stretch included three N-glycosylation sites: NQSI, NNTW, and NQTY. The Chinese isolate (Q9Q0U7/97) retained all three glycosylation sites, whereas the two Hong Kong isolates have retained only the NQSI site and lost the other two. The remaining 17 isolates have lost all three glycosylation sites. Poon et al. [
<xref ref-type="bibr" rid="CR14">14</xref>
] have shown the criticality of acquiring or losing glycan sites for effective viral spread in a study involving the phylogenetics of glycan interactions of the HIV envelope protein. A similar study is being undertaken by the authors of this work to evaluate the role of "lost" glycans in H5N1_NA.</p>
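<p>The three sites named above (NQSI, NNTW, NQTY) each begin with the commonly used N-linked glycosylation sequon, i.e. asparagine, then any residue except proline, then serine or threonine. As a hedged illustration of how such sites could be located in an ungapped protein sequence, the Python sketch below scans for that pattern. The function name find_sequons and the toy sequence are hypothetical; this is not the procedure used by the authors, and the toy sequence is not an NA sequence.</p>

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead allows overlapping sequons to be reported as separate hits.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, three-residue motif) for every sequon.

    The input is expected to be an ungapped amino acid string; alignment
    gap characters should be removed beforehand.
    """
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence built only for illustration from the three motifs in the text.
    toy = "AANQSIGGNNTWKKNQTYAA"
    for position, motif in find_sequons(toy):
        print(position, motif)
```

<p>Running the same scan on two aligned isolates and comparing the hit lists is one simple way to see which potential glycosylation sites have been lost or gained.</p>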
<p>A quick perusal of the results obtained with the three algorithms (ProtPars, NJ and UPGMA) has confirmed the pattern obtained from the core set. The strains of H5N1_NA cluster spatially rather than temporally or according to host. However, there are subtle differences among the outputs of the three algorithms with respect to their resolution. The clustering obtained using NJ seems to be better resolved than the other two in terms of branch length and sub-speciation. A detailed analysis of the Thai clade obtained by ProtPars has also revealed that all the sequences from the core set are represented in this tree as well. The additional isolates are mostly fragments obtained through PCR amplification and uploaded to Uniprot. An interesting aspect of this clade is the finding that the entry Q5EP24 (chicken isolate from Vietnam) is placed almost as an outlier in this group. It would be interesting to analyze the genealogies of the Vietnam and Thailand clades and to verify whether there are any "bridging" isolates such as Q5EP24 that may play an important role in the spread of this deadly disease across the globe. We have analyzed our medium dataset to confirm the pattern we obtained with the core set. A detailed analysis of each geographic clade with respect to "bridging" isolates such as Q5EP24 may reveal a global pattern of H5N1 spread.</p>
</sec>
<sec id="Sec4">
<title>Conclusion</title>
<p>Global preparedness for an H5N1 pandemic has been declared the top priority of global health agencies such as the CDC. In view of the ever-escalating threat of the virus mutating to a form lethal to humans, it has become essential to constantly monitor the molecular evolution of this virus utilizing the latest and best phylogenetic algorithms.</p>
<p>The current study was undertaken with two objectives. The first is to demonstrate the utility of a biologist-friendly high performance computing workflow system in analyzing large and complex biological datasets by deploying compute-intensive algorithms in their parallelized versions. The second is to analyze H5N1_NA data, as a proof of concept, in such a workflow to understand the molecular evolution of this rapidly evolving virus.</p>
<p>We have demonstrated the utility of workflow systems by designing a pipeline that starts with data input by the biologist. Once the relevant data is uploaded into the workflow, the remaining steps are all automated. In particular, these steps comprise the multiple alignment of the sequences, the distribution of the alignment output to the various algorithms, the distribution of the relevant outputs to different servers in a grid-compatible distributed environment, and finally the visualization of the output. Such a workflow system saves a significant amount of time and eliminates possible human errors in analyzing critical data.</p>
<p>In any phylogenetic analysis involving pathogenic viruses there are three clear possibilities of clustering behavior: spatial, temporal and host-based. Our analysis of H5N1 NA data utilizing different phylogenetic algorithms has indicated a spatial clustering of this virus based on geographical distribution rather than temporal or host-based clustering. Of course, single gene product analysis is insufficient to arrive at any biologically relevant conclusion and, hence, we propose to do multiple gene and genome level analysis as our future work. However, even for a single gene analysis the computing power required is formidable, and we have used this approach as a proof of concept that high performance computing is a must for any meaningful phyloinformatics study. Even with the limited amount of data, we have been able to find a clear pattern (geographical clustering) and the importance of glycan sites. However, further detailed studies in conjunction with other proteins such as HA, Polymerase etc., and also at gene and genome level, are required to draw firm conclusions. It is also true that interpretation becomes more difficult if the size of the trees becomes very large. In our experience, the ability to zoom in on a cluster of interest (e.g. the Thailand and China clades) quickly and with relative ease using the Quascade middleware is an attractive feature of our study. Furthermore, the simultaneous comparison of different algorithms on the same input file is another attractive attribute.</p>
</sec>
<sec id="Sec5">
<title>Methods</title>
<sec id="Sec6">
<title>Overview of Quascade-MP2</title>
<p>The exponential growth in the size of biological databases has established the need for high performance computing (HPC) in bioinformatics. Typically, an HPC setup operates in a cluster computing environment consisting of multiple computers that communicate over fast switches. For popular database scanning applications such as BLAST and HMMER the benefits of clusters are immediate and linear speedups can be easily achieved. However, the evolving challenges in life sciences cannot be all addressed by off-the-shelf bioinformatics applications. Life scientists need to analyze their data using novel approaches published in recent journals or based on their own hypotheses and assumptions. Quascade-MP2 has been developed to address this need. It is a visual prototyping tool created especially for data-driven, high performance scientific applications.</p>
<p>The complexity involved in using traditional technologies and tools often proves to be overwhelming and counter-productive. Most of these tools require an understanding of programming or scripting languages, such as Perl, Python, Java, and UNIX shell scripts. Recent examples of such systems in bioinformatics include Biopipe [
<xref ref-type="bibr" rid="CR15">15</xref>
], BioWBI [
<xref ref-type="bibr" rid="CR16">16</xref>
], Taverna [
<xref ref-type="bibr" rid="CR17">17</xref>
], Wildfire [
<xref ref-type="bibr" rid="CR18">18</xref>
], KDE Biosciences [
<xref ref-type="bibr" rid="CR19">19</xref>
], gRNA [
<xref ref-type="bibr" rid="CR20">20</xref>
], Biowep [
<xref ref-type="bibr" rid="CR21">21</xref>
], and BioWMS [
<xref ref-type="bibr" rid="CR22">22</xref>
]. Each deployment is therefore subject to its own code development and testing. Although programming languages offer sophisticated control over the intended procedure, the learning curve and overheads in terms of human resources are difficult to justify. Quascade has been developed with this problem in mind.</p>
<p>Quascade has been designed for research applications characterized by strict high performance requirements. It provides a graphical, drag-and-drop interface to allow users to design and execute ad-hoc workflows. Workflows are constructed by an end-user by inter-connecting functional blocks, called
<italic>components</italic>
. A component is a piece of independent and self-contained code that performs a given function on its input and generates an output. Data paths among components can be specified by drawing lines between the output(s) and input(s) of different components. Thereby, various combinations of these components (such as one output to multiple inputs, or multiple outputs to multiple inputs) can be used to create any workflow depicting the flow of both data and logic.</p>
<p>Individual components in a workflow may be designated to run on different computing servers in a cluster by simply specifying a corresponding condition in Quascade. This provides a straightforward approach of constructing workflows that execute in a high-performance environment. In turn, a component running on a given server may use several computing nodes to execute its program, thereby providing a two-tiered mechanism of distributed processing. As an example, Figure
<xref rid="Fig8" ref-type="fig">8</xref>
shows a physical workflow and its graphical counterpart. The workflow consists of a
<italic>sample generator</italic>
component, which generates/collects data, an
<italic>analyzer</italic>
component which performs the actual processing, and a
<italic>sample sink</italic>
component which 'consumes' the processed data in terms of not forwarding the data further to another component. While the
<italic>generator</italic>
and
<italic>sink</italic>
components can be connected to external entities, i.e. physical devices, the
<italic>analyzer</italic>
component may be a simple single-node program or a complex, parallel multi-node application.
<fig id="Fig8">
<label>Figure 8</label>
<caption>
<p>
<bold>Quascade-MP2 workflow</bold>
. Typical configuration for executing a Quascade-MP2 workflow.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig8_HTML" id="d29e731"></graphic>
</fig>
</p>
</sec>
<sec id="Sec7">
<title>Software/Hardware setup</title>
<p>In a typical setup, Quascade with the MP
<sup>2</sup>
middleware is installed on a centrally accessible network file system mount location, while a client uses Quascade to create and execute a workflow. Since each component in the workflow can be assigned to a particular MP
<sup>2</sup>
server, a straightforward implicit distribution can be achieved, where all communication among components is managed by Quascade and MP
<sup>2</sup>
. By contrast, the workload distribution of an MP
<sup>2</sup>
server among its allocated MP
<sup>2</sup>
compute nodes is performed explicitly, i.e. it is the software developer's responsibility to parallelize a component in order to enable efficient simultaneous execution on multiple nodes.</p>
</sec>
<sec id="Sec8">
<title>Communication issues</title>
<p>The user creates a workflow by selecting workflow components from a predefined list of deployed components on the cluster. Each component is configured to run on a particular cluster server or automatically assigned to a server if no explicit configuration is desired. At run-time Quascade performs a remote invocation to the selected servers and creates local instances of the used components. Hence, client-server communication takes place between a Quascade client and one or more instances of MP
<sup>2</sup>
servers forming the cluster. In addition, server-server communication takes place between two or more servers or between a server and its compute nodes.</p>
<p>Two different types of communication can be differentiated:
<italic>explicit</italic>
and
<italic>implicit</italic>
. They distinguish between the workload distribution by a parallel component to multiple compute nodes and the communication between two or more servers exchanging data, respectively. Server-server communication uses raw sockets to transfer data from one machine to the other. This solution provides a performance benefit over more sophisticated alternatives such as RMI or CORBA, but represents the lowest level of abstraction of the underlying network. As an answer to this trade-off, MP
<sup>2</sup>
overcomes the limitations of development complexity and inflexibility by providing an abstraction at the component level supported by a fixed number of network level operations. Thus, a developer has a high degree of flexibility regarding the operation of a component as long as the implementation adheres to the I/O scheme provided by the communication layer. Implicit communication refers to the communication that takes place between the output of one component and the input of another component, i.e. the selected data paths among components. The underlying transfer mechanism is provided by the middleware and comprises input buffers in order to enable asynchronous communication. Input buffers are continuously polled by the corresponding component, and processing continues once data is available. A buffer length that can be adjusted according to the expected load helps to prevent overflows.</p>
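<p>The buffered, polled data path described above can be sketched in a few lines of Python. This is only an in-process analogy under stated assumptions: threads and a bounded queue stand in for components and the middleware's input buffer, whereas the real system transfers data between servers over raw sockets. All names (producer, consumer, run_pipeline, SENTINEL) are hypothetical, and in this sketch a full buffer blocks the sender rather than overflowing.</p>

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of MP2


def producer(out_buffer: queue.Queue, items: list[str]) -> None:
    """Upstream component: writes items to the downstream input buffer."""
    for item in items:
        out_buffer.put(item)  # blocks while the bounded buffer is full
    out_buffer.put(SENTINEL)


def consumer(in_buffer: queue.Queue, results: list[str],
             poll_seconds: float = 0.01) -> None:
    """Downstream component: polls its input buffer until data arrives."""
    while True:
        try:
            item = in_buffer.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # nothing available yet, poll again
        if item is SENTINEL:
            break
        results.append(item.upper())  # stand-in for the real processing


def run_pipeline(items: list[str], buffer_length: int = 4) -> list[str]:
    """Connect one producer to one consumer through a bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_length)
    results: list[str] = []
    worker = threading.Thread(target=consumer, args=(buffer, results))
    worker.start()
    producer(buffer, items)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["seq1", "seq2", "seq3"], buffer_length=2))
```

<p>The buffer_length argument plays the role of the adjustable buffer length: a larger value lets the sender run further ahead of a slow receiver.</p>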
<p>For example, our workflow component for ClustalW shows all the options that the original ClustalW program presents to users. The options are presented in a parameter output panel and translated into a command-line string for calling the underlying ClustalW program upon execution of the workflow. After the completion of ClustalW, an output file containing the aligned data set is produced on the local hard disk at a specified location. The name and location of this file are the input for the next component (e.g. Seqboot). The remaining components, e.g. Seqboot with its underlying PHYLIP program, operate in the same way.</p>
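<p>To illustrate how a component might turn panel settings into a call of the underlying program, the following Python sketch builds a ClustalW argument list from a dictionary. It is an assumption-laden example, not Quascade code: the function name build_clustalw_command, the executable name clustalw, and the output file name are hypothetical, and only the widely documented ClustalW options INFILE, ALIGN, TYPE, OUTPUT and OUTFILE are used. The command is only constructed, not executed.</p>

```python
import shlex


def build_clustalw_command(options: dict, executable: str = "clustalw") -> list[str]:
    """Translate panel settings into an argument list.

    A value of None marks a bare switch (e.g. -ALIGN); any other value is
    rendered as -KEY=value, following ClustalW's command-line style.
    """
    args = [executable]
    for key, value in options.items():
        flag = "-" + key.upper()
        args.append(flag if value is None else f"{flag}={value}")
    return args


# Hypothetical panel state; the input file name follows the core data set.
panel = {
    "infile": "H5N1_NA_24.txt",
    "align": None,
    "type": "PROTEIN",
    "output": "PHYLIP",
    "outfile": "H5N1_NA_24.phy",  # assumed name, handed to the next component
}
command = build_clustalw_command(panel)

if __name__ == "__main__":
    print(shlex.join(command))
```

<p>In such a design, the value given for the output file is what the next component (e.g. Seqboot) would receive as its input location.</p>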
<p>For instance, Figure
<xref rid="Fig9" ref-type="fig">9</xref>
shows a screenshot of the implemented sequential workflow for a distance based phylogenetic analysis.
<fig id="Fig9">
<label>Figure 9</label>
<caption>
<p>
<bold>Distance based Quascade workflow</bold>
. Screenshot of sequential distance based phylogenetic analysis workflow in Quascade.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig9_HTML" id="d29e788"></graphic>
</fig>
</p>
<p>The runtime of this workflow has been profiled on a single server for an input of 42 sequences and
<italic>N</italic>
= 100 replicates. As can be seen from Figure
<xref rid="Fig10" ref-type="fig">10</xref>
, the
<italic>Protdist</italic>
component clearly dominates the overall execution time and requires parallelization.
<fig id="Fig10">
<label>Figure 10</label>
<caption>
<p>
<bold>Workflow Profiling</bold>
. Execution time of the distance based phylogenetic analysis workflow.</p>
</caption>
<graphic xlink:href="12859_2008_Article_2567_Fig10_HTML" id="d29e810"></graphic>
</fig>
</p>
<p>The Seqboot component is modified to break up its output file's data into several data sets, which are written to a buffered output port. Distribution of files is then implemented by using a demultiplexer (
<italic>Demux</italic>
) component and a result combining module (
<italic>Concat</italic>
). More technical details about the file distribution and concatenation as well as other workflow parameters can be found elsewhere [
<xref ref-type="bibr" rid="CR23">23</xref>
]. The model, design and implementation used to distribute the distance-based workflow can also be applied to parallelize the parsimony and the ML based workflows. The model and design developed for the Protdist component have therefore been reused for the Protpars and the ProtML components. The corresponding workflows are shown in Figures
<xref rid="Fig1" ref-type="fig">1</xref>
,
<xref rid="Fig2" ref-type="fig">2</xref>
, and
<xref rid="Fig3" ref-type="fig">3</xref>
.</p>
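<p>The demultiplex, process and concatenate pattern described above can be sketched as follows. This Python example is a simplified stand-in under explicit assumptions: threads on one machine replace separate MP2 servers, process_batch replaces a real Protdist instance, and the names demux, concat and run_distributed are illustrative rather than the actual Quascade components documented in [23].</p>

```python
from concurrent.futures import ThreadPoolExecutor


def demux(datasets: list[str], num_workers: int) -> list[list[tuple[int, str]]]:
    """Deal indexed data sets round-robin into one batch per worker."""
    batches: list[list[tuple[int, str]]] = [[] for _ in range(num_workers)]
    for index, dataset in enumerate(datasets):
        batches[index % num_workers].append((index, dataset))
    return batches


def process_batch(batch: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Stand-in for one program instance; it only tags each data set."""
    return [(index, f"distance-matrix-for:{dataset}") for index, dataset in batch]


def concat(partial_results: list[list[tuple[int, str]]]) -> list[str]:
    """Merge per-worker results back into the original data set order."""
    merged = sorted(pair for part in partial_results for pair in part)
    return [result for _, result in merged]


def run_distributed(datasets: list[str], num_workers: int = 4) -> list[str]:
    """Demultiplex, process each batch concurrently, then concatenate."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(process_batch, demux(datasets, num_workers)))
    return concat(parts)


if __name__ == "__main__":
    replicates = [f"replicate{i}" for i in range(6)]
    for line in run_distributed(replicates, num_workers=3):
        print(line)
```

<p>Carrying the original index alongside each data set is one simple way to let the concatenation step restore the order of the bootstrap replicates regardless of which worker finishes first.</p>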
</sec>
</sec>
<sec sec-type="supplementary-material">
<title>Electronic supplementary material</title>
<sec id="Sec9">
<p>
<supplementary-material content-type="local-data" id="MOESM1">
<media xlink:href="12859_2008_2567_MOESM1_ESM.txt">
<caption>
<p>Additional file 1: All sequences in the core data set. (TXT 12 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM2">
<media xlink:href="12859_2008_2567_MOESM2_ESM.txt">
<caption>
<p>Additional file 2: All sequences in the medium data set. (TXT 238 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM3">
<media xlink:href="12859_2008_2567_MOESM3_ESM.txt">
<caption>
<p>Additional file 3: All sequences in the macro data set. (TXT 481 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM4">
<media xlink:href="12859_2008_2567_MOESM4_ESM.pdf">
<caption>
<p>Additional file 4: Phylograms obtained from the Parsimony workflow for the dataset H5N1_NA_24.txt. (PDF 3 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM5">
<media xlink:href="12859_2008_2567_MOESM5_ESM.pdf">
<caption>
<p>Additional file 5: Phylograms obtained from the ML workflow for the dataset H5N1_NA_24.txt. (PDF 3 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM6">
<media xlink:href="12859_2008_2567_MOESM6_ESM.pdf">
<caption>
<p>Additional file 6: Phylograms obtained from the distance based workflow using UPGMA for the dataset H5N1_NA_24.txt. (PDF 3 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM7">
<media xlink:href="12859_2008_2567_MOESM7_ESM.pdf">
<caption>
<p>Additional file 7: Phylograms obtained from the distance workflow using NJ for the dataset H5N1_NA_24.txt. (PDF 3 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM8">
<media xlink:href="12859_2008_2567_MOESM8_ESM.pdf">
<caption>
<p>Additional file 8: Phylograms obtained from the distance workflow using UPGMA for the dataset H5N1_NA_medium.txt. (PDF 21 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM9">
<media xlink:href="12859_2008_2567_MOESM9_ESM.pdf">
<caption>
<p>Additional file 9: Phylograms obtained from the distance workflow using NJ for the dataset H5N1_NA_medium.txt. (PDF 21 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM10">
<media xlink:href="12859_2008_2567_MOESM10_ESM.pdf">
<caption>
<p>Additional file 10: Phylograms obtained from the parsimony workflow for the dataset H5N1_NA_medium.txt. (PDF 22 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM11">
<media xlink:href="12859_2008_2567_MOESM11_ESM.pdf">
<caption>
<p>Additional file 11: Phylograms obtained from the distance workflow using UPGMA for the dataset H5N1_NA_macro.txt. (PDF 32 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM12">
<media xlink:href="12859_2008_2567_MOESM12_ESM.pdf">
<caption>
<p>Additional file 12: Phylograms obtained from the distance workflow using NJ for the dataset H5N1_NA_macro.txt. (PDF 31 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM13">
<media xlink:href="12859_2008_2567_MOESM13_ESM.pdf">
<caption>
<p>Additional file 13: Phylograms obtained from the parsimony workflow for the dataset H5N1_NA_macro.txt. (PDF 33 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM14">
<media xlink:href="12859_2008_2567_MOESM14_ESM.pdf">
<caption>
<p>Additional file 14: Multiple Sequence Alignment of the Thailand clade II (Q5SDA6; Q307V5; Q6PUP6; Q5MD56) computed by ClustalW. (PDF 60 KB)</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="MOESM15">
<media xlink:href="12859_2008_2567_MOESM15_ESM.pdf">
<caption>
<p>Additional file 15: Multiple Sequence Alignment of all sequences in H5N1_NA_24.txt except the two outliers. (PDF 93 KB)</p>
</caption>
</media>
</supplementary-material>
</p>
</sec>
</sec>
</body>
<back>
<fn-group>
<fn>
<p>
<bold>Competing interests</bold>
</p>
<p>The authors declare that they have no competing interests.</p>
</fn>
<fn>
<p>
<bold>Authors' contributions</bold>
</p>
<p>DTS and BS conceived the study. RT, DTS, and BS performed the computational studies and designed the workflows. RT implemented the workflow software. DTS collected the data. DTS, BS, and TB contributed to the analysis of the experimental studies. DTS, BS, RT, and TB wrote the manuscript.</p>
</fn>
</fn-group>
<ack>
<title>Acknowledgements</title>
<p>This article has been published as part of
<italic>BMC Bioinformatics</italic>
Volume 9 Supplement 1, 2008: Asia Pacific Bioinformatics Network (APBioNet) Sixth International Conference on Bioinformatics (InCoB2007). The full contents of the supplement are available online at
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/9?issue=S1">http://www.biomedcentral.com/1471-2105/9?issue=S1</ext-link>
.</p>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zambon</surname>
<given-names>MC</given-names>
</name>
</person-group>
<article-title>The pathogenesis of influenza in humans</article-title>
<source>Rev Med Virol</source>
<year>2001</year>
<volume>11</volume>
<fpage>227</fpage>
<lpage>241</lpage>
<pub-id pub-id-type="doi">10.1002/rmv.319</pub-id>
<pub-id pub-id-type="pmid">11479929</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Subbarao</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>MW</given-names>
</name>
</person-group>
<article-title>Molecular aspects of avian influenza (H5N1) viruses isolated from humans</article-title>
<source>Rev Med Virol</source>
<year>2000</year>
<volume>10</volume>
<fpage>337</fpage>
<lpage>348</lpage>
<pub-id pub-id-type="doi">10.1002/1099-1654(200009/10)10:5&lt;337::AID-RMV292&gt;3.0.CO;2-V</pub-id>
<pub-id pub-id-type="pmid">11015744</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hatta</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kawaoka</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>The continued pandemic threat posed by avian influenza viruses in Hong Kong</article-title>
<source>Trends Microbiol</source>
<year>2002</year>
<volume>10</volume>
<fpage>340</fpage>
<lpage>344</lpage>
<pub-id pub-id-type="doi">10.1016/S0966-842X(02)02388-0</pub-id>
<pub-id pub-id-type="pmid">12110213</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4.</label>
<mixed-citation publication-type="other">
<bold>PHYLIP Home Page</bold>
[
<ext-link ext-link-type="uri" xlink:href="http://evolution.genetics.washington.edu/phylip.html">http://evolution.genetics.washington.edu/phylip.html</ext-link>
]</mixed-citation>
</ref>
<ref id="CR5">
<label>5.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saitou</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Nei</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>The neighbor-joining method: A new method for reconstructing phylogenetic trees</article-title>
<source>Mol Biol Evol</source>
<year>1987</year>
<volume>4</volume>
<fpage>406</fpage>
<lpage>425</lpage>
<pub-id pub-id-type="pmid">3447015</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holder</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>PO</given-names>
</name>
</person-group>
<article-title>Phylogeny estimation: Traditional and Bayesian approaches</article-title>
<source>Nature Reviews Genetics</source>
<year>2003</year>
<volume>4</volume>
<fpage>275</fpage>
<lpage>284</lpage>
<pub-id pub-id-type="doi">10.1038/nrg1044</pub-id>
<pub-id pub-id-type="pmid">12671658</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ronquist</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Huelsenbeck</surname>
<given-names>JP</given-names>
</name>
</person-group>
<article-title>MrBayes 3: Bayesian phylogenetic inference under mixed models</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>1572</fpage>
<lpage>1574</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btg180</pub-id>
<pub-id pub-id-type="pmid">12912839</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Puthavathana</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Auewarakul</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Charoenying</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Sangsiriwut</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Pooruk</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Boonnak</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Khanyok</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Thawachsupa</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Kijphati</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sawanpanyalert</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Molecular characterization of the complete genome of human influenza H5N1 virus isolates from Thailand</article-title>
<source>Journal of General Virology</source>
<year>2005</year>
<volume>86</volume>
<fpage>423</fpage>
<lpage>433</lpage>
<pub-id pub-id-type="doi">10.1099/vir.0.80368-0</pub-id>
<pub-id pub-id-type="pmid">15659762</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tran</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Luong</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Pham</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>VC</given-names>
</name>
<name>
<surname>Pham</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Vo</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>TQM</given-names>
</name>
<name>
<surname>Ngo</surname>
<given-names>TT</given-names>
</name>
<name>
<surname>Dao</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>PP</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>TT</given-names>
</name>
<name>
<surname>Hoang</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>VT</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>TG</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>DT</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>HN</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>TKT</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Le</surname>
<given-names>VT</given-names>
</name>
<name>
<surname>Dolecek</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>TT</given-names>
</name>
<name>
<surname>de Jong</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schultsz</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Horby</surname>
<given-names>P</given-names>
</name>
<collab>World Health Organization International Avian Influenza Investigative Team</collab>
<name>
<surname>Farrar</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Avian influenza A (H5N1) in 10 patients in Vietnam</article-title>
<source>N Engl J Med</source>
<year>2004</year>
<volume>350</volume>
<fpage>1179</fpage>
<lpage>1188</lpage>
<pub-id pub-id-type="doi">10.1056/NEJMoa040419</pub-id>
<pub-id pub-id-type="pmid">14985470</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thompson</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>TJ</given-names>
</name>
</person-group>
<article-title>CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</article-title>
<source>Nucleic Acids Res</source>
<year>1994</year>
<volume>22</volume>
<fpage>4673</fpage>
<lpage>4680</lpage>
<pub-id pub-id-type="doi">10.1093/nar/22.22.4673</pub-id>
<pub-id pub-id-type="pmid">7984417</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11.</label>
<mixed-citation publication-type="other">
<bold>ExPASy – UniProt Knowledgebase</bold>
[
<ext-link ext-link-type="uri" xlink:href="http://expasy.org/sprot">http://expasy.org/sprot</ext-link>
]</mixed-citation>
</ref>
<ref id="CR12">
<label>12.</label>
<mixed-citation publication-type="other">
<bold>UniProtKB/TrEMBL entry Q6DPL8 [Q6DPL8_9INFA] Neuraminidase</bold>
[
<ext-link ext-link-type="uri" xlink:href="http://www.expasy.org/uniprot/Q6dpl8">http://www.expasy.org/uniprot/Q6dpl8</ext-link>
]</mixed-citation>
</ref>
<ref id="CR13">
<label>13.</label>
<mixed-citation publication-type="other">
<bold>UniProtKB/TrEMBL entry Q6DPM0 [Q6DPM0_9INFA] Neuraminidase</bold>
[
<ext-link ext-link-type="uri" xlink:href="http://www.expasy.org/uniprot/Q6DPM0">http://www.expasy.org/uniprot/Q6DPM0</ext-link>
]</mixed-citation>
</ref>
<ref id="CR14">
<label>14.</label>
<mixed-citation publication-type="other">Poon AFY, Lewis F, Kosakovsky Pond SL, Frost SDW:
<bold>Evolutionary interactions between N-linked glycosylation sites in the HIV envelope.</bold>
<italic>PLoS Computational Biology</italic>
<bold>3</bold>
(1):e11.</mixed-citation>
</ref>
<ref id="CR15">
<label>15.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ratnapu</surname>
<given-names>KK</given-names>
</name>
<name>
<surname>Chia</surname>
<given-names>J-M</given-names>
</name>
<name>
<surname>Kumarasamy</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Clamp</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Stabenau</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Potter</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stupka</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Biopipe: A flexible framework for protocol-based bioinformatics analysis</article-title>
<source>Genome Research</source>
<year>2003</year>
<volume>13</volume>
<fpage>1904</fpage>
<lpage>1915</lpage>
<pub-id pub-id-type="pmid">12869579</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Leo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Marinelli</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pappadà</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Scioscia</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zanchetta</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>BioWBI: An integrated tool for building and executing Bioinformatic analysis workflows</article-title>
<source>Proceedings of the Bioinformatics Italian Society Meeting</source>
<year>2004</year>
</element-citation>
</ref>
<ref id="CR17">
<label>17.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oinn</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Addis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ferris</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Marvin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Senger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Greenwood</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Carver</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Glover</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Pocock</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Wipat</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Taverna: A tool for the composition and enactment of bioinformatics workflows</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>3045</fpage>
<lpage>3054</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bth361</pub-id>
<pub-id pub-id-type="pmid">15201187</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Chua</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Issac</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Krishnan</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Wildfire: Distributed, Grid-enabled workflow construction and execution</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>69</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-6-69</pub-id>
<pub-id pub-id-type="pmid">15788106</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Curcin</surname>
<given-names>V</given-names>
</name>
<name>
<surname>He</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y-Y</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>Q-M</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y-K</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y-X</given-names>
</name>
</person-group>
<article-title>KDE Biosciences: Platform for bioinformatics analysis workflows</article-title>
<source>Journal of Biomedical Informatics</source>
<year>2006</year>
<volume>39</volume>
<fpage>440</fpage>
<lpage>450</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2005.09.001</pub-id>
<pub-id pub-id-type="pmid">16260186</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schmidt</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Laud</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Santoso</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Development of distributed bioinformatics applications with GMP</article-title>
<source>Concurrency and Computation: Practice and Experience</source>
<year>2004</year>
<volume>16</volume>
<fpage>945</fpage>
<lpage>959</lpage>
<pub-id pub-id-type="doi">10.1002/cpe.815</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Romano</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bartocci</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Bertolini</surname>
<given-names>G</given-names>
</name>
<name>
<surname>De Paoli</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Marra</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Mauri</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Merelli</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Milanesi</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Biowep: A workflow enactment portal for bioinformatics applications</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<issue>Suppl 1</issue>
<fpage>S19</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-S1-S19</pub-id>
<pub-id pub-id-type="pmid">17430563</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bartocci</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Corradini</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Merelli</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Scortichini</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>BioWMS: A web-based workflow management system for bioinformatics</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<issue>Suppl 1</issue>
<fpage>S2</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-S1-S2</pub-id>
<pub-id pub-id-type="pmid">17430564</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<label>23.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Singh</surname>
<given-names>DT</given-names>
</name>
<name>
<surname>Trehan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ray</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Phylogenetic analysis of neuraminidase genes of H5N1 isolates using HPC technologies</article-title>
<source>Proceedings of the IEEE International Conference on e-Health Networking, Application and Services</source>
<year>2007</year>
<fpage>285</fpage>
<lpage>288</lpage>
</element-citation>
</ref>
<ref id="CR24">
<label>24.</label>
<mixed-citation publication-type="other">
<bold>BioEdit Sequence Alignment Editor for Windows 95/98/NT/XP</bold>
[
<ext-link ext-link-type="uri" xlink:href="http://www.mbio.ncsu.edu/BioEdit/bioedit.html">http://www.mbio.ncsu.edu/BioEdit/bioedit.html</ext-link>
]</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000110 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000110 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2259424
   |texte=   Comparative phyloinformatics of virus genes at micro and macro levels in a distributed computing environment
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:18315855" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a H2N2V1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021