H2N2 Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

GiRaF: robust, computational identification of influenza reassortments via graph mining

Internal identifier: 000A33 (Pmc/Corpus); previous: 000A32; next: 000A34

Authors: Niranjan Nagarajan; Carl Kingsford

Source:

RBID : PMC:3064795

Abstract

Reassortments in the influenza virus—a process where strains exchange genetic segments—have been implicated in two out of three pandemics of the 20th century as well as the 2009 H1N1 outbreak. While advances in sequencing have led to an explosion in the number of whole-genome sequences that are available, an understanding of the rate and distribution of reassortments and their role in viral evolution is still lacking. An important factor in this is the paucity of automated tools for confident identification of reassortments from sequence data due to the challenges of analyzing large, uncertain viral phylogenies. We describe here a novel computational method, called GiRaF (Graph-incompatibility-based Reassortment Finder), that robustly identifies reassortments in a fully automated fashion while accounting for uncertainties in the inferred phylogenies. The algorithms behind GiRaF search large collections of Markov chain Monte Carlo (MCMC)-sampled trees for groups of incompatible splits using a fast biclique enumeration algorithm coupled with several statistical tests to identify sets of taxa with differential phylogenetic placement. GiRaF correctly finds known reassortments in human, avian, and swine influenza populations, including the evolutionary events that led to the recent ‘swine flu’ outbreak. GiRaF also identifies several previously unreported reassortments via whole-genome studies to catalog events in H5N1 and swine influenza isolates.
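
The notion of "incompatible splits" mentioned in the abstract can be illustrated with a small example. The following is a minimal sketch, not the GiRaF implementation: it assumes each split is represented by the set of taxa on one side of a bipartition, and it uses the standard criterion that two splits of the same taxon set are compatible exactly when at least one of the four pairwise intersections of their sides is empty. The function names (`compatible`, `incompatibility_edges`) and the toy taxa are hypothetical, and linking splits from two segments by edges is only one plausible way to picture an incompatibility graph.

```python
from itertools import product


def compatible(a, b, taxa):
    """Return True if splits a|rest and b|rest can sit on one tree.

    Two splits are compatible iff at least one of the four
    intersections of their sides is empty.
    """
    a_rest, b_rest = taxa - a, taxa - b
    return any(x.isdisjoint(y) for x, y in product((a, a_rest), (b, b_rest)))


def incompatibility_edges(splits_1, splits_2, taxa):
    """List the (split, split) pairs across two collections that conflict.

    Each pair is an edge of a bipartite graph whose two sides are the
    splits drawn from two different genome segments (an assumption made
    for illustration only).
    """
    return [
        (s, t)
        for s, t in product(splits_1, splits_2)
        if not compatible(s, t, taxa)
    ]


# Toy data: five taxa, two splits per segment (hypothetical).
taxa = frozenset("ABCDE")
segment_1 = [frozenset("AB"), frozenset("ABC")]
segment_2 = [frozenset("AB"), frozenset("BC")]

edges = incompatibility_edges(segment_1, segment_2, taxa)
for s, t in edges:
    print(sorted(s), "conflicts with", sorted(t))
```

In this toy case the only conflict is between the split {A, B} from the first segment and {B, C} from the second: taxon B groups with A in one segment and with C in the other, which is the kind of differential placement a reassortment could produce. A real analysis would draw splits from many sampled trees and weigh them statistically, which this sketch does not attempt.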


Url:
DOI: 10.1093/nar/gkq1232
PubMed: 21177643
PubMed Central: 3064795

Links to Exploration step

PMC:3064795

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">GiRaF: robust, computational identification of influenza reassortments via graph mining</title>
<author>
<name sortKey="Nagarajan, Niranjan" sort="Nagarajan, Niranjan" uniqKey="Nagarajan N" first="Niranjan" last="Nagarajan">Niranjan Nagarajan</name>
<affiliation>
<nlm:aff id="AFF1">Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Center for Bioinformatics and Computational Biology</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="AFF1">Department of Computer Science, University of Maryland, College Park, MD 20742, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21177643</idno>
<idno type="pmc">3064795</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3064795</idno>
<idno type="RBID">PMC:3064795</idno>
<idno type="doi">10.1093/nar/gkq1232</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000A33</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000A33</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">GiRaF: robust, computational identification of influenza reassortments via graph mining</title>
<author>
<name sortKey="Nagarajan, Niranjan" sort="Nagarajan, Niranjan" uniqKey="Nagarajan N" first="Niranjan" last="Nagarajan">Niranjan Nagarajan</name>
<affiliation>
<nlm:aff id="AFF1">Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation>
<nlm:aff wicri:cut=" and" id="AFF1">Center for Bioinformatics and Computational Biology</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="AFF1">Department of Computer Science, University of Maryland, College Park, MD 20742, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Reassortments in the influenza virus—a process where strains exchange genetic segments—have been implicated in two out of three pandemics of the 20th century as well as the 2009 H1N1 outbreak. While advances in sequencing have led to an explosion in the number of whole-genome sequences that are available, an understanding of the rate and distribution of reassortments and their role in viral evolution is still lacking. An important factor in this is the paucity of automated tools for confident identification of reassortments from sequence data due to the challenges of analyzing large, uncertain viral phylogenies. We describe here a novel computational method, called GiRaF (Graph-incompatibility-based Reassortment Finder), that robustly identifies reassortments in a fully automated fashion while accounting for uncertainties in the inferred phylogenies. The algorithms behind GiRaF search large collections of Markov chain Monte Carlo (MCMC)-sampled trees for groups of incompatible splits using a fast biclique enumeration algorithm coupled with several statistical tests to identify sets of taxa with differential phylogenetic placement. GiRaF correctly finds known reassortments in human, avian, and swine influenza populations, including the evolutionary events that led to the recent ‘swine flu’ outbreak. GiRaF also identifies several previously unreported reassortments via whole-genome studies to catalog events in H5N1 and swine influenza isolates.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Kawaoka, Y" uniqKey="Kawaoka Y">Y Kawaoka</name>
</author>
<author>
<name sortKey="Krauss, S" uniqKey="Krauss S">S Krauss</name>
</author>
<author>
<name sortKey="Webster, Rg" uniqKey="Webster R">RG Webster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dawood, Fs" uniqKey="Dawood F">FS Dawood</name>
</author>
<author>
<name sortKey="Jain, S" uniqKey="Jain S">S Jain</name>
</author>
<author>
<name sortKey="Finelli, L" uniqKey="Finelli L">L Finelli</name>
</author>
<author>
<name sortKey="Shaw, Mw" uniqKey="Shaw M">MW Shaw</name>
</author>
<author>
<name sortKey="Lindstrom, S" uniqKey="Lindstrom S">S Lindstrom</name>
</author>
<author>
<name sortKey="Garten, Rj" uniqKey="Garten R">RJ Garten</name>
</author>
<author>
<name sortKey="Gubareva, Lv" uniqKey="Gubareva L">LV Gubareva</name>
</author>
<author>
<name sortKey="Xu, X" uniqKey="Xu X">X Xu</name>
</author>
<author>
<name sortKey="Bridges, Cb" uniqKey="Bridges C">CB Bridges</name>
</author>
<author>
<name sortKey="Uyeki, Tm" uniqKey="Uyeki T">TM Uyeki</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghedin, E" uniqKey="Ghedin E">E Ghedin</name>
</author>
<author>
<name sortKey="Sengamalay, Na" uniqKey="Sengamalay N">NA Sengamalay</name>
</author>
<author>
<name sortKey="Shumway, M" uniqKey="Shumway M">M Shumway</name>
</author>
<author>
<name sortKey="Zaborsky, J" uniqKey="Zaborsky J">J Zaborsky</name>
</author>
<author>
<name sortKey="Feldblyum, T" uniqKey="Feldblyum T">T Feldblyum</name>
</author>
<author>
<name sortKey="Subbu, V" uniqKey="Subbu V">V Subbu</name>
</author>
<author>
<name sortKey="Spiro, Dj" uniqKey="Spiro D">DJ Spiro</name>
</author>
<author>
<name sortKey="Sitz, J" uniqKey="Sitz J">J Sitz</name>
</author>
<author>
<name sortKey="Koo, H" uniqKey="Koo H">H Koo</name>
</author>
<author>
<name sortKey="Bolotov, P" uniqKey="Bolotov P">P Bolotov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rambaut, A" uniqKey="Rambaut A">A Rambaut</name>
</author>
<author>
<name sortKey="Pybus, Og" uniqKey="Pybus O">OG Pybus</name>
</author>
<author>
<name sortKey="Nelson, Mi" uniqKey="Nelson M">MI Nelson</name>
</author>
<author>
<name sortKey="Viboud, C" uniqKey="Viboud C">C Viboud</name>
</author>
<author>
<name sortKey="Taubenberger, Jk" uniqKey="Taubenberger J">JK Taubenberger</name>
</author>
<author>
<name sortKey="Holmes, Ec" uniqKey="Holmes E">EC Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rabadan, R" uniqKey="Rabadan R">R Rabadan</name>
</author>
<author>
<name sortKey="Levine, Aj" uniqKey="Levine A">AJ Levine</name>
</author>
<author>
<name sortKey="Krasnitz, M" uniqKey="Krasnitz M">M Krasnitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holmes, Ec" uniqKey="Holmes E">EC Holmes</name>
</author>
<author>
<name sortKey="Ghedin, E" uniqKey="Ghedin E">E Ghedin</name>
</author>
<author>
<name sortKey="Miller, N" uniqKey="Miller N">N Miller</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
<author>
<name sortKey="Bao, Y" uniqKey="Bao Y">Y Bao</name>
</author>
<author>
<name sortKey="St George, K" uniqKey="St George K">K St George</name>
</author>
<author>
<name sortKey="Grenfell, Bt" uniqKey="Grenfell B">BT Grenfell</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Fraser, Cm" uniqKey="Fraser C">CM Fraser</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
<author>
<name sortKey="Cattoli, G" uniqKey="Cattoli G">G Cattoli</name>
</author>
<author>
<name sortKey="Spiro, Dj" uniqKey="Spiro D">DJ Spiro</name>
</author>
<author>
<name sortKey="Janies, Da" uniqKey="Janies D">DA Janies</name>
</author>
<author>
<name sortKey="Aly, Mm" uniqKey="Aly M">MM Aly</name>
</author>
<author>
<name sortKey="Brown, Ih" uniqKey="Brown I">IH Brown</name>
</author>
<author>
<name sortKey="Couacy Hymann, E" uniqKey="Couacy Hymann E">E Couacy-Hymann</name>
</author>
<author>
<name sortKey="De Mia, Gm" uniqKey="De Mia G">GM De Mia</name>
</author>
<author>
<name sortKey="Dung Do, H" uniqKey="Dung Do H">H Dung do</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macken, Ca" uniqKey="Macken C">CA Macken</name>
</author>
<author>
<name sortKey="Webby, Rj" uniqKey="Webby R">RJ Webby</name>
</author>
<author>
<name sortKey="Bruno, Wj" uniqKey="Bruno W">WJ Bruno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, Mi" uniqKey="Nelson M">MI Nelson</name>
</author>
<author>
<name sortKey="Viboud, C" uniqKey="Viboud C">C Viboud</name>
</author>
<author>
<name sortKey="Simonsen, L" uniqKey="Simonsen L">L Simonsen</name>
</author>
<author>
<name sortKey="Bennett, Rt" uniqKey="Bennett R">RT Bennett</name>
</author>
<author>
<name sortKey="Griesemer, Sb" uniqKey="Griesemer S">SB Griesemer</name>
</author>
<author>
<name sortKey="St George, K" uniqKey="St George K">K St George</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
<author>
<name sortKey="Spiro, Dj" uniqKey="Spiro D">DJ Spiro</name>
</author>
<author>
<name sortKey="Sengamalay, Na" uniqKey="Sengamalay N">NA Sengamalay</name>
</author>
<author>
<name sortKey="Ghedin, E" uniqKey="Ghedin E">E Ghedin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, D" uniqKey="Huson D">D Huson</name>
</author>
<author>
<name sortKey="Klopper, T" uniqKey="Klopper T">T Klopper</name>
</author>
<author>
<name sortKey="Lockhart, P" uniqKey="Lockhart P">P Lockhart</name>
</author>
<author>
<name sortKey="Steel, M" uniqKey="Steel M">M Steel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, D" uniqKey="Huson D">D Huson</name>
</author>
<author>
<name sortKey="Klopper, T" uniqKey="Klopper T">T Klopper</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Padidam, M" uniqKey="Padidam M">M Padidam</name>
</author>
<author>
<name sortKey="Sawyer, S" uniqKey="Sawyer S">S Sawyer</name>
</author>
<author>
<name sortKey="Fauquet, Cm" uniqKey="Fauquet C">CM Fauquet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martin, Dp" uniqKey="Martin D">DP Martin</name>
</author>
<author>
<name sortKey="Posada, D" uniqKey="Posada D">D Posada</name>
</author>
<author>
<name sortKey="Crandall, Ka" uniqKey="Crandall K">KA Crandall</name>
</author>
<author>
<name sortKey="Williamson, C" uniqKey="Williamson C">C Williamson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Posada, D" uniqKey="Posada D">D Posada</name>
</author>
<author>
<name sortKey="Crandall, Ka" uniqKey="Crandall K">KA Crandall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martin, Dp" uniqKey="Martin D">DP Martin</name>
</author>
<author>
<name sortKey="Williamson, C" uniqKey="Williamson C">C Williamson</name>
</author>
<author>
<name sortKey="Posada, D" uniqKey="Posada D">D Posada</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Planet, Pj" uniqKey="Planet P">PJ Planet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mickevich, Mf" uniqKey="Mickevich M">MF Mickevich</name>
</author>
<author>
<name sortKey="Farris, Js" uniqKey="Farris J">JS Farris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kishino, H" uniqKey="Kishino H">H Kishino</name>
</author>
<author>
<name sortKey="Hasegawa, M" uniqKey="Hasegawa M">M Hasegawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nagarajan, N" uniqKey="Nagarajan N">N Nagarajan</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Godzik, A" uniqKey="Godzik A">A Godzik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
<author>
<name sortKey="Nagarajan, N" uniqKey="Nagarajan N">N Nagarajan</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rambaut, A" uniqKey="Rambaut A">A Rambaut</name>
</author>
<author>
<name sortKey="Grassly, Nc" uniqKey="Grassly N">NC Grassly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilgenbusch, Jc" uniqKey="Wilgenbusch J">JC Wilgenbusch</name>
</author>
<author>
<name sortKey="Swofford, D" uniqKey="Swofford D">D Swofford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huelsenbeck, Jp" uniqKey="Huelsenbeck J">JP Huelsenbeck</name>
</author>
<author>
<name sortKey="Ronquist, F" uniqKey="Ronquist F">F Ronquist</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drummond, Aj" uniqKey="Drummond A">AJ Drummond</name>
</author>
<author>
<name sortKey="Rambaut, A" uniqKey="Rambaut A">A Rambaut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jinyan, L" uniqKey="Jinyan L">L Jinyan</name>
</author>
<author>
<name sortKey="Guimei, L" uniqKey="Guimei L">L Guimei</name>
</author>
<author>
<name sortKey="Haiquan, L" uniqKey="Haiquan L">L Haiquan</name>
</author>
<author>
<name sortKey="Limsoon, W" uniqKey="Limsoon W">W Limsoon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gabriela, A" uniqKey="Gabriela A">A Gabriela</name>
</author>
<author>
<name sortKey="Sorin, A" uniqKey="Sorin A">A Sorin</name>
</author>
<author>
<name sortKey="Yves, C" uniqKey="Yves C">C Yves</name>
</author>
<author>
<name sortKey="Stephan, F" uniqKey="Stephan F">F Stephan</name>
</author>
<author>
<name sortKey="Peter, Lh" uniqKey="Peter L">LH Peter</name>
</author>
<author>
<name sortKey="Bruno, S" uniqKey="Bruno S">S Bruno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gray, Gc" uniqKey="Gray G">GC Gray</name>
</author>
<author>
<name sortKey="Mccarthy, T" uniqKey="Mccarthy T">T McCarthy</name>
</author>
<author>
<name sortKey="Capuano, Aw" uniqKey="Capuano A">AW Capuano</name>
</author>
<author>
<name sortKey="Setterquist, Sf" uniqKey="Setterquist S">SF Setterquist</name>
</author>
<author>
<name sortKey="Olsen, Cw" uniqKey="Olsen C">CW Olsen</name>
</author>
<author>
<name sortKey="Alavanja, Mc" uniqKey="Alavanja M">MC Alavanja</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Obenauer, Jc" uniqKey="Obenauer J">JC Obenauer</name>
</author>
<author>
<name sortKey="Denson, J" uniqKey="Denson J">J Denson</name>
</author>
<author>
<name sortKey="Mehta, Pk" uniqKey="Mehta P">PK Mehta</name>
</author>
<author>
<name sortKey="Su, X" uniqKey="Su X">X Su</name>
</author>
<author>
<name sortKey="Mukatira, S" uniqKey="Mukatira S">S Mukatira</name>
</author>
<author>
<name sortKey="Finkelstein, Db" uniqKey="Finkelstein D">DB Finkelstein</name>
</author>
<author>
<name sortKey="Xu, X" uniqKey="Xu X">X Xu</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Ma, J" uniqKey="Ma J">J Ma</name>
</author>
<author>
<name sortKey="Fan, Y" uniqKey="Fan Y">Y Fan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Takemae, N" uniqKey="Takemae N">N Takemae</name>
</author>
<author>
<name sortKey="Parchariyanon, S" uniqKey="Parchariyanon S">S Parchariyanon</name>
</author>
<author>
<name sortKey="Damrongwatanapokin, S" uniqKey="Damrongwatanapokin S">S Damrongwatanapokin</name>
</author>
<author>
<name sortKey="Uchida, Y" uniqKey="Uchida Y">Y Uchida</name>
</author>
<author>
<name sortKey="Ruttanapumma, R" uniqKey="Ruttanapumma R">R Ruttanapumma</name>
</author>
<author>
<name sortKey="Watanabe, C" uniqKey="Watanabe C">C Watanabe</name>
</author>
<author>
<name sortKey="Yamaguchi, S" uniqKey="Yamaguchi S">S Yamaguchi</name>
</author>
<author>
<name sortKey="Saito, T" uniqKey="Saito T">T Saito</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chutinimitkul, S" uniqKey="Chutinimitkul S">S Chutinimitkul</name>
</author>
<author>
<name sortKey="Thippamom, N" uniqKey="Thippamom N">N Thippamom</name>
</author>
<author>
<name sortKey="Damrongwatanapokin, S" uniqKey="Damrongwatanapokin S">S Damrongwatanapokin</name>
</author>
<author>
<name sortKey="Payungporn, S" uniqKey="Payungporn S">S Payungporn</name>
</author>
<author>
<name sortKey="Thanawongnuwech, R" uniqKey="Thanawongnuwech R">R Thanawongnuwech</name>
</author>
<author>
<name sortKey="Amonsin, A" uniqKey="Amonsin A">A Amonsin</name>
</author>
<author>
<name sortKey="Boonsuk, P" uniqKey="Boonsuk P">P Boonsuk</name>
</author>
<author>
<name sortKey="Sreta, D" uniqKey="Sreta D">D Sreta</name>
</author>
<author>
<name sortKey="Bunpong, N" uniqKey="Bunpong N">N Bunpong</name>
</author>
<author>
<name sortKey="Tantilertcharoen, R" uniqKey="Tantilertcharoen R">R Tantilertcharoen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khiabanian, H" uniqKey="Khiabanian H">H Khiabanian</name>
</author>
<author>
<name sortKey="Trifonov, V" uniqKey="Trifonov V">V Trifonov</name>
</author>
<author>
<name sortKey="Rabadan, R" uniqKey="Rabadan R">R Rabadan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wan, Xf" uniqKey="Wan X">XF Wan</name>
</author>
<author>
<name sortKey="Chen, G" uniqKey="Chen G">G Chen</name>
</author>
<author>
<name sortKey="Luo, F" uniqKey="Luo F">F Luo</name>
</author>
<author>
<name sortKey="Emch, M" uniqKey="Emch M">M Emch</name>
</author>
<author>
<name sortKey="Donis, R" uniqKey="Donis R">R Donis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bokhari, Sh" uniqKey="Bokhari S">SH Bokhari</name>
</author>
<author>
<name sortKey="Janies, Da" uniqKey="Janies D">DA Janies</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paraskevis, D" uniqKey="Paraskevis D">D Paraskevis</name>
</author>
<author>
<name sortKey="Deforche, K" uniqKey="Deforche K">K Deforche</name>
</author>
<author>
<name sortKey="Lemey, P" uniqKey="Lemey P">P Lemey</name>
</author>
<author>
<name sortKey="Magiorkinis, G" uniqKey="Magiorkinis G">G Magiorkinis</name>
</author>
<author>
<name sortKey="Hatzakis, A" uniqKey="Hatzakis A">A Hatzakis</name>
</author>
<author>
<name sortKey="Vandamme, Am" uniqKey="Vandamme A">AM Vandamme</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-id journal-id-type="hwp">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">21177643</article-id>
<article-id pub-id-type="pmc">3064795</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkq1232</article-id>
<article-id pub-id-type="publisher-id">gkq1232</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods Online</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>GiRaF: robust, computational identification of influenza reassortments via graph mining</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Nagarajan</surname>
<given-names>Niranjan</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="COR1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kingsford</surname>
<given-names>Carl</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="AFF1">
<sup>3</sup>
</xref>
</contrib>
</contrib-group>
<aff id="AFF1">
<sup>1</sup>
Computational and Mathematical Biology, Genome Institute of Singapore, Singapore-127726,
<sup>2</sup>
Center for Bioinformatics and Computational Biology and
<sup>3</sup>
Department of Computer Science, University of Maryland, College Park, MD 20742, USA</aff>
<author-notes>
<corresp id="COR1">*To whom correspondence should be addressed. Tel:
<phone>+65 6808 8071</phone>
; Fax:
<fax>+65 6808 9058</fax>
; Email:
<email>nagarajann@gis.a-star.edu.sg</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<month>3</month>
<year>2011</year>
</pub-date>
<pub-date pub-type="ppub">
<month>3</month>
<year>2011</year>
</pub-date>
<pub-date pub-type="epub">
<day>21</day>
<month>12</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>21</day>
<month>12</month>
<year>2010</year>
</pub-date>
<volume>39</volume>
<issue>6</issue>
<fpage>e34</fpage>
<lpage>e34</lpage>
<history>
<date date-type="received">
<day>6</day>
<month>8</month>
<year>2010</year>
</date>
<date date-type="rev-recd">
<day>8</day>
<month>11</month>
<year>2010</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>11</month>
<year>2010</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2010. Published by Oxford University Press.</copyright-statement>
<copyright-year>2010</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/2.5">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.5">http://creativecommons.org/licenses/by-nc/2.5</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Reassortments in the influenza virus—a process where strains exchange genetic segments—have been implicated in two out of three pandemics of the 20th century as well as the 2009 H1N1 outbreak. While advances in sequencing have led to an explosion in the number of whole-genome sequences that are available, an understanding of the rate and distribution of reassortments and their role in viral evolution is still lacking. An important factor in this is the paucity of automated tools for confident identification of reassortments from sequence data due to the challenges of analyzing large, uncertain viral phylogenies. We describe here a novel computational method, called GiRaF (Graph-incompatibility-based Reassortment Finder), that robustly identifies reassortments in a fully automated fashion while accounting for uncertainties in the inferred phylogenies. The algorithms behind GiRaF search large collections of Markov chain Monte Carlo (MCMC)-sampled trees for groups of incompatible splits using a fast biclique enumeration algorithm coupled with several statistical tests to identify sets of taxa with differential phylogenetic placement. GiRaF correctly finds known reassortments in human, avian, and swine influenza populations, including the evolutionary events that led to the recent ‘swine flu’ outbreak. GiRaF also identifies several previously unreported reassortments via whole-genome studies to catalog events in H5N1 and swine influenza isolates.</p>
</abstract>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>INTRODUCTION</title>
<p>The genome of the influenza A virus is composed of eight independent segments, and simultaneous infection of a host by two or more strains can lead to the packaging of a hybrid strain whose segments derive from different lineages—a mixing process called ‘reassortment’. Reassortment events can quickly create a strain to which there is little or no immunity in the human population, and they have been repeatedly implicated in pandemics including the H2N2 Asian Flu in 1957 and the H3N2 Hong Kong Flu in 1968 (
<xref ref-type="bibr" rid="B1">1</xref>
). The recent H1N1 ‘swine flu’ outbreak has also been linked to a novel reassortment between North American and Eurasian swine lineages (
<xref ref-type="bibr" rid="B2">2</xref>
). Early detection of reassortant strains is therefore an important goal for influenza surveillance and efforts to thwart a future pandemic (
<ext-link ext-link-type="uri" xlink:href="http://www.cdc.gov/flu/weekly/fluactivity.htm">http://www.cdc.gov/flu/weekly/fluactivity.htm</ext-link>
).</p>
<p>Despite the recent increased availability of whole-genome sequences, a comprehensive picture of reassortments and how they relate to antigenic evolution is still missing (
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
). This is in part due to the unavailability of automated tools that can reconstruct and analyze large viral phylogenies to confidently identify reassortments (
<xref ref-type="bibr" rid="B5">5</xref>
). The common approach to identifying reassortments involves reconstructing species and segment trees and manually comparing them, a laborious and time-consuming task (
<xref ref-type="bibr" rid="B6 B7 B8 B9">6–9</xref>
). Moreover, influenza sequences have high mutation rates and tangled evolutionary histories, making the task of phylogenetic reconstruction particularly hard. Reassortment analysis is thus limited to high-confidence subtrees and prone to missing recent or subtle reassortments (
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B6">6</xref>
).</p>
<p>The general problem of identifying events of reticulate (non-tree-like) evolution and sequences with hybrid evolutionary history has been studied before in the context of horizontal gene transfer (
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
). Approaches for these problems are typically applied to small, well-defined gene trees and tackle the computationally expensive problem of inferring a parsimonious evolutionary scenario. Influenza datasets tend to have many more sequences and less well-defined phylogenies and consequently these methods are not used in published studies of influenza.</p>
<p>While biologically the process of ‘genetic recombination’ is distinct from a reassortment, from a sequence perspective, a reassortment can be viewed as a recombination with ‘breakpoints’ at segment ends. Methods for identifying recombination events, which have been widely studied (
<xref ref-type="bibr" rid="B12 B13 B14 B15">12–15</xref>
), are therefore plausible tools for the study of reassortments. However, the goals of these methods are often inappropriate for the reassortment detection problem. For example, many methods for studying recombination focus on correctly identifying recombination breakpoints, a task that is trivial for reassortments. Methods for the identification of the parental strains of putative recombinants, an essential step in recombination detection, either assume that the potential parents are known or do a limited search over a small number of taxa. In addition, while some recombination methods employ heuristic searches to identify plausible recombinants, manual comparison of phylogenies is still a preferred method to avoid high false-positive rates (RDP3 Manual,
<ext-link ext-link-type="uri" xlink:href="http://darwin.uvigo.es/rdp/RDP3Manual.zip">http://darwin.uvigo.es/rdp/RDP3Manual.zip</ext-link>
).</p>
<p>Due to the uncertainties and computational expense of phylogenetic reconstruction, an approach was recently proposed that bypasses this step completely and relies solely on detecting variations in edit distances between sequences of various taxa that indicate the presence of a reassortment (
<xref ref-type="bibr" rid="B5">5</xref>
). While this approach is computationally simple, it does not directly identify the reassorted taxa and is based on information that is likely to be a necessary, but not sufficient, condition for detecting reassortments. Similarly, while a variety of other statistical tests for ‘phylogenetic discord’ (
<xref ref-type="bibr" rid="B16">16</xref>
), such as incongruence length difference (ILD) (
<xref ref-type="bibr" rid="B17">17</xref>
) and the Kishino-Hasegawa test (
<xref ref-type="bibr" rid="B18">18</xref>
), can avoid phylogenetic reconstruction, they do not directly predict the reticulation events involved.</p>
<p>We present a new method, called GiRaF (Graph-incompatibility-based Reassortment Finder), that uses data-mining techniques to find reassortments in a given collection of sequences (explicitly identifying the set of isolates arising from a reassortment). The method, based on an earlier approach (
<xref ref-type="bibr" rid="B19">19</xref>
), compares distributions of trees by constructing an ‘incompatibility graph’ and mining it for phylogenetic discordances using a fast search algorithm. GiRaF then employs a phylogenetic distance test to substantially improve on the false positive rate [from the 86% reported earlier (
<xref ref-type="bibr" rid="B19">19</xref>
)] and combines answers from all segments of the genome to produce a comprehensive catalog of reassortments. Our results show that GiRaF can identify precisely both recent and phylogenetically deep reassortments (whose parents are within the given set) and can be used to uncover complex reassortment histories. GiRaF also reports a measure of confidence for its predictions and can efficiently analyze large datasets.</p>
</sec>
<sec sec-type="methods">
<title>METHODS</title>
<sec>
<title>Influenza datasets</title>
<p>Genomic sequences for the 156 influenza A (H3N2) isolates studied in Holmes
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B6">6</xref>
) and the 35 avian influenza A (H5N1) isolates studied in Salzberg
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B7">7</xref>
) were obtained from NCBI’s Influenza Virus Sequence database (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html">http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html</ext-link>
). The dataset of non-human H5N1 sequences was constructed from 1101 whole-genome sequences in the Influenza Virus Sequence database. A non-redundant subset of 71 genomes was then extracted [using the program CD-Hit (
<xref ref-type="bibr" rid="B20">20</xref>
) and a threshold of 98% for sequence similarity in the NA segment] and analyzed using GiRaF. The analysis of swine influenza and S-OIV sequences was conducted on a subset of 140 isolates, out of the 173 used in the Kingsford
<italic>et al</italic>
. study (
<xref ref-type="bibr" rid="B21">21</xref>
), for which whole-genome sequences were available.</p>
</sec>
<sec>
<title>Synthetic sequences</title>
<p>Synthetic sequences were generated using the program Seq-Gen (
<xref ref-type="bibr" rid="B22">22</xref>
) and various simulated scenarios for reassortment events. The starting tree topology was obtained by constructing a neighbor-joining tree for the HA segment of isolates from Holmes
<italic>et al</italic>
. using the program PAUP (
<xref ref-type="bibr" rid="B23">23</xref>
) (branch lengths were estimated using the default likelihood criterion). The original sequences were discarded and new leaf sequences were simulated on the tree with sequence length, background nucleotide frequencies and transition/transversion ratio to mimic the HA segment of isolates from Holmes
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B6">6</xref>
) (using the F84 model of sequence evolution and four rate categories). Reassortments were then introduced into the tree by selecting at random a subtree of size between parameters
<italic>minsize</italic>
and
<italic>maxsize</italic>
, moving the subtree to a different random place in the tree, and repeating this process <italic>count</italic> times. We use the terms ‘reassortment set’ and ‘event’ interchangeably and the term ‘implant’ for a synthetic ‘reassortment set’. Parameters from the NA segment were then used to simulate new sequences on the modified tree. For each choice of the parameters
<italic>minsize</italic>
,
<italic>maxsize</italic>
and
<italic>count</italic>
described in the ‘Results’ section, 100 test sets were generated and the results were pooled to compute the evaluation metrics detailed below.</p>
</sec>
<sec>
<title>Evaluation metrics</title>
<p>A predicted reassortment was considered ‘correct’ (and the corresponding implant ‘identified’) if it matched an implanted set exactly. Sensitivity and positive predictive value (PPV) were computed as:
<list list-type="simple">
<list-item>
<p>Sensitivity = number of identified implants/number of implants</p>
</list-item>
<list-item>
<p>PPV = number of correct predictions/number of predictions</p>
</list-item>
</list>
Corresponding statistics were also computed at the isolate level by considering a predicted isolate correct if it was contained in one of the implanted reassortment sets. In the case of multiple reassortment events, both the above metric and a ‘relaxed’ variant were computed. In the relaxed variant, a predicted reassortment set was considered ‘correct’ if all its elements were contained within an implanted reassortment and correspondingly an implanted reassortment was considered ‘identified’ if all its elements were in predicted sets.</p>
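<p>The set-level metrics above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of GiRaF. For the relaxed variant, the sketch assumes that an implant counts as ‘identified’ when its taxa are covered by the union of all predicted sets, which is one possible reading of the definition.</p>
<preformat><![CDATA[
```python
def exact_metrics(predicted, implanted):
    """Set-level sensitivity and PPV when a prediction must match an implant exactly."""
    pred = {frozenset(p) for p in predicted}
    impl = {frozenset(i) for i in implanted}
    matched = pred & impl
    sensitivity = len(matched) / len(impl) if impl else float("nan")
    ppv = len(matched) / len(pred) if pred else float("nan")
    return sensitivity, ppv


def relaxed_metrics(predicted, implanted):
    """Relaxed variant: fragmented predictions are accepted.

    Assumption (not stated explicitly in the text): an implant is 'identified'
    when all of its taxa appear in the union of the predicted sets.
    """
    pred = [frozenset(p) for p in predicted]
    impl = [frozenset(i) for i in implanted]
    # A prediction is correct if it lies entirely inside one implanted set.
    correct = sum(1 for p in pred if any(p <= i for i in impl))
    covered = frozenset().union(*pred)
    identified = sum(1 for i in impl if i <= covered)
    sensitivity = identified / len(impl) if impl else float("nan")
    ppv = correct / len(pred) if pred else float("nan")
    return sensitivity, ppv
```
]]></preformat>
<p>With implants {a, b} and {c} and predictions {a}, {b}, {c} and {x}, the exact metric credits only {c}, whereas the relaxed metric also accepts the fragments {a} and {b}.</p>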
</sec>
<sec>
<title>Tree distributions</title>
<p>Sequences for each segment were aligned using MUSCLE (
<xref ref-type="bibr" rid="B24">24</xref>
) with default parameters (the resulting alignments had few gaps) and used as input for MrBayes (
<xref ref-type="bibr" rid="B25">25</xref>
) to sample 1000 unrooted candidate trees (GTR model, γ-distributed rate variation, burn-in of 100 000 iterations and sampling every 200 iterations). These trees were then used to model the phylogenetic uncertainty for each segment as detailed below. Note that, in principle, other phylogenetic tree construction methods, such as BEAST (
<xref ref-type="bibr" rid="B26">26</xref>
) or neighbor-joining with bootstrapping, could be used to generate ensembles of trees for input to GiRaF.</p>
</sec>
<sec>
<title>Constructing the incompatibility graph</title>
<p>To identify disagreements between distributions of trees, the well-known concept of splits and incompatible splits (
<xref ref-type="bibr" rid="B10">10</xref>
) was employed. Every edge in a tree defines a partition of the set of taxa into two sets. Such a partition is referred to as a split, and every tree can be seen as a collection of splits. Two splits with partitions A|B and X|Y are incompatible if the four intersections A∩X, A∩Y, B∩X and B∩Y are all non-empty. It can be shown that under this definition of incompatible splits, two trees are phylogenetically incompatible if and only if they contain incompatible splits (
<xref ref-type="bibr" rid="B10">10</xref>
). We use this fact as follows: we transform a sampled collection of possible trees for a segment into the corresponding set of splits and assign a confidence to every split based on the proportion of trees that contain it. Splits in fewer than 5% of the sampled trees (the least confident set) are discarded as this dramatically reduces the size of the graph without affecting performance. The splits are then used to construct a graph with splits from two segments as nodes on either side and edges connecting splits that are incompatible (
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B19">19</xref>
). This incompatibility graph is a concise representation of the disagreements between the phylogenies for the two segments while accounting for phylogenetic uncertainty (
<xref ref-type="fig" rid="F1">Figure 1</xref>
).
<fig id="F1" position="float">
<label>Figure 1.</label>
<caption>
<p>Incompatibility graph. The graph contains a node for every split observed in any sampled tree. Edges connect incompatible splits contained in trees from different segments. The weight of a subset of splits is equal to the probability that the true tree contains one of the splits, estimated here as the weighted fraction of sampled trees that contain at least one of the splits. Darker lines indicate a biclique and boxes show the trees that contain one of the splits involved in the biclique. For a high-confidence biclique, the true trees for the segments cannot simultaneously come from the corresponding boxed sets of trees.</p>
</caption>
<graphic xlink:href="gkq1232f1"></graphic>
</fig>
</p>
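<p>The construction described above can be sketched as follows. This is an illustration under simplifying assumptions, not the GiRaF implementation: each sampled tree is represented directly as a collection of split sides, every split is stored as the side that excludes a fixed reference taxon, and all helper names are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import Counter
from itertools import product


def normalize(side, taxa):
    """Store the split side|rest as the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    ref = min(taxa)
    return frozenset(taxa) - side if ref in side else side


def incompatible(s1, s2, taxa):
    """A|B and X|Y are incompatible iff all four intersections are non-empty."""
    a, x = frozenset(s1), frozenset(s2)
    b, y = frozenset(taxa) - a, frozenset(taxa) - x
    return all((a & x, a & y, b & x, b & y))


def split_confidences(trees, taxa, min_conf=0.05):
    """Confidence of a split = fraction of sampled trees that contain it.

    Splits found in fewer than min_conf of the trees are discarded.
    """
    counts = Counter()
    for tree in trees:
        counts.update({normalize(s, taxa) for s in tree})
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n >= min_conf}


def incompatibility_graph(conf1, conf2, taxa):
    """Edges join a split of segment 1 to every incompatible split of segment 2."""
    return [(s1, s2) for s1, s2 in product(conf1, conf2)
            if incompatible(s1, s2, taxa)]
```
]]></preformat>
<p>For five taxa a to e, the splits {a, b}|{c, d, e} and {a, c}|{b, d, e} are incompatible, while {a, b}|{c, d, e} and {d, e}|{a, b, c} are compatible.</p>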
<p>We then look for
<italic>bicliques</italic>
in this graph, where a biclique is given by two subsets of nodes (i.e. splits), one subset from each side of the incompatibility graph, such that all possible edges exist between nodes in the two subsets (i.e. the splits are all mutually incompatible). Bicliques where the sets of splits have high confidence values are evidence for incompatibilities between the true phylogenies of the two segments, and therefore serve as evidence for reassortments (
<xref ref-type="bibr" rid="B19">19</xref>
). The confidence value assigned to a set of splits is the probability that one of the sampled trees for a segment contains at least one of the splits in the set. The confidence value assigned to a biclique is the product of the confidence values for the two sets of splits that participate in the biclique. This confidence value is an estimate for the probability that both the true trees contain at least one split from each part of the biclique and are therefore phylogenetically incompatible due to these splits.</p>
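<p>As a rough sketch of the confidence calculation just described, the following code estimates the confidence of a set of splits as the unweighted fraction of sampled trees containing at least one of them, and multiplies the two sides of a biclique. The representation of trees as sets of split labels and the function names are assumptions made for illustration only.</p>
<preformat><![CDATA[
```python
def set_confidence(split_set, trees):
    """Fraction of sampled trees that contain at least one split from split_set."""
    hits = sum(1 for tree in trees if any(s in tree for s in split_set))
    return hits / len(trees)


def is_biclique(left, right, edges):
    """True if every split in `left` is incompatible with every split in `right`."""
    edge_set = set(edges)
    return all((l, r) in edge_set for l in left for r in right)


def biclique_confidence(left, right, trees1, trees2):
    """Product of the confidence values of the two sides of the biclique."""
    return set_confidence(left, trees1) * set_confidence(right, trees2)
```
]]></preformat>
<p>For example, if three of four sampled trees for one segment contain a split from the left side and one of two sampled trees for the other segment contains a split from the right side, the biclique confidence is 0.75 × 0.5 = 0.375.</p>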
</sec>
<sec>
<title>Biclique enumeration</title>
<p>The problem of finding large, dense subgraphs in a graph and, in the extreme, finding cliques and bicliques, is a well-studied problem in graph theory with many applications in areas such as data-mining, the study of web-communities and finding complexes in protein interaction networks (
<xref ref-type="bibr" rid="B27">27</xref>
). While in general, the problem of finding large cliques and bicliques can be computationally challenging (NP-hard), in practice, exact solutions are feasible in several cases using efficient branch-and-bound algorithms (
<xref ref-type="bibr" rid="B27">27</xref>
,
<xref ref-type="bibr" rid="B28">28</xref>
). GiRaF uses a novel exact algorithm to enumerate all high-confidence bicliques suggestive of reassortment events, with negligible running times on typical incompatibility graphs (
<xref ref-type="bibr" rid="B19">19</xref>
) (worst-case runtime to output a new biclique is quadratic in the number of splits and memory usage is linear in the number of bicliques). A more detailed description of the biclique enumeration algorithm is given in the
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Data</ext-link>
. In the experiments reported here, the runtime for GiRaF was dominated by the time to sample trees (several hours), with the biclique enumeration stage being much less expensive (a few seconds).</p>
</sec>
<sec>
<title>From bicliques to reassortment sets</title>
<p>Large bicliques in the incompatibility graph serve as significant evidence for reassortments but do not directly identify the corresponding sets of taxa. To do this, we rely on the fact that for every edge in the incompatibility graph, the incompatible splits naturally define four candidate sets that label the edge (
<xref ref-type="fig" rid="F2">Figure 2</xref>
). We search for high-confidence bicliques (confidence cutoff of 0.5) that uniquely identify a candidate set among the labels [based on Theorem 4 in Nagarajan
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B19">19</xref>
)]. We then report candidate sets supported by more than one high-confidence biclique and assign a confidence value of 1
<italic>–∏
<sub>i</sub>
(1–p
<sub>i</sub>
)</italic>
to the putative reassortment, where
<italic>p</italic>
<sub>1</sub>
<italic>, … , p
<sub>n</sub>
</italic>
are the confidence values for the supporting bicliques.
<fig id="F2" position="float">
<label>Figure 2.</label>
<caption>
<p>Reassortment candidates. The pair of incompatible splits in the two segment trees define four candidate sets (obtained by computing intersections, {a, b}∩{a, c} = {a}, {a, b} ∩ {b, d, e} = {b}, {c, d, e} ∩ {a, c} = {c} and {c, d, e} ∩ {b, d, e} = {d, e}), one of which is the reassortment set ({b}). The set {b} also satisfies the condition that it is similar to some taxa and more diverged with respect to others when comparing the two segment trees. Note that set {d} also has this property, demonstrating that it is not a sufficient condition for identifying a reassortment.</p>
</caption>
<graphic xlink:href="gkq1232f2"></graphic>
</fig>
</p>
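<p>The two computations in this section, deriving the four candidate sets from a pair of incompatible splits and combining the confidence values of supporting bicliques, can be sketched as below. The function names are hypothetical, and treating the 0.5 cutoff as inclusive is an assumption.</p>
<preformat><![CDATA[
```python
from math import prod


def candidate_sets(split1, split2, taxa):
    """Four candidate sets labelling an edge: the pairwise intersections of the split sides."""
    a, x = frozenset(split1), frozenset(split2)
    b, y = frozenset(taxa) - a, frozenset(taxa) - x
    return [a & x, a & y, b & x, b & y]


def combined_confidence(biclique_confs):
    """1 - prod_i (1 - p_i) over the supporting bicliques."""
    return 1.0 - prod(1.0 - p for p in biclique_confs)


def reported_confidence(biclique_confs, cutoff=0.5, min_support=2):
    """Report a candidate only if more than one high-confidence biclique supports it.

    Assumption: 'high confidence' means p >= cutoff (inclusive).
    Returns None when the candidate would not be reported.
    """
    strong = [p for p in biclique_confs if p >= cutoff]
    if len(strong) < min_support:
        return None
    return combined_confidence(strong)
```
]]></preformat>
<p>For the splits {a, b}|{c, d, e} and {a, c}|{b, d, e} of Figure 2, <italic>candidate_sets</italic> yields {a}, {b}, {c} and {d, e}. Two supporting bicliques with confidence 0.5 each give a combined confidence of 0.75.</p>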
</sec>
<sec>
<title>Phylogenetic distance test</title>
<p>In addition to topological incongruity, reassorted taxa are marked by distinct patterns of inter-isolate distances. Relative to the distances in one segment, distances involving the reassorted taxa in a second segment typically increase with respect to some isolates and decrease with respect to others (
<xref ref-type="fig" rid="F2">Figure 2</xref>
). While this distance pattern is not a sufficient condition for confirming reassortment events, it can help shorten the list of candidate sets to be considered and is taken into account in GiRaF as follows: for every pair of isolates, the uncertainty in phylogenetic distance is modelled using the distribution of distances on the sampled trees (normalized so that each tree’s total length is 1). A
<italic>Z</italic>
-test (without assuming independence) is then used to identify those pairs of isolates that have diverged or come closer together (Bonferroni-corrected,
<italic>P</italic>
 ≤ 0.01) when distances between two segments are compared. Each of the candidate sets derived from a pair of incompatible splits (excluding the largest set) is then tested to see if it has diverged from one of the other three sets and come closer to another one of them, as determined by a binomial test of over-representation of isolate pairs from the
<italic>Z</italic>
-test (
<italic>P</italic>
 ≤ 0.01 and considering all pairs of taxa between the sets). Subsets of isolates that fail this test are omitted from the candidate sets considered as labels in the incompatibility graph (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Figure S4</ext-link>
).</p>
</sec>
<sec>
<title>Combining results from multiple segments</title>
<p>In cases where sequences are available for all eight segments in the influenza genome, GiRaF can be applied to all 28 pairs of segments to more comprehensively catalog reassortment events while further minimizing the chance of false positives. In principle, a reassortant set should appear in at least seven of these pairwise comparisons (and more, if more than one segment has been exchanged), while a false positive is less likely to appear that frequently. In practice, to reduce false negatives, we need to set the threshold lower, and we found that requiring candidate reassortments to appear in at least 3 pairwise GiRaF results provided a good tradeoff (based on worst-case estimates from
<xref ref-type="fig" rid="F6">Figure 6</xref>
, we can estimate an upper bound on the false-positive and false-negative rate to be 0.2). For a candidate reassortment, the information in the pairwise comparisons can be used to divide the segments into two classes, corresponding to the parent from which they descended. This partitioning is based on the requirement that, as much as possible, segments with incompatible histories are placed in opposite classes and translates into the well-known intractable problem of
<italic>Maximum Bipartite Subgraph</italic>
. However, as the problem size is small, GiRaF implements an exhaustive search over bipartitions that quickly finds the optimal solution in practice.
<fig id="F3" position="float">
<label>Figure 3.</label>
<caption>
<p>Multiple reassortments in recent human influenza A (H3N2) isolates. Consensus trees [from sampled trees in GiRaF, using MrBayes (
<xref ref-type="bibr" rid="B25">25</xref>
)] for (
<bold>a</bold>
) HA segment and (
<bold>b</bold>
) NA segment for the 156 isolates studied in Holmes
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B6">6</xref>
). The three candidate reassortments identified by GiRaF are {A/New York/52/2004, A/New York/59/2003} and {A/New York/32/2003, A/New York/198/2003, A/New York/199/2003}, which were also previously identified, and the novel candidate {A/New York/105/2002}. The candidate reassortments are highlighted on the trees (drawn using Mesquite version 2.72,
<ext-link ext-link-type="uri" xlink:href="http://mesquiteproject.org">http://mesquiteproject.org</ext-link>
). Note that some clades have been collapsed for clarity and the full trees can be seen in
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Figure S5</ext-link>
.</p>
</caption>
<graphic xlink:href="gkq1232f3"></graphic>
</fig>
</p>
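<p>The combination step can be sketched as follows, assuming that the pairwise results for a candidate are available as a list of segment pairs with incompatible histories. The exhaustive search over bipartitions mirrors the description above, but the data structures, the function names and the unweighted scoring are illustrative assumptions rather than the GiRaF implementation.</p>
<preformat><![CDATA[
```python
from itertools import combinations

SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]


def filter_candidates(pairwise_hits, min_pairs=3):
    """Keep candidates reported in at least min_pairs pairwise comparisons.

    pairwise_hits maps a candidate (frozenset of taxa) to the set of
    segment pairs in which it was reported.
    """
    return {c for c, hits in pairwise_hits.items() if len(hits) >= min_pairs}


def best_bipartition(segments, incompatible_pairs):
    """Exhaustive search for the two-class partition of the segments that
    places as many incompatible segment pairs as possible in opposite classes."""
    pairs = {frozenset(p) for p in incompatible_pairs}
    first, rest = segments[0], segments[1:]
    best_score, best_side = -1, None
    # Fixing the first segment in one class avoids visiting mirror images:
    # 2**7 = 128 bipartitions for eight segments.
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            side = {first, *extra}
            score = sum(1 for p in pairs if len(p & side) == 1)
            if score > best_score:
                best_score, best_side = score, side
    return best_side, set(segments) - best_side, best_score
```
]]></preformat>
<p>If only HA has been exchanged, HA is incompatible with each of the other seven segments, and the search places HA alone in one class with all seven incompatible pairs separated. This matches the expectation that a single-segment reassortant appears in seven of the 28 pairwise comparisons.</p>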
</sec>
<sec>
<title>Availability</title>
<p>Source code and executables for GiRaF as well as the datasets used in this study are freely available at
<ext-link ext-link-type="uri" xlink:href="http://www.cbcb.umd.edu/software/giraf">http://www.cbcb.umd.edu/software/giraf</ext-link>
.</p>
</sec>
</sec>
<sec sec-type="results">
<title>RESULTS</title>
<p>In order to characterize its ability to identify reassortments, GiRaF was used to analyze a range of real and synthetic datasets. Several previous studies have relied on the manual comparison of segment phylogenies to identify reassortment events, and we used these to benchmark the automated method. For more controlled studies, several synthetic reassortment datasets were also generated and analyzed.</p>
<sec>
<title>Human Influenza H3N2 and H1N1 reassortments</title>
<p>As part of the Influenza Genome Sequencing Project, a large collection of human influenza H3N2 isolates was sequenced and analyzed in a study by Holmes
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B6">6</xref>
) to characterize the genomic diversity of the dominant subtype of seasonal flu. Through manual comparison of phylogenies for the HA segment with other segments, Holmes
<italic>et al</italic>
. identified two distinct clades containing five isolates in total that were likely to have arisen via reassortments. Our automated analysis using GiRaF (on sequences for the HA and NA segments) identified exactly three sets of taxa resulting from reassortments—two of these are identical to the clades reported in Holmes
<italic>et al</italic>
. while the third contains a single isolate, A/New York/105/2002, that appears by manual inspection to be a reassortant that was missed in the original analysis (
<xref ref-type="fig" rid="F3">Figure 3</xref>
). By comparison of the segment phylogeny of PB2 with other segments, Holmes
<italic>et al</italic>
. also report isolate A/New York/11/2003 as a likely reassortant and this was confirmed, with high confidence, by the GiRaF analysis as well (comparing NA and PB2 segments). A final candidate reassortment between PA and MP (A/New York/182/2000) that is suggested with apparent low confidence in their work could not be confirmed and manual inspection indicates that it may indeed be a false positive (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Figure S1</ext-link>
). This disagreement may be due to the different tree inference methods used and in addition, as Holmes
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B6">6</xref>
) point out, the MP sequences for these isolates are very similar, making detection of reassortments involving that segment difficult.</p>
<p>GiRaF can process large, diverse datasets quickly. GiRaF took less than 5 min on a single processor to analyze the HA and NA segments of all 137 complete human H3N2 genomes in GenBank collected before 1990 (tree construction using MrBayes, though, took several hours). Four apparently novel reassortments were predicted with high confidence ({A/Albany/6/1970, A/Albany/1/1970, A/Albany/3/1970}, {A/Albany/4/1977}, {A/Hong Kong/46/71, A/Hong Kong/6/72, A/Hong Kong/50/72} and {A/Hong Kong/33/73, A/Hong Kong/49/74}). No previously published analysis identified these isolates as reassortants, further indicating the benefit of automated detection.</p>
<p>A similar search was performed on the HA and NA segments of all 839 human H1N1 genomes available from 1900 to 2010 (excluding S-OIV strains), filtered [via CD-Hit (
<xref ref-type="bibr" rid="B20">20</xref>
)] to a non-redundant set of 181 representative genomes (combined HA+NA sequence similarity cutoff 99.5%). GiRaF analyzed the trees in 11 min and reported four high-confidence putative reassortments. One of these reassortant sets {A/Iowa/CEID23/2005} identified by GiRaF was previously reported to be a ‘triple reassortant’ virus that had infected an Iowa farmer (
<xref ref-type="bibr" rid="B29">29</xref>
).</p>
</sec>
<sec>
<title>Avian influenza reassortments</title>
<p>Reassortments among avian influenza strains and between avian and human strains are of special concern for influenza surveillance. Human–avian reassortments in particular led to the pandemics of 1957 and 1968. The H5N1 avian flu outbreak in 2003 caused hundreds of human deaths and led to the culling of millions of birds, prompting further concerns about the ability of avian strains to gain human transmissibility through reassortments (
<xref ref-type="bibr" rid="B30">30</xref>
). To get a more detailed picture of the spread of avian influenza from Asia to other parts of the world, a recent study sequenced and analyzed 36 isolates from birds in Europe, North Africa and Southeast Asia and identified an isolate from Nigeria (A/chicken/Nigeria/1047-62/2006) as a likely reassortant (
<xref ref-type="bibr" rid="B7">7</xref>
). We reanalyzed these sequences with GiRaF and confirmed that when comparing HA and NA segment phylogenies, this isolate emerges as the unique reassortant (reported by GiRaF with a confidence value of 1). Analysis of other segment phylogenies with GiRaF revealed an additional isolate with a clear pattern of reassortment (A/cygnus olor/Italy/742/2006, involving PA and PB1) that was not uncovered by the earlier manual search, highlighting the utility of an automated approach even for small datasets (
<xref ref-type="fig" rid="F4">Figure 4</xref>
).</p>
<p>To illustrate the feasibility of large-scale analysis, GiRaF was also used to catalog reassortments in a more comprehensive set of H5N1 influenza whole-genome sequences obtained from NCBI’s Influenza Virus Sequence database (see ‘Methods’ section). Because of the incompleteness of existing reports of reassortments in the literature, we cannot assess the specificity of GiRaF based on this catalog. However, this analysis identified several single- and multi-taxa reassortment events (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Table S1</ext-link>
).</p>
<p>Furthermore, we were able to characterize the architecture of these reassortments by combining information from GiRaF analysis for all pairs of segments (see ‘Methods’ section). As expected, a majority of the events (13 out of 18) involve multiple segments, though a slight bias for single-segment reassortments cannot be statistically ruled out (
<xref ref-type="bibr" rid="B5">5</xref>
). In addition, there seems to be no significant bias in what segments are inherited together (or separately) through the reassortment event (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Table S1</ext-link>
) (
<xref ref-type="bibr" rid="B8">8</xref>
).</p>
</sec>
<sec>
<title>2009 S-OIV reassortments</title>
<p>The recent H1N1 S-OIV (‘swine flu’) outbreak arose from a novel reassortment between North American and Eurasian swine influenza lineages (
<xref ref-type="bibr" rid="B2">2</xref>
), further emphasizing the need for increased surveillance and study of swine influenza strains. As pigs can become infected with human, avian and swine lineages of influenza, they serve as ideal breeding grounds for novel reassortments to emerge, though the scale and distribution of these events are not fully understood. Using GiRaF, we analyzed a set of 140 swine influenza and S-OIV sequences that were studied previously (
<xref ref-type="bibr" rid="B21">21</xref>
) and catalogued reassortment events in the set (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Table S2</ext-link>
). This analysis clearly identified the 2009 S-OIV sequences as reassortants and recovered the precise architecture of the reassortment. Several previously identified Thai reassortments (
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
) were also recovered by the GiRaF analysis. Overall, 15 single-taxa and 22 multi-taxa reassortment candidates were recovered, reflecting the abundance of reassortment events in the sequenced isolates. As was the case for the H5N1 isolates, while the polymerase segments (PA, PB1, PB2) do tend to cluster together quite frequently, we found no statistically significant bias in the association of segments (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Table S2</ext-link>
) (
<xref ref-type="bibr" rid="B33">33</xref>
).
<fig id="F4" position="float">
<label>Figure 4.</label>
<caption>
<p>Analysis of avian influenza isolates from Salzberg
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B7">7</xref>
). Consensus trees for (
<bold>a</bold>
) PB1 segment and (
<bold>b</bold>
) PA segment. The two candidate reassortments identified by GiRaF are {A/chicken/Nigeria/1047-62/2006}, which was previously identified, and the novel candidate {A/cygnus olor/Italy/742/2006}; both are highlighted on the trees (drawn using Mesquite).</p>
</caption>
<graphic xlink:href="gkq1232f4"></graphic>
</fig>
<fig id="F5" position="float">
<label>Figure 5.</label>
<caption>
<p>Sensitivity of GiRaF as a function of phylogenetic distance. Results from the ‘All events’ dataset in
<xref ref-type="table" rid="T1">Table 1</xref>
were categorized by the F84 distance of implanted reassortments (from their original location) and the corresponding frequency histogram was graphed. This distance is a proxy for the sequence similarity of the (unobserved) ancestral sequences from which the two segments derived. GiRaF has nearly perfect sensitivity for implants with F84 distance >0.005, suggesting that the false negatives are largely due to the challenge of distinguishing subtle events from phylogenetic noise.</p>
</caption>
<graphic xlink:href="gkq1232f5"></graphic>
</fig>
</p>
</sec>
<sec>
<title>Evaluation on synthetic datasets</title>
<p>We experimented with GiRaF on several synthetic datasets with implanted reassortments (see ‘Methods’ section) in order to assess performance in a controlled setting. These studies indicate that GiRaF can identify reassortment events with high sensitivity as well as high precision (
<xref ref-type="table" rid="T1">Table 1</xref>
). On average (over 100 replicates) nearly 8 out of 10 reassortment sets predicted by GiRaF were found to be correct while 8 out of 10 implanted reassortments were recovered perfectly. Similar results were obtained for the task of identifying reassorted taxa, though accuracy was affected in some cases due to the misidentification of a few large candidate sets. In general, the few false positives reported by GiRaF were dominated by large sets, a feature that lends itself well to manual filtering of obvious false positives, if needed (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Figure S2</ext-link>
). In datasets where no reassortments were implanted, the false-positive rate was found to be below 0.03.
<table-wrap id="T1" position="float">
<label>Table 1.</label>
<caption>
<p>Performance of GiRaF on various synthetic datasets</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Experiment</th>
<th colspan="2" align="center" rowspan="1">Reassortment sets
<hr></hr>
</th>
<th colspan="2" align="center" rowspan="1">Reassortant taxa
<hr></hr>
</th>
</tr>
<tr>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1">Sensitivity</th>
<th rowspan="1" colspan="1">PPV</th>
<th rowspan="1" colspan="1">Sensitivity</th>
<th rowspan="1" colspan="1">PPV</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">All events</td>
<td rowspan="1" colspan="1">0.81 ± 0.08</td>
<td rowspan="1" colspan="1">0.79 ± 0.08</td>
<td rowspan="1" colspan="1">0.75 ± 0.04</td>
<td rowspan="1" colspan="1">0.65 ± 0.05</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Small (recent) events</td>
<td rowspan="1" colspan="1">0.79 ± 0.08</td>
<td rowspan="1" colspan="1">0.93 ± 0.08</td>
<td rowspan="1" colspan="1">0.79 ± 0.08</td>
<td rowspan="1" colspan="1">0.64 ± 0.05</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Large (old) events</td>
<td rowspan="1" colspan="1">0.76 ± 0.08</td>
<td rowspan="1" colspan="1">0.86 ± 0.07</td>
<td rowspan="1" colspan="1">0.74 ± 0.03</td>
<td rowspan="1" colspan="1">0.82 ± 0.02</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TF1">
<p>For these tests, a single reassortment was implanted. In the case of ‘All events’, we set
<italic>minsize</italic>
 = 1,
<italic>maxsize</italic>
 = 20 (the reassorted clade contained anywhere between 1 and 20 taxa), for ‘Small (recent) events’
<italic>minsize</italic>
 = 
<italic>maxsize</italic>
 = 1 and for ‘Large (old) events’,
<italic>minsize</italic>
 = 5,
<italic>maxsize</italic>
 = 20. Sensitivity and PPV were computed as detailed in the ‘Methods’ section.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>Reassortments that involve small shifts in phylogeny can be hard or impossible to detect using the sequence of isolates alone, and it is unlikely that any computational tool can achieve perfect sensitivity when the magnitude of phylogenetic incongruence is within the uncertainty of phylogenetic reconstruction. We were able to explore GiRaF’s sensitivity to these subtle reassortments by performing our simulations without constraining the location of implanted reassortments.
<italic>Post facto</italic>
analysis of reassortment events missed by GiRaF clearly shows the difficulty of identifying reassortments between strains of very similar sequence—all but two of the missed implanted events involve subtrees that were moved a distance of ≤0.005 (under the F84 model) in the tree (see ‘Methods’ section and
<xref ref-type="fig" rid="F5">Figure 5</xref>
). Surprisingly, despite this difficulty, more than 40% of the events with F84 distance in this range are identified correctly by GiRaF.
<fig id="F6" position="float">
<label>Figure 6.</label>
<caption>
<p>Robustness to complex reassortment histories. The graph summarizes results from four datasets where
<italic>minsize</italic>
 = 1,
<italic>maxsize</italic>
 = 20 and
<italic>count</italic>
was varied over the set {1, 2, 5, 10}, testing GiRaF’s robustness to multiple reassortments and complex histories. While the task of identifying the original implants (‘Exact’ results) becomes increasingly intractable, GiRaF’s sensitivity and PPV remain stable under a more relaxed definition of matches (‘Relaxed’ results).</p>
</caption>
<graphic xlink:href="gkq1232f6"></graphic>
</fig>
</p>
</sec>
<sec>
<title>New or sparsely sampled versus old or well-sampled reassortments</title>
<p>The ability to detect reassortment events can depend on the age of the reassortment and the number of sampled isolates that exhibit that reassortment. To probe how the sensitivity of GiRaF depends on the number of isolates exhibiting a particular reassortment, we constructed datasets restricted to single-taxa as well as multi-taxa reassortments using the same procedure as was used in the synthetic dataset experiments above. Our results indicate that the performance of GiRaF is largely unaffected by the size of the reassortment cohort (
<xref ref-type="table" rid="T1">Table 1</xref>
). In fact, GiRaF is slightly better at predicting single-taxon reassortments (a task that is more challenging for manual analysis) than at identifying larger, typically older events. Because our synthetic trees contain few long branches, the number of taxa in the implanted reassortment is a rough surrogate for the age of the event, and these results suggest that more recent reassortments are slightly more detectable by GiRaF. This may be because larger sets have a greater scope for error and are thus less likely to be identified exactly as a reassortment event.</p>
</sec>
<sec>
<title>Complex reassortment histories</title>
<p>Multiple reassortments in a dataset can make the task of identifying the events challenging and even infeasible in some cases. For example, in instances where new reassortments involve descendants of earlier reassortments, the original reassortment sets can be obscured, which can lead to fragmented predictions by GiRaF. Conversely, two distinct reassortment events with very similar phylogenetic histories can be phylogenetically indistinguishable, leading to a fused prediction. We studied how sensitivity and specificity are affected as these scenarios become more common by increasing the number of implanted reassortments in the datasets. In terms of identifying the original implants perfectly, GiRaF’s performance decreases gradually as the number of implants increases (
<xref ref-type="fig" rid="F6">Figure 6</xref>
). However, if we accept fragmentation in the predicted sets and apply a relaxed metric for evaluation (see ‘Methods’ section), GiRaF continues to be very precise and sensitive, and its performance remains stable as the number of reassortments is increased (
<xref ref-type="fig" rid="F6">Figure 6</xref>
). This pattern is also seen in terms of sensitivity in predicting reassortant taxa (data not shown). GiRaF’s robustness to multiple reassortment events was also observed in several cases that were manually inspected, where it correctly identified overlapping reassortment events.</p>
</sec>
<sec>
<title>Confidence measures</title>
<p>In addition to candidate reassortment sets, GiRaF also reports a confidence value for each prediction. These confidence values allow the user to choose cutoffs for the appropriate trade-off between sensitivity and precision of predictions, where larger confidence cutoffs lead to a larger fraction of correct predictions (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Figure S3</ext-link>
). Empirically, confidence values reported by GiRaF were surprisingly well calibrated such that the false discovery rate can be estimated as ‘1–confidence value’ (
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Figure S3</ext-link>
).</p>
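<p>As a minimal sketch of how such a cutoff might be applied, the following Python fragment filters hypothetical (name, confidence) pairs and estimates the false discovery rate of the retained set as the mean of ‘1–confidence value’. The function names and the input layout are illustrative assumptions and do not reflect GiRaF’s actual output format; averaging over the retained set is likewise an assumption that treats each confidence value as an approximate probability that the prediction is correct.</p>
<preformat>
```python
def filter_by_cutoff(predictions, cutoff):
    """Keep only predictions whose confidence is at least `cutoff`.

    `predictions` is a list of (name, confidence) pairs; this layout is
    hypothetical and chosen for illustration only.
    """
    return [(name, conf) for name, conf in predictions if conf >= cutoff]


def estimated_fdr(predictions):
    """Estimate the false discovery rate as the mean of (1 - confidence).

    Assumes each confidence value approximates the probability that the
    prediction is correct (i.e. the values are well calibrated).
    """
    if not predictions:
        return 0.0
    return sum(1.0 - conf for _, conf in predictions) / len(predictions)


# Hypothetical candidate reassortment sets with made-up confidence values.
candidates = [("set_A", 0.99), ("set_B", 0.90), ("set_C", 0.60)]
kept = filter_by_cutoff(candidates, 0.90)
print(len(kept), "predictions kept; estimated FDR:", round(estimated_fdr(kept), 3))
```
</preformat>
<p>Raising the cutoff in this sketch discards lower-confidence candidates, which lowers the estimated false discovery rate at the cost of sensitivity.</p>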
</sec>
<sec>
<title>Experiments with alternative methods</title>
<p>GiRaF is an extension and refinement of our earlier approach (
<xref ref-type="bibr" rid="B19">19</xref>
) that was based purely on topological features and, while quite sensitive, was found to have a high false-positive rate. For example, in the case of the Avian (H5N1) and the Holmes
<italic>et al</italic>
. (H3N2) datasets discussed above, the approach in Ref. (
<xref ref-type="bibr" rid="B19">19</xref>
) has perfect sensitivity in identifying known reassortments (as does GiRaF). However, this approach also reports other candidates that are likely to be false positives (1 out of 2 in the Avian set and 6 out of 9 for the set in Holmes
<italic>et al</italic>
.). In the case of the S-OIV dataset, our earlier approach reported 60 candidate reassortments when comparing the HA and NA segments (as opposed to 11 by GiRaF) and, while the S-OIV strains were correctly identified, several of the other candidates are likely to be false positives. Finally, on the ‘All events’ dataset analyzed in
<xref ref-type="table" rid="T1">Table 1</xref>
, our earlier approach has a much lower PPV than GiRaF (26%) and a slightly higher sensitivity (85%).</p>
<p>We assessed the applicability of methods for recombination detection to the reassortment problem using the RDP program that implements several popular protocols in a user-friendly application (
<xref ref-type="bibr" rid="B15">15</xref>
). As input we provided concatenated alignments for a pair of segments and used default parameters (with the linear sequences option) to analyze the sequences. When applied to the HA and NA segments of the influenza A (H3N2) isolates studied by Holmes
<italic>et al</italic>
., RDP correctly identifies recombination breakpoints close to the segment boundaries (within 70 bp). In addition, RDP identifies 61 taxa as being plausible recombinants of which only 2 match known reassortants, giving an estimated positive predictive value of 3% (sensitivity ≈33%). Similar analysis of the HA and NA segments for the avian influenza dataset discussed above resulted in no breakpoints being identified with default parameters. However, re-analysis without multiple-hypothesis correction identified several putative breakpoints and 32 putative recombinants. Seven of these recombinants have breakpoints close to the segment boundary (within 100 bp) but none match the known reassortant. The high false-positive rate of recombination detection methods is likely due to the difficulty in distinguishing recombinants from their parents, as well as the challenge of simultaneously identifying breakpoints and recombinants.</p>
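<p>The positive predictive value quoted above follows directly from the reported counts (2 matching taxa out of 61 predicted), as the short Python sketch below illustrates. The function names are illustrative. The number of known reassortants is not stated in this passage, so the value 6 used in the sensitivity example is a hypothetical figure, chosen only because it is consistent with the reported ≈33%.</p>
<preformat>
```python
def ppv(true_positives, predicted_total):
    """Positive predictive value: fraction of predictions that are correct."""
    if predicted_total == 0:
        raise ValueError("no predictions were made")
    return true_positives / predicted_total


def sensitivity(true_positives, known_total):
    """Sensitivity: fraction of known events that were recovered."""
    if known_total == 0:
        raise ValueError("no known events to recover")
    return true_positives / known_total


# Counts reported in the text for RDP on the H3N2 HA/NA alignment.
rdp_ppv = ppv(2, 61)

# known_total=6 is a HYPOTHETICAL value, not taken from the text.
rdp_sensitivity = sensitivity(2, 6)

print(f"PPV = {rdp_ppv:.1%}, sensitivity = {rdp_sensitivity:.1%}")
```
</preformat>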
<p>SplitsTree4 is a widely used package for computing and analyzing phylogenetic networks (
<xref ref-type="bibr" rid="B34">34</xref>
) and, in principle, could help compare segment trees to identify reassortments. To investigate this approach, we provided consensus trees from our datasets and used the Consensus Network algorithm followed by the ReticulateNetwork algorithm in SplitsTree4 to generate a phylogenetic network from the consensus trees. Because SplitsTree4 attempts to reconstruct a complete phylogenetic history, its computational requirements are significant: many of the datasets required several gigabytes of memory to analyze. In particular, the 140-taxa collection of swine isolates discussed above exceeded the maximum memory that SplitsTree4 can allocate and could not be run to completion.</p>
<p>SplitsTree4 accurately constructs phylogenetic networks. However, it does not distinguish the reassortant clades from others in the tree. Predicting all clades with two parents as reassortments resulted in a sensitivity of 81% and a PPV of 26% on the ‘All events’ dataset analyzed in
<xref ref-type="table" rid="T1">Table 1</xref>
, and therefore a user would need additional information to identify reassortments with some measure of confidence.</p>
<p>Another approach, proposed in Rabadan
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B5">5</xref>
), uses a statistical test to identify pairs of taxa whose edit distance varies significantly between segments. This can be indicative of a reassortment event. The method does not, however, detail an automated approach to extract the likely reassortment from the pair. Also, Wan
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="B35">35</xref>
) described a clustering-based approach to define influenza genotypes, which could then be used to identify reassorted taxa. Other approaches that have been used to predict reassortments include a semi-automated clustering-based approach using strains from different time periods (
<xref ref-type="bibr" rid="B8">8</xref>
) and an approach to infer reassortment networks (
<xref ref-type="bibr" rid="B36">36</xref>
). Since none of these approaches have a publicly available implementation, we were unable to evaluate them further.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>DISCUSSION</title>
<p>As influenza sequence databases continue to grow, our ability to analyze the sequences and infer evolutionary relationships and dynamics is increasingly becoming a bottleneck. While manual and semi-automatic approaches are quite often regarded as ways to produce ‘gold-standard’ results, they suffer from scalability and reproducibility issues and, as we show here, can also miss subtle events. The computational pipeline implemented in GiRaF represents an alternative automated approach that enables users to efficiently process large datasets, study all the segments in the influenza genome, and catalog reassortment events with very high precision and sensitivity. Researchers can exploit this capability in several ways. For example, in combination with more intensive surveillance and sequencing of isolates, new reassortments could routinely be flagged for further study. With improved, unbiased sequencing of appropriately sampled isolates, GiRaF could help answer questions related to the rate and distribution (geographical, temporal and segmental biases) of reassortments. GiRaF’s ability to group reassortants into sets and its robustness to complex reassortment histories are likely to play a critical role in such analyses.</p>
<p>While the development of GiRaF focussed on influenza datasets, the algorithms of GiRaF may be useful for the study of other viral datasets as well. In particular, GiRaF’s low false-positive rate (<0.03 in the absence of reassortments) and its ability to report a confidence value may allow it to be combined with a ‘sliding window’ approach to detect recombination breakpoints in large viral datasets (
<xref ref-type="bibr" rid="B37">37</xref>
). The comparison of bacterial gene trees to identify the relatively frequent horizontal transfer events in them is another application area that deserves to be explored. GiRaF’s strength in these areas could be its ability to infer reticulation events while accounting for phylogenetic uncertainty and, in fact, using the full spectrum of phylogenetic information to identify otherwise subtle events with confidence.</p>
</sec>
<sec sec-type="supplementary-material">
<title>SUPPLEMENTARY DATA</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://nar.oxfordjournals.org/cgi/content/full/gkq1232/DC1">Supplementary Data</ext-link>
are available at NAR Online.</p>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_39_6_e34__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gkq1232_Supplementary_Material.pdf"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_gkq1232_Supplementary_Table_1.xlsx"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="vnd.ms-excel" xlink:href="supp_gkq1232_Supplementary_Table_2.xlsx"></media>
</supplementary-material>
</sec>
<sec>
<title>FUNDING</title>
<p>
<funding-source>National Science Foundation</funding-source>
(
<award-id>EF-0849899</award-id>
and
<award-id>IIS-0812111</award-id>
to C.K.); the
<funding-source>National Institutes of Health</funding-source>
(
<award-id>1R21AI085376</award-id>
to C.K.). Funding for open access charge: the
<funding-source>National Institutes of Health</funding-source>
(
<award-id>1R21AI085376</award-id>
to C.K.).</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>The authors thank Steven Salzberg for helpful comments on the article.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kawaoka</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Krauss</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Webster</surname>
<given-names>RG</given-names>
</name>
</person-group>
<article-title>Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics</article-title>
<source>J. Virol.</source>
<year>1989</year>
<volume>63</volume>
<fpage>4603</fpage>
<lpage>4608</lpage>
<pub-id pub-id-type="pmid">2795713</pub-id>
</element-citation>
</ref>
<ref id="B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dawood</surname>
<given-names>FS</given-names>
</name>
<name>
<surname>Jain</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Finelli</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Lindstrom</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Garten</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Gubareva</surname>
<given-names>LV</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Bridges</surname>
<given-names>CB</given-names>
</name>
<name>
<surname>Uyeki</surname>
<given-names>TM</given-names>
</name>
</person-group>
<article-title>Emergence of a novel swine-origin influenza A (H1N1) virus in humans</article-title>
<source>N. Engl. J. Med.</source>
<year>2009</year>
<volume>360</volume>
<fpage>2605</fpage>
<lpage>2615</lpage>
<pub-id pub-id-type="pmid">19423869</pub-id>
</element-citation>
</ref>
<ref id="B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghedin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sengamalay</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Shumway</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zaborsky</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Feldblyum</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Subbu</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Spiro</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Sitz</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Koo</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bolotov</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution</article-title>
<source>Nature</source>
<year>2005</year>
<volume>437</volume>
<fpage>1162</fpage>
<lpage>1166</lpage>
<pub-id pub-id-type="pmid">16208317</pub-id>
</element-citation>
</ref>
<ref id="B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rambaut</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pybus</surname>
<given-names>OG</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Viboud</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Taubenberger</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>EC</given-names>
</name>
</person-group>
<article-title>The genomic and epidemiological dynamics of human influenza A virus</article-title>
<source>Nature</source>
<year>2008</year>
<volume>453</volume>
<fpage>615</fpage>
<lpage>619</lpage>
<pub-id pub-id-type="pmid">18418375</pub-id>
</element-citation>
</ref>
<ref id="B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rabadan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Krasnitz</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Non-random reassortment in human influenza A viruses</article-title>
<source>Influenza Other Resp. Viruses</source>
<year>2008</year>
<volume>2</volume>
<fpage>9</fpage>
<lpage>22</lpage>
</element-citation>
</ref>
<ref id="B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holmes</surname>
<given-names>EC</given-names>
</name>
<name>
<surname>Ghedin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>St George</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Grenfell</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Fraser</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses</article-title>
<source>PLoS Biol</source>
<year>2005</year>
<volume>3</volume>
<fpage>e300</fpage>
<pub-id pub-id-type="pmid">16026181</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Cattoli</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Spiro</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Janies</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Aly</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>IH</given-names>
</name>
<name>
<surname>Couacy-Hymann</surname>
<given-names>E</given-names>
</name>
<name>
<surname>De Mia</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Dung do</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome analysis linking recent European and African influenza (H5N1) viruses</article-title>
<source>Emerg. Infect. Dis.</source>
<year>2007</year>
<volume>13</volume>
<fpage>713</fpage>
<lpage>718</lpage>
<pub-id pub-id-type="pmid">17553249</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Macken</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Webby</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Bruno</surname>
<given-names>WJ</given-names>
</name>
</person-group>
<article-title>Genotype turnover by reassortment of replication complex genes from avian influenza A virus</article-title>
<source>J. Gen. Virol.</source>
<year>2006</year>
<volume>87</volume>
<fpage>2803</fpage>
<lpage>2815</lpage>
<pub-id pub-id-type="pmid">16963738</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nelson</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Viboud</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Simonsen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Bennett</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Griesemer</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>St George</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Spiro</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Sengamalay</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Ghedin</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Multiple reassortment events in the evolutionary history of H1N1 influenza A virus since 1918</article-title>
<source>PLoS Pathog.</source>
<year>2008</year>
<volume>4</volume>
<fpage>e1000012</fpage>
<pub-id pub-id-type="pmid">18463694</pub-id>
</element-citation>
</ref>
<ref id="B10">
<label>10</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Klopper</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lockhart</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Steel</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Reconstruction of reticulate networks from gene trees</article-title>
<source>Research in Computational Molecular Biology</source>
<year>2005</year>
<publisher-loc>Berlin/Heidelberg</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>233</fpage>
<lpage>249</lpage>
</element-citation>
</ref>
<ref id="B11">
<label>11</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Klopper</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Beyond galled trees - decomposition and computation of galled networks</article-title>
<source>Research in Computational Molecular Biology</source>
<year>2007</year>
<publisher-loc>Berlin/Heidelberg</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>211</fpage>
<lpage>225</lpage>
</element-citation>
</ref>
<ref id="B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Padidam</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sawyer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fauquet</surname>
<given-names>CM</given-names>
</name>
</person-group>
<article-title>Possible emergence of new geminiviruses by frequent recombination</article-title>
<source>Virology</source>
<year>1999</year>
<volume>265</volume>
<fpage>218</fpage>
<lpage>225</lpage>
<pub-id pub-id-type="pmid">10600594</pub-id>
</element-citation>
</ref>
<ref id="B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Posada</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Crandall</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints</article-title>
<source>AIDS Res. Hum. Retroviruses</source>
<year>2005</year>
<volume>21</volume>
<fpage>98</fpage>
<lpage>102</lpage>
<pub-id pub-id-type="pmid">15665649</pub-id>
</element-citation>
</ref>
<ref id="B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Posada</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Crandall</surname>
<given-names>KA</given-names>
</name>
</person-group>
<article-title>Evaluation of methods for detecting recombination from DNA sequences: computer simulations</article-title>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2001</year>
<volume>98</volume>
<fpage>13757</fpage>
<lpage>13762</lpage>
<pub-id pub-id-type="pmid">11717435</pub-id>
</element-citation>
</ref>
<ref id="B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Martin</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Posada</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>RDP2: recombination detection and analysis from sequence alignments</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>260</fpage>
<lpage>262</lpage>
<pub-id pub-id-type="pmid">15377507</pub-id>
</element-citation>
</ref>
<ref id="B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Planet</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>Tree disagreement: measuring and testing incongruence in phylogenies</article-title>
<source>J. Biomed. Inform.</source>
<year>2006</year>
<volume>39</volume>
<fpage>86</fpage>
<lpage>102</lpage>
<pub-id pub-id-type="pmid">16243006</pub-id>
</element-citation>
</ref>
<ref id="B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mickevich</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Farris</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>The implications of incongruence in Menidia</article-title>
<source>Syst. Zool.</source>
<year>1981</year>
<volume>30</volume>
<fpage>351</fpage>
<lpage>370</lpage>
</element-citation>
</ref>
<ref id="B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kishino</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hasegawa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea</article-title>
<source>J. Mol. Evol.</source>
<year>1989</year>
<volume>29</volume>
<fpage>170</fpage>
<lpage>179</lpage>
<pub-id pub-id-type="pmid">2509717</pub-id>
</element-citation>
</ref>
<ref id="B19">
<label>19</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Nagarajan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Uncovering genomic reassortments among Influenza strains by enumerating maximal bicliques</article-title>
<source>Proceedings of the 2008 IEEE International Conference on Bioinformatics and Biomedicine</source>
<year>2008</year>
<publisher-loc>Washington DC</publisher-loc>
<publisher-name>IEEE Computer Society</publisher-name>
</element-citation>
</ref>
<ref id="B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>1658</fpage>
<lpage>1659</lpage>
<pub-id pub-id-type="pmid">16731699</pub-id>
</element-citation>
</ref>
<ref id="B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Nagarajan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>2009 Swine-origin influenza A (H1N1) resembles previous influenza isolates</article-title>
<source>PLoS ONE</source>
<year>2009</year>
<volume>4</volume>
<fpage>e6402</fpage>
<pub-id pub-id-type="pmid">19636415</pub-id>
</element-citation>
</ref>
<ref id="B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rambaut</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Grassly</surname>
<given-names>NC</given-names>
</name>
</person-group>
<article-title>Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees</article-title>
<source>Comput. Appl. Biosci.</source>
<year>1997</year>
<volume>13</volume>
<fpage>235</fpage>
<lpage>238</lpage>
<pub-id pub-id-type="pmid">9183526</pub-id>
</element-citation>
</ref>
<ref id="B23">
<label>23</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Wilgenbusch</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Swofford</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Inferring evolutionary trees with PAUP*</article-title>
<source>Current Protocols in Bioinformatics</source>
<year>2003</year>
<publisher-loc>Malden, MA</publisher-loc>
<publisher-name>John Wiley & Sons, Inc.</publisher-name>
<comment>Chapter 6, Unit 6.4</comment>
</element-citation>
</ref>
<ref id="B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>MUSCLE: a multiple sequence alignment method with reduced time and space complexity</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>113</fpage>
<pub-id pub-id-type="pmid">15318951</pub-id>
</element-citation>
</ref>
<ref id="B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huelsenbeck</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Ronquist</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>MRBAYES: Bayesian inference of phylogenetic trees</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>754</fpage>
<lpage>755</lpage>
<pub-id pub-id-type="pmid">11524383</pub-id>
</element-citation>
</ref>
<ref id="B26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drummond</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Rambaut</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>BEAST: Bayesian evolutionary analysis by sampling trees</article-title>
<source>BMC Evol. Biol.</source>
<year>2007</year>
<volume>7</volume>
<fpage>214</fpage>
<pub-id pub-id-type="pmid">17996036</pub-id>
</element-citation>
</ref>
<ref id="B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: a one-to-one correspondence and mining algorithms</article-title>
<source>IEEE Trans. Knowl. Data Eng.</source>
<year>2007</year>
<volume>19</volume>
<fpage>1625</fpage>
<lpage>1637</lpage>
</element-citation>
</ref>
<ref id="B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alexe</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Alexe</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Crama</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Foldes</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hammer</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Simeone</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Consensus algorithms for the generation of all maximal bicliques</article-title>
<source>Discrete Appl. Math.</source>
<year>2004</year>
<volume>145</volume>
<fpage>11</fpage>
<lpage>21</lpage>
</element-citation>
</ref>
<ref id="B29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gray</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>McCarthy</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Capuano</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>Setterquist</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Olsen</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Alavanja</surname>
<given-names>MC</given-names>
</name>
</person-group>
<article-title>Swine workers and swine influenza virus infections</article-title>
<source>Emerg. Infect. Dis.</source>
<year>2007</year>
<volume>13</volume>
<fpage>1871</fpage>
<lpage>1878</lpage>
<pub-id pub-id-type="pmid">18258038</pub-id>
</element-citation>
</ref>
<ref id="B30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Obenauer</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Denson</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mehta</surname>
<given-names>PK</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Mukatira</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Finkelstein</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Large-scale sequence analysis of avian influenza isolates</article-title>
<source>Science</source>
<year>2006</year>
<volume>311</volume>
<fpage>1576</fpage>
<lpage>1580</lpage>
<pub-id pub-id-type="pmid">16439620</pub-id>
</element-citation>
</ref>
<ref id="B31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takemae</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Parchariyanon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Damrongwatanapokin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Uchida</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Ruttanapumma</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yamaguchi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Saito</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Genetic diversity of swine influenza viruses isolated from pigs during 2000 to 2005 in Thailand</article-title>
<source>Influenza Other Resp. Viruses</source>
<year>2008</year>
<volume>2</volume>
<fpage>181</fpage>
<lpage>189</lpage>
</element-citation>
</ref>
<ref id="B32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chutinimitkul</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thippamom</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Damrongwatanapokin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Payungporn</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thanawongnuwech</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Amonsin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Boonsuk</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sreta</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bunpong</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tantilertcharoen</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genetic characterization of H1N1, H1N2 and H3N2 swine influenza virus in Thailand</article-title>
<source>Arch. Virol.</source>
<year>2008</year>
<volume>153</volume>
<fpage>1049</fpage>
<lpage>1056</lpage>
<pub-id pub-id-type="pmid">18458812</pub-id>
</element-citation>
</ref>
<ref id="B33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khiabanian</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Trifonov</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Rabadan</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Reassortment patterns in Swine influenza viruses</article-title>
<source>PLoS ONE</source>
<year>2009</year>
<volume>4</volume>
<fpage>e7366</fpage>
<pub-id pub-id-type="pmid">19809504</pub-id>
</element-citation>
</ref>
<ref id="B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>SplitsTree: analyzing and visualizing evolutionary data</article-title>
<source>Bioinformatics</source>
<year>1998</year>
<volume>14</volume>
<fpage>68</fpage>
<lpage>73</lpage>
<pub-id pub-id-type="pmid">9520503</pub-id>
</element-citation>
</ref>
<ref id="B35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wan</surname>
<given-names>XF</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Emch</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Donis</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>A quantitative genotype algorithm reflecting H5N1 Avian influenza niches</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>2368</fpage>
<lpage>2375</lpage>
<pub-id pub-id-type="pmid">17623701</pub-id>
</element-citation>
</ref>
<ref id="B36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bokhari</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Janies</surname>
<given-names>DA</given-names>
</name>
</person-group>
<article-title>Reassortment networks for investigating the evolution of segmented viruses</article-title>
<source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source>
<year>2010</year>
<volume>7</volume>
<fpage>288</fpage>
<lpage>298</lpage>
<pub-id pub-id-type="pmid">20431148</pub-id>
</element-citation>
</ref>
<ref id="B37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paraskevis</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Deforche</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lemey</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Magiorkinis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hatzakis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vandamme</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>SlidingBayes: exploring recombination using a sliding window approach based on Bayesian phylogenetic inference</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>1274</fpage>
<lpage>1275</lpage>
<pub-id pub-id-type="pmid">15546940</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A33 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000A33 | SxmlIndent | more
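
Once a record like the one above has been extracted, its reference list can be inspected programmatically. The following is a minimal Python sketch, not part of Dilib: the function name `list_references` and the `SAMPLE` string are illustrative assumptions, and the sketch assumes the input is a single well-formed XML fragment that uses the same `<ref>`, `<label>` and `<element-citation>` elements as this record.

```python
import xml.etree.ElementTree as ET


def list_references(xml_text):
    """Return (label, first-author surname, year, title) for each <ref>.

    Hypothetical helper: assumes <ref> elements shaped like those in this
    record. Missing fields (for example an absent <year>) become "".
    """
    root = ET.fromstring(xml_text)
    rows = []
    for ref in root.iter("ref"):
        citation = ref.find("element-citation")
        if citation is None:
            continue  # skip references that use another citation element
        rows.append((
            ref.findtext("label", default=""),
            citation.findtext("person-group/name/surname", default=""),
            citation.findtext("year", default=""),
            citation.findtext("article-title", default=""),
        ))
    return rows


# Illustrative input copied from reference 34 of this record.
SAMPLE = """<ref-list>
<ref id="B34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name><surname>Huson</surname><given-names>DH</given-names></name>
</person-group>
<article-title>SplitsTree: analyzing and visualizing evolutionary data</article-title>
<source>Bioinformatics</source>
<year>1998</year>
</element-citation>
</ref>
</ref-list>"""

for row in list_references(SAMPLE):
    print("\t".join(row))
```

To apply the sketch to real data, the text passed to `list_references` could be read from standard input instead of `SAMPLE`, under the assumption that the piped output is one complete XML record.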

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3064795
   |texte=   GiRaF: robust, computational identification of influenza reassortments via graph mining
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:21177643" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a H2N2V1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021