H2N2 exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry

Internal identifier: 000519 (Pmc/Corpus); previous: 000518; next: 000520


Authors: Aaron TL Lun; Jason WH Wong; Kevin M. Downard

Source:

RBID: PMC:3505172

Abstract

Background

Influenza is one of the oldest and deadliest infectious diseases known to man. Reassorted strains of the virus pose the greatest risk to both human and animal health and have been associated with all pandemics of the past century, with the possible exception of the 1918 pandemic; together these pandemics resulted in tens of millions of deaths. We have developed and tested new computer algorithms, FluShuffle and FluResort, which enable reassorted viruses to be identified rapidly and directly from mass spectrometric data. These algorithms allow reassorted influenza and other viruses to be identified quickly, so that prevention strategies and treatments can be implemented more efficiently.

Results

The FluShuffle and FluResort algorithms were tested with both experimental and simulated mass spectra of whole-virus digests. FluShuffle considers different combinations of viral protein identities that match the mass spectral data, using a Gibbs sampling algorithm that employs a mixed-protein Markov chain Monte Carlo (MCMC) method. FluResort uses those identities to calculate the weighted mean distance of each identified protein across two or more phylogenetic trees constructed from viral protein sequence alignments. Each weighted mean distance is then normalized by conversion to a Z-score to establish whether a strain is reassorted.
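The Gibbs sampling step can be illustrated with a toy example. The sketch below is not FluShuffle: the peak masses, the candidate names (`HA_lineage1` and so on), the coverage score and the `beta` parameter are all hypothetical, and the target distribution, proportional to exp(beta * coverage), is an assumption chosen for illustration. It shows only the core idea of resampling one protein's identity at a time from its conditional distribution given the other proteins; burn-in and convergence checks are omitted.

```python
import math
import random

# Hypothetical toy data (not from the paper): observed peptide ion masses,
# rounded to integers for simplicity, and candidate identities for each
# viral protein with the peptide masses each candidate would produce.
OBSERVED = {1001, 1102, 1203, 1304, 1405}
CANDIDATES = {
    "HA": {"HA_lineage1": {1001, 1102}, "HA_lineage2": {1001, 1500}},
    "NA": {"NA_lineage1": {1203, 1304}, "NA_lineage2": {1203, 1600}},
    "M1": {"M1_lineage1": {1405}, "M1_lineage2": {1700}},
}


def coverage(assignment):
    """Count observed masses explained by the union of the chosen proteins.

    Taking the union means a mass shared by two proteins is counted once,
    so the score depends jointly on all chosen identities.
    """
    explained = set()
    for slot, identity in assignment.items():
        explained.update(CANDIDATES[slot][identity])
    return len(explained.intersection(OBSERVED))


def gibbs_sample(n_sweeps, beta=2.0, seed=0):
    """Gibbs sampler over joint identity assignments.

    Assumed target: P(assignment) proportional to exp(beta * coverage).
    Each step redraws one protein's identity from its full conditional.
    """
    rng = random.Random(seed)
    # Arbitrary initial state: the last candidate listed for each protein.
    state = {slot: list(cands)[-1] for slot, cands in CANDIDATES.items()}
    counts = {slot: dict.fromkeys(cands, 0) for slot, cands in CANDIDATES.items()}
    for _ in range(n_sweeps):
        for slot, cands in CANDIDATES.items():
            identities = list(cands)
            weights = []
            for identity in identities:
                trial = dict(state)
                trial[slot] = identity
                weights.append(math.exp(beta * coverage(trial)))
            state[slot] = rng.choices(identities, weights=weights)[0]
            counts[slot][state[slot]] += 1
    return counts


if __name__ == "__main__":
    for slot, tally in gibbs_sample(2000).items():
        print(slot, tally)
```

With these toy data each `lineage1` candidate explains one more observed mass than its alternative, so the sampler visits the `lineage1` identities most often even though it starts from the `lineage2` ones.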

Conclusions

From high-resolution mass spectral data of whole-virus proteolytic digests, the new FluShuffle and FluResort algorithms can correctly identify the origins of influenza viral proteins and the number of reassortment events required to produce a strain. This has been demonstrated for constructed vaccine strains as well as for common human seasonal strains of the virus. The algorithms significantly improve the capability of the proteotyping approach to identify the reassorted viruses that pose the greatest pandemic risk.
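The normalization step mentioned in the Results can be sketched as follows, assuming each protein already has one distance value per tree. The tree weights, the example distances and the |Z| cut-off of 1.5 are hypothetical; how FluResort derives its weights and thresholds is not reproduced here.

```python
import statistics


def weighted_mean(values, weights):
    """Weighted mean of one protein's distances across several trees."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def z_scores(mean_distances):
    """Convert each protein's weighted mean distance to a Z-score."""
    values = list(mean_distances.values())
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation; needs 2+ values
    if sigma == 0:
        raise ValueError("all distances are identical; Z-scores are undefined")
    return {name: (d - mu) / sigma for name, d in mean_distances.items()}


def flag_discordant(mean_distances, threshold=1.5):
    """Proteins whose |Z| exceeds a hypothetical cut-off."""
    return sorted(
        name for name, z in z_scores(mean_distances).items() if abs(z) > threshold
    )


if __name__ == "__main__":
    # Hypothetical distances for each protein in two trees, equal tree weights.
    per_tree = {
        "PB2": [0.10, 0.10], "PB1": [0.11, 0.11], "PA": [0.09, 0.09],
        "HA": [0.10, 0.10], "NP": [0.11, 0.11], "NA": [0.10, 0.10],
        "M1": [0.09, 0.09], "NS1": [0.55, 0.65],
    }
    means = {p: weighted_mean(d, [1.0, 1.0]) for p, d in per_tree.items()}
    print(flag_discordant(means))
```

Using the sample standard deviation (`statistics.stdev`) rather than the population form is a design choice in this sketch; with only eight proteins the two differ noticeably.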


Url:
DOI: 10.1186/1471-2105-13-208
PubMed: 22906155
PubMed Central: 3505172

Links to Exploration step

PMC:3505172

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry</title>
<author>
<name sortKey="Lun, Aaron Tl" sort="Lun, Aaron Tl" uniqKey="Lun A" first="Aaron Tl" last="Lun">Aaron Tl Lun</name>
<affiliation>
<nlm:aff id="I1">School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wong, Jason Wh" sort="Wong, Jason Wh" uniqKey="Wong J" first="Jason Wh" last="Wong">Jason Wh Wong</name>
<affiliation>
<nlm:aff id="I1">School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Prince of Wales Clinical School and Lowy Cancer Research Centre, University of New South Wales, Sydney, NSW, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Downard, Kevin M" sort="Downard, Kevin M" uniqKey="Downard K" first="Kevin M" last="Downard">Kevin M. Downard</name>
<affiliation>
<nlm:aff id="I1">School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22906155</idno>
<idno type="pmc">3505172</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505172</idno>
<idno type="RBID">PMC:3505172</idno>
<idno type="doi">10.1186/1471-2105-13-208</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000519</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000519</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry</title>
<author>
<name sortKey="Lun, Aaron Tl" sort="Lun, Aaron Tl" uniqKey="Lun A" first="Aaron Tl" last="Lun">Aaron Tl Lun</name>
<affiliation>
<nlm:aff id="I1">School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wong, Jason Wh" sort="Wong, Jason Wh" uniqKey="Wong J" first="Jason Wh" last="Wong">Jason Wh Wong</name>
<affiliation>
<nlm:aff id="I1">School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Prince of Wales Clinical School and Lowy Cancer Research Centre, University of New South Wales, Sydney, NSW, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Downard, Kevin M" sort="Downard, Kevin M" uniqKey="Downard K" first="Kevin M" last="Downard">Kevin M. Downard</name>
<affiliation>
<nlm:aff id="I1">School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Influenza is one of the oldest and deadliest infectious diseases known to man. Reassorted strains of the virus pose the greatest risk to both human and animal health and have been associated with all pandemics of the past century, with the possible exception of the 1918 pandemic; together these pandemics resulted in tens of millions of deaths. We have developed and tested new computer algorithms, FluShuffle and FluResort, which enable reassorted viruses to be identified rapidly and directly from mass spectrometric data. These algorithms allow reassorted influenza and other viruses to be identified quickly, so that prevention strategies and treatments can be implemented more efficiently.</p>
</sec>
<sec>
<title>Results</title>
<p>The FluShuffle and FluResort algorithms were tested with both experimental and simulated mass spectra of whole-virus digests. FluShuffle considers different combinations of viral protein identities that match the mass spectral data, using a Gibbs sampling algorithm that employs a mixed-protein Markov chain Monte Carlo (MCMC) method. FluResort uses those identities to calculate the weighted mean distance of each identified protein across two or more phylogenetic trees constructed from viral protein sequence alignments. Each weighted mean distance is then normalized by conversion to a Z-score to establish whether a strain is reassorted.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>From high-resolution mass spectral data of whole-virus proteolytic digests, the new FluShuffle and FluResort algorithms can correctly identify the origins of influenza viral proteins and the number of reassortment events required to produce a strain. This has been demonstrated for constructed vaccine strains as well as for common human seasonal strains of the virus. The algorithms significantly improve the capability of the proteotyping approach to identify the reassorted viruses that pose the greatest pandemic risk.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Tam, J" uniqKey="Van Tam J">J Van-Tam</name>
</author>
<author>
<name sortKey="Sellwood, C" uniqKey="Sellwood C">C Sellwood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, Mi" uniqKey="Nelson M">MI Nelson</name>
</author>
<author>
<name sortKey="Holmes, Ec" uniqKey="Holmes E">EC Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zambon, Mc" uniqKey="Zambon M">MC Zambon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nguyen Van Tam, Js" uniqKey="Nguyen Van Tam J">JS Nguyen-Van-Tam</name>
</author>
<author>
<name sortKey="Hampson, Aw" uniqKey="Hampson A">AW Hampson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schweiger, B" uniqKey="Schweiger B">B Schweiger</name>
</author>
<author>
<name sortKey="Bruns, L" uniqKey="Bruns L">L Bruns</name>
</author>
<author>
<name sortKey="Meixenberger, M" uniqKey="Meixenberger M">M Meixenberger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, Mi" uniqKey="Nelson M">MI Nelson</name>
</author>
<author>
<name sortKey="Viboud, C" uniqKey="Viboud C">C Viboud</name>
</author>
<author>
<name sortKey="Simonsen, L" uniqKey="Simonsen L">L Simonsen</name>
</author>
<author>
<name sortKey="Bennett, Rt" uniqKey="Bennett R">RT Bennett</name>
</author>
<author>
<name sortKey="Griesemer, Sb" uniqKey="Griesemer S">SB Griesemer</name>
</author>
<author>
<name sortKey="St George, K" uniqKey="St George K">K St George</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
<author>
<name sortKey="Spiro, Dj" uniqKey="Spiro D">DJ Spiro</name>
</author>
<author>
<name sortKey="Sengamalay, Na" uniqKey="Sengamalay N">NA Sengamalay</name>
</author>
<author>
<name sortKey="Ghedin, E" uniqKey="Ghedin E">E Ghedin</name>
</author>
<author>
<name sortKey="Taubenberger, Jk" uniqKey="Taubenberger J">JK Taubenberger</name>
</author>
<author>
<name sortKey="Holmes, Ec" uniqKey="Holmes E">EC Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yassine, Hm" uniqKey="Yassine H">HM Yassine</name>
</author>
<author>
<name sortKey="Lee, Cw" uniqKey="Lee C">CW Lee</name>
</author>
<author>
<name sortKey="Gourapura, R" uniqKey="Gourapura R">R Gourapura</name>
</author>
<author>
<name sortKey="Saif, Ym" uniqKey="Saif Y">YM Saif</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kilbourne, Ed" uniqKey="Kilbourne E">ED Kilbourne</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sch Fer, Jr" uniqKey="Sch Fer J">JR Schäfer</name>
</author>
<author>
<name sortKey="Kawaoka, Y" uniqKey="Kawaoka Y">Y Kawaoka</name>
</author>
<author>
<name sortKey="Bean, Wj" uniqKey="Bean W">WJ Bean</name>
</author>
<author>
<name sortKey="Suss, J" uniqKey="Suss J">J Süss</name>
</author>
<author>
<name sortKey="Senne, D" uniqKey="Senne D">D Senne</name>
</author>
<author>
<name sortKey="Webster, Rg" uniqKey="Webster R">RG Webster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fang, R" uniqKey="Fang R">R Fang</name>
</author>
<author>
<name sortKey="Min Jou, W" uniqKey="Min Jou W">W Min Jou</name>
</author>
<author>
<name sortKey="Huylebroeck, D" uniqKey="Huylebroeck D">D Huylebroeck</name>
</author>
<author>
<name sortKey="Devos, R" uniqKey="Devos R">R Devos</name>
</author>
<author>
<name sortKey="Fiers, W" uniqKey="Fiers W">W Fiers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Gj" uniqKey="Smith G">GJ Smith</name>
</author>
<author>
<name sortKey="Vijaykrishna, D" uniqKey="Vijaykrishna D">D Vijaykrishna</name>
</author>
<author>
<name sortKey="Bahl, J" uniqKey="Bahl J">J Bahl</name>
</author>
<author>
<name sortKey="Lycett, Sj" uniqKey="Lycett S">SJ Lycett</name>
</author>
<author>
<name sortKey="Worobey, M" uniqKey="Worobey M">M Worobey</name>
</author>
<author>
<name sortKey="Pybus, Og" uniqKey="Pybus O">OG Pybus</name>
</author>
<author>
<name sortKey="Ma, Sk" uniqKey="Ma S">SK Ma</name>
</author>
<author>
<name sortKey="Cheung, Cl" uniqKey="Cheung C">CL Cheung</name>
</author>
<author>
<name sortKey="Raghwani, J" uniqKey="Raghwani J">J Raghwani</name>
</author>
<author>
<name sortKey="Bhatt, S" uniqKey="Bhatt S">S Bhatt</name>
</author>
<author>
<name sortKey="Peiris, Js" uniqKey="Peiris J">JS Peiris</name>
</author>
<author>
<name sortKey="Guan, Y" uniqKey="Guan Y">Y Guan</name>
</author>
<author>
<name sortKey="Rambaut, A" uniqKey="Rambaut A">A Rambaut</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Webster, Rg" uniqKey="Webster R">RG Webster</name>
</author>
<author>
<name sortKey="Bean, Wj" uniqKey="Bean W">WJ Bean</name>
</author>
<author>
<name sortKey="Gorman, Ot" uniqKey="Gorman O">OT Gorman</name>
</author>
<author>
<name sortKey="Chambers, Tm" uniqKey="Chambers T">TM Chambers</name>
</author>
<author>
<name sortKey="Kawaoka, Y" uniqKey="Kawaoka Y">Y Kawaoka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, R" uniqKey="Wang R">R Wang</name>
</author>
<author>
<name sortKey="Taubenberger, Jk" uniqKey="Taubenberger J">JK Taubenberger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thompson, Jd" uniqKey="Thompson J">JD Thompson</name>
</author>
<author>
<name sortKey="Higgins, Dg" uniqKey="Higgins D">DG Higgins</name>
</author>
<author>
<name sortKey="Gibson, Tj" uniqKey="Gibson T">TJ Gibson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Djikeng, A" uniqKey="Djikeng A">A Djikeng</name>
</author>
<author>
<name sortKey="Spiro, D" uniqKey="Spiro D">D Spiro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Elden, Lj" uniqKey="Van Elden L">LJ van Elden</name>
</author>
<author>
<name sortKey="Nijhuis, M" uniqKey="Nijhuis M">M Nijhuis</name>
</author>
<author>
<name sortKey="Schipper, P" uniqKey="Schipper P">P Schipper</name>
</author>
<author>
<name sortKey="Schuurman, R" uniqKey="Schuurman R">R Schuurman</name>
</author>
<author>
<name sortKey="Van Loon, Am" uniqKey="Van Loon A">AM van Loon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Coiras, Mt" uniqKey="Coiras M">MT Coiras</name>
</author>
<author>
<name sortKey="Perez Bre A, P" uniqKey="Perez Bre A P">P Pérez-Breña</name>
</author>
<author>
<name sortKey="Garcia, Ml" uniqKey="Garcia M">ML García</name>
</author>
<author>
<name sortKey="Casas, I" uniqKey="Casas I">I Casas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Farris, Js" uniqKey="Farris J">JS Farris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yurovsky, A" uniqKey="Yurovsky A">A Yurovsky</name>
</author>
<author>
<name sortKey="Moret, Bm" uniqKey="Moret B">BM Moret</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rabadan, R" uniqKey="Rabadan R">R Rabadan</name>
</author>
<author>
<name sortKey="Levine, Aj" uniqKey="Levine A">AJ Levine</name>
</author>
<author>
<name sortKey="Krasnitz, M" uniqKey="Krasnitz M">M Krasnitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nagarajan, N" uniqKey="Nagarajan N">N Nagarajan</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wan, Xf" uniqKey="Wan X">XF Wan</name>
</author>
<author>
<name sortKey="Wu, X" uniqKey="Wu X">X Wu</name>
</author>
<author>
<name sortKey="Lin, G" uniqKey="Lin G">G Lin</name>
</author>
<author>
<name sortKey="Holton, Sb" uniqKey="Holton S">SB Holton</name>
</author>
<author>
<name sortKey="Desmone, Ra" uniqKey="Desmone R">RA Desmone</name>
</author>
<author>
<name sortKey="Shyu, Cr" uniqKey="Shyu C">CR Shyu</name>
</author>
<author>
<name sortKey="Guan, Y" uniqKey="Guan Y">Y Guan</name>
</author>
<author>
<name sortKey="Emch, Me" uniqKey="Emch M">ME Emch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Suzuki, Y" uniqKey="Suzuki Y">Y Suzuki</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bokhari, Sh" uniqKey="Bokhari S">SH Bokhari</name>
</author>
<author>
<name sortKey="Janies, Da" uniqKey="Janies D">DA Janies</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gupta, Rs" uniqKey="Gupta R">RS Gupta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Wong, Jw" uniqKey="Wong J">JW Wong</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Wong, Jw" uniqKey="Wong J">JW Wong</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Wong, Jw" uniqKey="Wong J">JW Wong</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Wong, Jw" uniqKey="Wong J">JW Wong</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wong, Jw" uniqKey="Wong J">JW Wong</name>
</author>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ha, Jw" uniqKey="Ha J">JW Ha</name>
</author>
<author>
<name sortKey="Schwahn, Ab" uniqKey="Schwahn A">AB Schwahn</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ha, Jw" uniqKey="Ha J">JW Ha</name>
</author>
<author>
<name sortKey="Downard, Km" uniqKey="Downard K">KM Downard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bao, Y" uniqKey="Bao Y">Y Bao</name>
</author>
<author>
<name sortKey="Bolotov, P" uniqKey="Bolotov P">P Bolotov</name>
</author>
<author>
<name sortKey="Dernovoy, D" uniqKey="Dernovoy D">D Dernovoy</name>
</author>
<author>
<name sortKey="Kiryutin, B" uniqKey="Kiryutin B">B Kiryutin</name>
</author>
<author>
<name sortKey="Zaslavsky, L" uniqKey="Zaslavsky L">L Zaslavsky</name>
</author>
<author>
<name sortKey="Tatusova, T" uniqKey="Tatusova T">T Tatusova</name>
</author>
<author>
<name sortKey="Ostell, J" uniqKey="Ostell J">J Ostell</name>
</author>
<author>
<name sortKey="Lipman, D" uniqKey="Lipman D">D Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, Z" uniqKey="He Z">Z He</name>
</author>
<author>
<name sortKey="Yang, C" uniqKey="Yang C">C Yang</name>
</author>
<author>
<name sortKey="Yang, C" uniqKey="Yang C">C Yang</name>
</author>
<author>
<name sortKey="Qi, Rz" uniqKey="Qi R">RZ Qi</name>
</author>
<author>
<name sortKey="Tam, Jp" uniqKey="Tam J">JP Tam</name>
</author>
<author>
<name sortKey="Yu, W" uniqKey="Yu W">W Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flegal, Jm" uniqKey="Flegal J">JM Flegal</name>
</author>
<author>
<name sortKey="Haran, M" uniqKey="Haran M">M Haran</name>
</author>
<author>
<name sortKey="Jones, Gl" uniqKey="Jones G">GL Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author>
<name sortKey="Chait, Bt" uniqKey="Chait B">BT Chait</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcdonald, Wh" uniqKey="Mcdonald W">WH McDonald</name>
</author>
<author>
<name sortKey="Tabb, Dl" uniqKey="Tabb D">DL Tabb</name>
</author>
<author>
<name sortKey="Sadygov, Rg" uniqKey="Sadygov R">RG Sadygov</name>
</author>
<author>
<name sortKey="Maccoss, Mj" uniqKey="Maccoss M">MJ MacCoss</name>
</author>
<author>
<name sortKey="Venable, J" uniqKey="Venable J">J Venable</name>
</author>
<author>
<name sortKey="Graumann, J" uniqKey="Graumann J">J Graumann</name>
</author>
<author>
<name sortKey="Johnson, Jr" uniqKey="Johnson J">JR Johnson</name>
</author>
<author>
<name sortKey="Cociorva, D" uniqKey="Cociorva D">D Cociorva</name>
</author>
<author>
<name sortKey="Yates, Jr" uniqKey="Yates J">JR Yates</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoopmann, Mr" uniqKey="Hoopmann M">MR Hoopmann</name>
</author>
<author>
<name sortKey="Finney, Gl" uniqKey="Finney G">GL Finney</name>
</author>
<author>
<name sortKey="Maccoss, Mj" uniqKey="Maccoss M">MJ MacCoss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Price, M" uniqKey="Price M">M Price</name>
</author>
<author>
<name sortKey="Dehal, P" uniqKey="Dehal P">P Dehal</name>
</author>
<author>
<name sortKey="Arkin, A" uniqKey="Arkin A">A Arkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="product-review" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22906155</article-id>
<article-id pub-id-type="pmc">3505172</article-id>
<article-id pub-id-type="publisher-id">1471-2105-13-208</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-13-208</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Lun</surname>
<given-names>Aaron TL</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>alun1181@uni.sydney.edu.au</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Wong</surname>
<given-names>Jason WH</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>jason.wong@unsw.edu.au</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A3">
<name>
<surname>Downard</surname>
<given-names>Kevin M</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>k.downard@sydney.edu.au</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
School of Molecular Bioscience G-08, The University of Sydney, Sydney, NSW, 2006, Australia</aff>
<aff id="I2">
<label>2</label>
Prince of Wales Clinical School and Lowy Cancer Research Centre, University of New South Wales, Sydney, NSW, Australia</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>8</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<fpage>208</fpage>
<lpage>208</lpage>
<history>
<date date-type="received">
<day>30</day>
<month>3</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>8</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2012 Lun et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Lun et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/13/208"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>Influenza is one of the oldest and deadliest infectious diseases known to man. Reassorted strains of the virus pose the greatest risk to both human and animal health and have been associated with all pandemics of the past century, with the possible exception of the 1918 pandemic; together these pandemics resulted in tens of millions of deaths. We have developed and tested new computer algorithms, FluShuffle and FluResort, which enable reassorted viruses to be identified rapidly and directly from mass spectrometric data. These algorithms allow reassorted influenza and other viruses to be identified quickly, so that prevention strategies and treatments can be implemented more efficiently.</p>
</sec>
<sec>
<title>Results</title>
<p>The FluShuffle and FluResort algorithms were tested with both experimental and simulated mass spectra of whole-virus digests. FluShuffle considers different combinations of viral protein identities that match the mass spectral data, using a Gibbs sampling algorithm that employs a mixed-protein Markov chain Monte Carlo (MCMC) method. FluResort uses those identities to calculate the weighted mean distance of each identified protein across two or more phylogenetic trees constructed from viral protein sequence alignments. Each weighted mean distance is then normalized by conversion to a Z-score to establish whether a strain is reassorted.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>From high-resolution mass spectral data of whole-virus proteolytic digests, the new FluShuffle and FluResort algorithms can correctly identify the origins of influenza viral proteins and the number of reassortment events required to produce a strain. This has been demonstrated for constructed vaccine strains as well as for common human seasonal strains of the virus. The algorithms significantly improve the capability of the proteotyping approach to identify the reassorted viruses that pose the greatest pandemic risk.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Influenza virus</kwd>
<kwd>Reassortment</kwd>
<kwd>Proteotyping</kwd>
<kwd>Computer algorithm</kwd>
<kwd>Phylogenetics</kwd>
<kwd>Mass spectrometry</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>Influenza is one of the oldest and deadliest infectious diseases known to man. Seasonal human influenza epidemics are responsible for over 250,000 deaths worldwide and over 3 million cases of severe illness each year [
<xref ref-type="bibr" rid="B1">1</xref>
-
<xref ref-type="bibr" rid="B3">3</xref>
]. When a host is simultaneously infected with two or more strains derived from different animal species, reassortment can occur, producing progeny viruses that contain genes derived from two or more parent strains. Reassortment can significantly change the antigenic profile of a virus, with serious epidemiological consequences [
<xref ref-type="bibr" rid="B4">4</xref>
,
<xref ref-type="bibr" rid="B5">5</xref>
] owing to a lack of host immunity against such novel strains, particularly when one of the parent strains originates from an animal host, usually avian or swine [
<xref ref-type="bibr" rid="B6">6</xref>
-
<xref ref-type="bibr" rid="B8">8</xref>
].</p>
<p>Of the four influenza pandemics of the past century [
<xref ref-type="bibr" rid="B9">9</xref>
], at least three have been associated with reassorted strains. Reassortment between avian and human type A influenza viruses produced the novel H2N2 and H3N2 strains that caused global human pandemics in 1957 and 1968, respectively [
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
]. The swine-origin type A H1N1 influenza virus associated with the 2009 pandemic arose from reassortment between a Eurasian swine virus and a triple-reassortant North American swine virus of avian, human and swine origin [
<xref ref-type="bibr" rid="B12">12</xref>
]. Collectively, these pandemics have been associated with tens of millions of deaths worldwide. The rapid identification of reassorted strains of the virus is therefore an important means of mitigating the impact of influenza pandemics.</p>
<p>The conventional method for identifying reassorted influenza viruses involves constructing phylogenetic trees from alignments of the gene sequences for each viral protein [
<xref ref-type="bibr" rid="B13">13</xref>
]. The genes are first amplified by reverse transcriptase polymerase chain reaction (RT-PCR) and sequenced [
<xref ref-type="bibr" rid="B14">14</xref>
]. Multiple sequence alignments for each gene segment are then performed using algorithms such as ClustalW [
<xref ref-type="bibr" rid="B15">15</xref>
]. Phylogenetic trees are then constructed from these alignments. Where gene segments of the same strain occupy conflicting positions across the trees, the strain is identified as a potential reassortant.</p>
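As a rough illustration of the incongruence test described above, the sketch below replaces full tree comparison with a simpler proxy: each segment of a query strain is assigned to whichever reference lineage it is closest to, and the strain is flagged when its segments disagree. The distances and the lineage names are hypothetical.

```python
# Hypothetical distances between one query strain and two reference
# lineages, computed separately for each gene segment.
SEGMENT_DISTANCES = {
    "HA": {"lineage_A": 0.04, "lineage_B": 0.21},
    "NA": {"lineage_A": 0.19, "lineage_B": 0.03},
    "PB1": {"lineage_A": 0.05, "lineage_B": 0.18},
    "M": {"lineage_A": 0.17, "lineage_B": 0.02},
}


def closest_lineage(distances):
    """Reference lineage nearest to the query for a single segment."""
    return min(distances, key=distances.get)


def segment_origins(segment_distances):
    """Map each segment to its closest reference lineage."""
    return {seg: closest_lineage(d) for seg, d in segment_distances.items()}


def is_potential_reassortant(segment_distances):
    """Flag the strain if its segments group with different lineages."""
    return len(set(segment_origins(segment_distances).values())) > 1


if __name__ == "__main__":
    print(segment_origins(SEGMENT_DISTANCES))
    print(is_potential_reassortant(SEGMENT_DISTANCES))
```

A nearest-reference assignment ignores tree topology and branch support, so it is only a stand-in for comparing positions across full phylogenetic trees.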
<p>Given that full gene sequencing [
<xref ref-type="bibr" rid="B16">16</xref>
] of a large number of strains is time-consuming, even with the advent of real-time parallel PCR sequencing methods [
<xref ref-type="bibr" rid="B17">17</xref>
,
<xref ref-type="bibr" rid="B18">18</xref>
], and that multiple sequence alignment of full gene sequences is both computationally and time intensive, this approach has its limitations. Because trees must subsequently be constructed for all eight gene segments of the viral RNA to establish a potential reassorted strain [
<xref ref-type="bibr" rid="B19">19</xref>
], algorithms have been developed to automate this process [
<xref ref-type="bibr" rid="B20">20</xref>
]. Rabadan and co-workers measured the Hamming distance between corresponding gene segments to establish the presence or absence of reassortment [
<xref ref-type="bibr" rid="B21">21</xref>
], while Nagarajan and Kingsford [
<xref ref-type="bibr" rid="B22">22</xref>
] considered distributions of phylogenetic trees for each gene segment rather than a single consensus tree. Others have identified reassortment by measuring distances with a complete composition vector (CCV) and clustering segments with a minimum spanning tree (MST) algorithm [
<xref ref-type="bibr" rid="B23">23</xref>
]. Approaches that consider only a quartet of trees at a time [
<xref ref-type="bibr" rid="B24">24</xref>
] and the use of reassortment networks [
<xref ref-type="bibr" rid="B25">25</xref>
] have also been employed to identify reassorted influenza viruses.</p>
<p>Owing to the degeneracy of the genetic code, there remain advantages to studying protein rather than gene sequences for monitoring influenza strains and establishing reassortment [
<xref ref-type="bibr" rid="B26">26</xref>
]. Changes to nucleotide bases at the third codon position, which are frequently synonymous, provide little or no evolutionary information. Proteins also provide a stronger phylogenetic signal, with 20 possible amino acids at each sequence position versus just 4 nucleotides in gene sequences [
<xref ref-type="bibr" rid="B26">26</xref>
]. The analysis of viral proteins by mass spectrometry is also more rapid and direct than the steps required to both amplify and sequence viral RNA by RT-PCR.</p>
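<p>The degeneracy argument can be checked directly against the standard genetic code. The short Python sketch below, written for illustration and not part of the FluShuffle or FluResort software, counts for each codon position the fraction of single-base substitutions in sense codons that leave the encoded amino acid unchanged. It assumes the standard code (NCBI translation table 1) and counts a change to a stop codon as non-synonymous; the function name synonymous_fraction is hypothetical.</p>

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(position):
    """Fraction of single-base substitutions at `position` (0, 1 or 2) in
    sense codons that leave the amino acid unchanged.
    A substitution that creates a stop codon counts as non-synonymous."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons are not used as starting points
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            synonymous += CODE[mutant] == aa
    return synonymous / total


for pos in range(3):
    print(f"codon position {pos + 1}: {synonymous_fraction(pos):.2f} synonymous")
```

<p>In this table, second-position changes are never synonymous, first-position changes rarely are (only some leucine and arginine codons), and third-position changes are synonymous in most cases, which is the redundancy that protein sequences discard.</p>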
<p>We have recently developed a new rapid and direct proteotyping approach with which to characterize the influenza virus [
<xref ref-type="bibr" rid="B27">27</xref>
-
<xref ref-type="bibr" rid="B30">30</xref>
]. Briefly, whole virus proteolytic digests are analysed by high resolution mass spectrometry to detect signature peptides that are conserved in sequence and unique in mass. These enable the type, subtype and lineage [
<xref ref-type="bibr" rid="B31">31</xref>
] of strains to be unambiguously identified without sequencing of the viral proteins, either in full or in part. A computer algorithm has been written to achieve this in an automated manner [
<xref ref-type="bibr" rid="B32">32</xref>
]. The approach can differentiate seasonal and pandemic H1N1 influenza viruses [
<xref ref-type="bibr" rid="B33">33</xref>
], identify the gene origin of reassorted strains [
<xref ref-type="bibr" rid="B34">34</xref>
] and has been used to study the evolution of H5N1 viruses [
<xref ref-type="bibr" rid="B35">35</xref>
]. The use of mass spectrometry in the proteotyping approach allows hundreds of virus digests to be analysed at a rate of less than one minute per sample, even without human intervention on some automated instruments. The approach is limited only by the time required for whole virus or protein digestion.</p>
<p>Here we describe two new algorithms, FluShuffle and FluResort, written specifically to identify reassortant influenza viruses from such data sets. FluShuffle considers different combinations of viral protein identities that match the mass spectral data using a Gibbs sampling algorithm. FluResort maps those identities onto phylogenetic trees, constructed through viral protein sequence alignments, to calculate the weighted distance of each across two or more different trees. Each weighted mean distance value is normalized by conversion to a Z-score that is used to establish the probability of a reassorted strain.</p>
</sec>
<sec>
<title>Implementation</title>
<sec>
<title>Software design and development</title>
<p>The overall computational approach is shown in Figure
<xref ref-type="fig" rid="F1">1</xref>
where the FluShuffle and FluResort algorithms are highlighted. Some auxiliary programs written for data manipulation prior to analysis are also shown. All programs were written in ANSI/ISO standard C++ and tested on Pentium 4 and Intel i5 personal computers, with 1–4 GB of RAM, running either the Microsoft Windows 7 or Kubuntu Linux 11.04 operating system. The FluShuffle and FluResort algorithms have been implemented to run via a web interface.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>An overview of the computational strategy and algorithms used to establish viral protein identity and reassorted strains.</bold>
The algorithms FluShuffle and FluResort are shaded.</p>
</caption>
<graphic xlink:href="1471-2105-13-208-1"></graphic>
</fig>
<sec>
<title>Theoretical peptide library preparation with PepGen</title>
<p>Viral protein sequence data derived from the NCBI Influenza Virus Resource [
<xref ref-type="bibr" rid="B36">36</xref>
], and those sequences representing common contaminant proteins in egg and cell culture grown viruses from the UniProt database, were obtained in FASTA format. An algorithm, PepGen, was developed to generate theoretical peptide monoisotopic masses, with their protein accessions, for each protein in the database. To achieve this, PepGen performs an
<italic>in silico</italic>
proteolytic digest of all non-redundant complete sequences based on the specificity of trypsin or Glu-C endoproteinase (cleavage after D and E). Autolysis products for these enzymes were also included. Peptides resulting from N-terminal post-translational cleavage were included in this dataset, while all peptides with unknown residues were discarded. The theoretical monoisotopic mass was calculated for each protonated peptide ion [M + H]
<sup>+</sup>
with and without methionine oxidation, N-terminal pyroglutamate formation and cysteine carbamidomethylation.</p>
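<p>The digestion and mass calculation steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not PepGen itself: it models fully specific cleavage with no missed cleavages, assumes the common rule that trypsin does not cleave before proline, cleaves Glu-C after D and E as described above, and uses standard monoisotopic residue masses. The names digest, mh_mass and theoretical_masses are hypothetical.</p>

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # H2O added to every peptide
PROTON = 1.007276            # charge carrier for [M + H]+
CARBAMIDOMETHYL = 57.021464  # added to each C
OXIDATION = 15.994915        # added per oxidised M
PYRO_FROM_Q = 17.026549      # NH3 lost when N-terminal Q forms pyroglutamate
PYRO_FROM_E = 18.010565      # H2O lost when N-terminal E forms pyroglutamate
CLEAVE_AFTER = {"trypsin": "KR", "gluc": "DE"}


def digest(sequence, enzyme="trypsin"):
    """Fully specific in silico digest without missed cleavages.
    Trypsin: after K/R unless the next residue is P (assumed rule).
    Glu-C: after D/E."""
    sites = CLEAVE_AFTER[enzyme]
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        last = i == len(sequence) - 1
        before_pro = not last and sequence[i + 1] == "P"
        if last or (residue in sites and not (enzyme == "trypsin" and before_pro)):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def mh_mass(peptide, carbamidomethyl=True, oxidised_met=0, pyroglu=False):
    """Monoisotopic [M + H]+ of a peptide with the chosen modifications."""
    mass = sum(RESIDUE[r] for r in peptide) + WATER + PROTON
    if carbamidomethyl:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    mass += OXIDATION * min(oxidised_met, peptide.count("M"))
    if pyroglu and peptide[0] == "Q":
        mass -= PYRO_FROM_Q
    elif pyroglu and peptide[0] == "E":
        mass -= PYRO_FROM_E
    return mass


def theoretical_masses(sequence, enzyme="trypsin"):
    """Unmodified [M + H]+ for each peptide; peptides containing unknown
    residues (anything outside RESIDUE, e.g. X) are discarded."""
    return {p: mh_mass(p, carbamidomethyl=False)
            for p in digest(sequence, enzyme)
            if all(r in RESIDUE for r in p)}


print(digest("AKRPGK"))          # K-R cleaved, R-P left intact
print(digest("ADEK", "gluc"))
```

<p>Generating the modified variants described in the text then amounts to calling mh_mass with each combination of the carbamidomethyl, oxidised_met and pyroglu options for every retained peptide.</p>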
</sec>
<sec>
<title>Viral protein identification with FluShuffle</title>
<p>The assignment of peaks in a mass spectrum of a mixture of viral proteins is not trivial. A naive approach, in which the distribution for each protein is estimated separately, fails to account for the possibility that the peaks may originate from other proteins. This leads to incorrect assignments [
<xref ref-type="bibr" rid="B37">37</xref>
]. FluShuffle implements a Bayesian Markov Chain Monte Carlo (MCMC) approach [
<xref ref-type="bibr" rid="B38">38</xref>
] to assign a combination of protein accessions (one per viral protein) to a single mass spectrum.</p>
<p>The posterior probability for any given combination of proteins,
<italic>θ</italic>
, is the probability that the combination is present in the sample given the mass spectral data
<italic>D</italic>
. The expression for the posterior probability in equation 1 is a modified version of the posterior function presented in the ProFound algorithm [
<xref ref-type="bibr" rid="B39">39</xref>
].</p>
<p>
<disp-formula id="bmcM1">
<label>(1)</label>
<mml:math id="M1" name="1471-2105-13-208-i1" overflow="scroll">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>θ</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>∝</mml:mo>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>θ</mml:mi>
</mml:mfenced>
<mml:mfrac>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>!</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
<mml:mo>!</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
<mml:mfenced open="(" close=")">
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>r</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mi>Δ</mml:mi>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mstyle>
</mml:mfenced>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>
<italic>r</italic>
is the number of peaks in the spectrum that were matched to the set of theoretical peptides from all accessions in
<italic>θ</italic>
.
<italic>w</italic>
is the number of unmatched peaks.
<italic>N</italic>
is the total number of theoretical peptides produced from digestion of all proteins in
<italic>θ</italic>
. <italic>a</italic>
<sub>
<italic>i</italic>
</sub>
is the number of theoretical peptides matching to peak
<italic>i</italic>
.
<italic>σ</italic>
<sub>
<italic>i</italic>
</sub>
is the maximum mass error for peak
<italic>i</italic>
in Daltons.
<italic>ΔM</italic>
is the mass acquisition range. The posterior probability
<inline-formula>
<mml:math id="M2" name="1471-2105-13-208-i2" overflow="scroll">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mi>θ</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>D</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</inline-formula>
can then be described as proportional to the product of the prior (
<italic>P</italic>
(
<italic>θ</italic>
)) and likelihood functions according to Bayes’ theorem.</p>
<p>Note from equation 1 that as r increases, (N−r)! decreases. However, this is offset by the terms describing the probability of a random peak match. For two possible accessions, A and B, if A has one extra peptide that matches an observed peak, r increases by 1 and the value of (N−r)! decreases. At the same time, the increase in r adds one more factor of the ‘
<inline-formula>
<mml:math id="M3" name="1471-2105-13-208-i3" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="italic">aΔM</mml:mi>
<mml:mo>/</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</inline-formula>
’ term. For high resolution, high mass accuracy mass spectrometry, the term
<italic>σ</italic>
<sub>
<italic>i</italic>
</sub>
is very small. This results in a large ‘
<inline-formula>
<mml:math id="M4" name="1471-2105-13-208-i4" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="italic">aΔM</mml:mi>
<mml:mo>/</mml:mo>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</inline-formula>
’ term that more than offsets the decrease in the posterior probability resulting from the decrease in (N−r)!.</p>
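<p>Because the factorials in equation 1 quickly overflow floating point numbers, a practical evaluation would plausibly work in log space. The following Python sketch is an assumption-laden illustration of equation 1 up to an additive constant, not the FluShuffle source; the function name log_posterior and all numerical values (N = 200, ΔM = 3000 Da, σ = 0.005 Da) are hypothetical. It compares two accessions that differ by one matched peak, holding N fixed for simplicity.</p>

```python
import math


def log_posterior(log_prior, n_theoretical, matches, n_unmatched, mass_range):
    """log P(theta | D) up to an additive constant, following equation 1.

    log_prior:     log P(theta)
    n_theoretical: N, theoretical peptides from all accessions in theta
    matches:       list of (a_i, sigma_i) pairs, one per matched peak (r pairs)
    n_unmatched:   w, number of unmatched peaks
    mass_range:    Delta M, the mass acquisition range in Da
    """
    r = len(matches)
    value = log_prior
    value += math.lgamma(n_theoretical - r + 1)   # log (N - r)!
    value -= math.lgamma(n_theoretical + 2)       # log (N + 1)!
    value -= math.log(n_unmatched + 1)            # log (w + 1)
    for a_i, sigma_i in matches:
        value += math.log(a_i * mass_range / (2 * sigma_i))
    return value


# Hypothetical accessions: A matches one peak more than B (and leaves one fewer unmatched)
peak = (1, 0.005)  # one theoretical peptide per peak, 0.005 Da maximum error
b = log_posterior(0.0, 200, [peak] * 10, 36, 3000.0)
a = log_posterior(0.0, 200, [peak] * 11, 35, 3000.0)
print(a - b)  # positive: the extra a*dM/(2*sigma) factor outweighs the smaller (N - r)!
```

<p>The log-space difference is log(ΔM/(2σ)) − log(N − r) + log((w + 1)/w), so the extra match wins whenever the mass accuracy term dominates, as argued above for high resolution data.</p>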
<p>Larger values of the posterior probability represent an improved fit to the observed data. Increasing the number of matched peaks and decreasing the number of unmatched peaks will increase the posterior probability. The prior probability enables information about the strain to be included in the identification process. For example, the expected similarity between influenza strains of consecutive seasons can be used to define priors such that viral protein accessions from the previous season have a higher prior probability. The default priors are the historical frequencies of each accession in the database.</p>
<p>FluShuffle uses a Gibbs sampling algorithm to estimate the marginal posterior probability for each known accession. The marginal posterior probability simply represents the probability of that accession being present in the sample given the data. A higher probability indicates that there is more evidence for the presence of that particular accession. The Gibbs sampler is chosen as it can handle many parameters (i.e. proteins) simultaneously.</p>
<p>The Gibbs sampler algorithm generates a new combination of accessions at each iteration step. Let the combination of accessions be described by {s
<sub>1</sub>
, s
<sub>2</sub>
, … s
<sub>n</sub>
} where s
<sub>i</sub>
is the current accession for protein
<italic>i</italic>
. At a new iteration, each accession in the database for protein 1 is combined with {s
<sub>2</sub>
, … s
<sub>n</sub>
} to generate a combination that is supplied to equation 1. This produces a conditional probability (as it is based on given values for the other proteins) for each accession of protein 1. A new accession is then randomly chosen with probability of selection equal to the conditional posterior probability for each accession. This new accession is used to replace
<italic>s</italic>
<sub>1</sub>
 in the combination. This process is repeated for protein 2, except that each accession for protein 2 is combined with the updated accession for protein 1 as well as {s<sub>3</sub>, … s<sub>n</sub>} before the conditional posterior is calculated.
More generally, the conditional posterior is calculated for each accession of protein
<italic>i</italic>
based on the updated values for proteins 1,2,…,
<italic>i</italic>
-1 and the existing values for proteins
<italic>i</italic>
 + 1,
<italic>i</italic>
 + 2, …n. The iteration is complete when the accession for each protein is updated.</p>
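<p>One Gibbs iteration as described above can be sketched in Python. This is a minimal illustration, not the FluShuffle implementation: log_post is a hypothetical stand-in for the logarithm of equation 1, the accession names are made up, and subtracting the maximum score before exponentiating is an implementation choice assumed here to avoid numerical overflow.</p>

```python
import math
import random


def gibbs_sweep(state, candidates, log_post, rng):
    """One Gibbs iteration over all proteins, updating `state` in place.

    state:      current accession for each protein, [s_1, ..., s_n]
    candidates: candidate accessions for each protein (from the database)
    log_post:   scores a full combination; stands in for log of equation 1
    rng:        a random.Random instance
    """
    for i, options in enumerate(candidates):
        # state[:i] already holds the updated accessions for proteins 1..i-1,
        # state[i + 1:] the existing ones for proteins i+1..n
        scores = [log_post(state[:i] + [acc] + state[i + 1:]) for acc in options]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # unnormalised conditional posterior
        state[i] = rng.choices(options, weights=weights)[0]
    return state


# Toy example: a made-up scoring function that strongly favours HA1 and NA2
def toy_log_post(combo):
    return 20.0 * (combo[0] == "HA1") + 20.0 * (combo[1] == "NA2")


rng = random.Random(0)
state = ["HA2", "NA1"]
chain = []
for _ in range(100):
    sweep = gibbs_sweep(state, [["HA1", "HA2"], ["NA1", "NA2"]], toy_log_post, rng)
    chain.append(tuple(sweep))  # copy, because the sweep updates state in place
print(chain[-1])
```

<p>Sampling each accession in proportion to its conditional posterior, rather than always taking the best one, is what lets the chain visit alternative combinations in rough proportion to their posterior probability.</p>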
<p>FluShuffle repeats this process for a user-defined number of iterations (default 5000) to produce a “chain” of steps. The Gibbs sampler is designed such that steps to high probability solutions are more likely to be generated than steps to low probability solutions. The MCMC algorithm nominates a starting combination of viral proteins (randomly or otherwise) and traverses the solution landscape by switching proteins to identify higher posterior probability solutions. In theory, if the algorithm were run indefinitely, it would not matter whether the starting combination was a low probability solution. In practice, owing to limits on run time and computing power, the estimate retains some bias from the starting combination, which may never be visited again during the rest of the run. To remove this bias, the first 10% of steps were discarded as burn-in.</p>
<p>The remaining steps were used to estimate the posterior probability for each accession based on the proportion of steps that contained that accession. Accessions that match more peaks or match uniquely to a peak will be selected more often at each step in the Gibbs sampler as they have a higher conditional posterior probability. This means that they will be present in more steps and will have a higher posterior probability estimate.</p>
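<p>The burn-in and proportion-based estimate can be written in a few lines. The Python sketch below is illustrative only; the function name marginal_posteriors and the representation of a chain as a list of accession tuples are assumptions, while the 10% burn-in follows the text.</p>

```python
from collections import Counter


def marginal_posteriors(chain, burn_in=0.1):
    """Estimate the marginal posterior of each accession from a Gibbs chain.

    chain:   list of combinations, one tuple of accessions per step
    burn_in: fraction of initial steps to discard (10% in the text)
    """
    kept = chain[int(len(chain) * burn_in):]
    counts = Counter(acc for combo in kept for acc in combo)
    # Proportion of retained steps that contain each accession
    return {acc: n / len(kept) for acc, n in counts.items()}


# A chain that starts in an unlikely combination ("A") before settling on "B"
chain = [("A", "X")] * 10 + [("B", "X")] * 90
print(marginal_posteriors(chain))
```

<p>In this toy chain, discarding the first 10 steps removes every visit to the starting accession A, which is exactly the starting-point bias the burn-in is meant to eliminate.</p>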
</sec>
<sec>
<title>Determination of virus reassortment with FluResort</title>
<p>Once the identity of the proteins arising from the virus digest has been established by FluShuffle, these identities are used to establish whether the virus is a reassorted strain. To facilitate this, the FluResort algorithm was developed. To establish a statistical model for the likelihood that a virus has been reassorted, the phylogenetic relationship between the strain of each identified protein and all other strains must be determined. To this end, the patristic distance derived from the phylogenetic tree of each viral protein was used.</p>
<p>For each accession
<italic>i</italic>
, FluResort calculates the patristic distance
<italic>d</italic>
to an observed protein accession from FluShuffle
<italic>j</italic>
using the sum of branch lengths from
<italic>i</italic>
to
<italic>j</italic>
in the phylogenetic tree. The distance is then weighted by the posterior probability of <italic>j</italic>,
<italic>p</italic>
<sub>
<italic>j</italic>
</sub>
, in order to account for uncertainty in the FluShuffle identification. Since FluShuffle will typically identify multiple candidate accessions with varying posterior probabilities, the weighted mean distance (i.e. the disparity between the proposed strain identity and the observed accessions),
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
is expressed as,</p>
<p>
<disp-formula id="bmcM2">
<label>(2)</label>
<mml:math id="M5" name="1471-2105-13-208-i5" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>z</mml:mi>
<mml:mi>e</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>K</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>such that
<italic>K</italic>
is the set of all accessions identified by FluShuffle.</p>
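<p>The patristic distance and equation 2 can be sketched on a toy rooted tree. In this hypothetical Python illustration, the tree is stored as a mapping from each node to its (parent, branch length) pair; the node names, branch lengths and function names are invented for the example and are not the FluResort data structures. Branch lengths are assumed non-negative, so the minimum over shared ancestors is attained at the lowest common ancestor.</p>

```python
def path_to_root(tree, node):
    """Map `node` and each of its ancestors to their distance from `node`."""
    dist, path = 0.0, {node: 0.0}
    while node in tree:
        node, length = tree[node]
        dist += length
        path[node] = dist
    return path


def patristic(tree, i, j):
    """Sum of branch lengths on the path between nodes i and j."""
    up_i = path_to_root(tree, i)
    up_j = path_to_root(tree, j)
    # Minimum over shared ancestors = route through the lowest common ancestor
    return min(up_i[a] + d for a, d in up_j.items() if a in up_i)


def weighted_mean_distance(tree, i, posteriors):
    """Equation 2: sum over K of p_j * d_ij, divided by size(K).

    posteriors: {observed accession j: posterior probability p_j}
    """
    total = sum(p * patristic(tree, i, j) for j, p in posteriors.items())
    return total / len(posteriors)


# Hypothetical tree: root -> n1 (0.3) -> A (0.1), B (0.2); root -> C (0.5)
TOY_TREE = {"A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("root", 0.3), "C": ("root", 0.5)}
print(patristic(TOY_TREE, "A", "C"))
print(weighted_mean_distance(TOY_TREE, "A", {"B": 0.75, "C": 0.25}))
```
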
<p>The variance of the weighted distances,
<inline-formula>
<mml:math id="M6" name="1471-2105-13-208-i6" overflow="scroll">
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mi>i</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:math>
</inline-formula>
,
is estimated using equation 3. This represents the uncertainty in the disparity between each accession and the observed accessions. When identification is uncertain, the observed accessions are spread throughout the tree, giving a wide spread of distances and hence a large variance.</p>
<p>
<disp-formula id="bmcM3">
<label>(3)</label>
<mml:math id="M7" name="1471-2105-13-208-i7" overflow="scroll">
<mml:mrow>
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mi>i</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi>p</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msub>
<mml:mi>d</mml:mi>
<mml:mi mathvariant="italic">ij</mml:mi>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
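<p>Equations 2 and 3 can be evaluated together for a single candidate accession once its distances to the observed accessions are known. The Python sketch below follows the two equations as written (including the size(K) denominator of equation 2); the function name and the example distances and posteriors are hypothetical.</p>

```python
def weighted_mean_and_variance(distances, posteriors):
    """Equations 2 and 3 for one candidate accession i.

    distances:  d_ij for each observed accession j in K
    posteriors: p_j for each observed accession j in K (same order)
    """
    mean = sum(p * d for p, d in zip(posteriors, distances)) / len(distances)
    variance = sum(p * (d - mean) ** 2 for p, d in zip(posteriors, distances))
    return mean, variance


# Observed accessions spread across the tree versus clustered together
spread = weighted_mean_and_variance([0.3, 0.9], [0.75, 0.25])
clustered = weighted_mean_and_variance([0.3, 0.3], [0.75, 0.25])
print(spread, clustered)
```

<p>As the text argues, the spread-out case yields the larger variance, reflecting greater uncertainty in the disparity between the candidate and the observed accessions.</p>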
<p>Each weighted mean distance value calculated by the FluResort algorithm was converted to a Z-score (equation 4). The accession with the lowest weighted mean distance,
<inline-formula>
<mml:math id="M8" name="1471-2105-13-208-i8" overflow="scroll">
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:math>
</inline-formula>
, was determined for each protein. The variance of the weighted distances,
<inline-formula>
<mml:math id="M9" name="1471-2105-13-208-i9" overflow="scroll">
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:math>
</inline-formula>
, was calculated for that accession using equation 3. This allows calculation of the Z-score,
<italic>Z</italic>
<sub>
<italic>i</italic>
</sub>
, for the accession
<italic>i</italic>
corresponding to each strain based on its weighted mean distance,
<inline-formula>
<mml:math id="M10" name="1471-2105-13-208-i10" overflow="scroll">
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:math>
</inline-formula>
.</p>
<p>
<disp-formula id="bmcM4">
<label>(4)</label>
<mml:math id="M11" name="1471-2105-13-208-i11" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mrow>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>The Z-score represents the fit of the proposed strain to the observed accessions. A higher Z-score corresponds to a poorer fit, i.e. more evidence against the proposed strain. The difference between the weighted mean distance of the accession and the lowest weighted mean distance,
<inline-formula>
<mml:math id="M12" name="1471-2105-13-208-i12" overflow="scroll">
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:math>
</inline-formula>
, is used as the numerator to guarantee that the lowest Z-score for each protein is zero. The standard deviation, σ<sub>0</sub>, is used as the denominator to account for uncertainty in protein identification. Greater uncertainty in protein identification weakens the evidence that can be drawn for or against any proposed strain; accordingly, larger variances result in lower Z-scores.</p>
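<p>Applying equation 4 to all candidate accessions of one protein is a short calculation. The Python sketch below is a hypothetical illustration (function name and example values invented); it takes the weighted mean distances and variances from equations 2 and 3 as inputs and does not guard against a zero variance, which would make the Z-score undefined.</p>

```python
import math


def z_scores(means, variances):
    """Equation 4 for every candidate accession of one protein.

    means:     weighted mean distance x_i for each accession i
    variances: sigma_i^2 for each accession i (equation 3)
    """
    best = min(means, key=means.get)  # accession with the lowest mean, x_0
    x0, sigma0 = means[best], math.sqrt(variances[best])
    return {acc: (x - x0) / sigma0 for acc, x in means.items()}


print(z_scores({"a": 0.2, "b": 0.5}, {"a": 0.01, "b": 0.04}))
```

<p>The best-fitting accession always scores zero, and every other accession is measured in units of the best accession's standard deviation.</p>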
<p>The Z-scores for all accessions corresponding to a strain are then summed to provide the composite Z-score,
<italic>c</italic>
, for that strain. The “best-fitting” strain is that with the lowest composite Z-score. This can be repeated for combinations of strains where each strain contributes a complementary subset of proteins. A combination of two strains represents a reassorted virus from one reassortment event, a combination of three strains represents a reassorted virus from two reassortment events and so on. Minimum composite Z-scores were compared across differing numbers of reassortment events to determine whether or not the virus was reassorted. The decrease in composite Z-scores with increasing reassortment number must be large enough to justify an increase in the number of parameters. This was assessed using the standard deviation of the composite Z-scores (equation 5).</p>
<p>
<disp-formula id="bmcM5">
<label>(5)</label>
<mml:math id="M13" name="1471-2105-13-208-i13" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mi>N</mml:mi>
</mml:msqrt>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>The variance of the composite Z-score is equal to the sum of the variances of the Z-scores contributing to the composite. As each variance is normalized to 1, the sum of variances is equal to the number of proteins,
<italic>N</italic>
, being examined, assuming that the identities of the viral proteins are independent of one another. The standard deviation of the composite,
<italic>σ</italic>
<sub>
<italic>c</italic>
</sub>
, can then be calculated as a function of
<italic>N</italic>
. A difference in composite Z-scores was considered large if it was equal to or greater than
<inline-formula>
<mml:math id="M14" name="1471-2105-13-208-i14" overflow="scroll">
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:msqrt>
<mml:mi>N</mml:mi>
</mml:msqrt>
</mml:mrow>
</mml:math>
</inline-formula>
(i.e. two standard deviations). Because the Z-score has a formulation equivalent to the Z-statistic, this threshold corresponds approximately to a 0.05 significance level for a two-tailed Z-test, provided the weighted distances are assumed to be normally distributed.</p>
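<p>The comparison across reassortment events can be sketched as an exhaustive search. The Python example below is one possible reading of the procedure, not the FluResort code: it enumerates every assignment of proteins to candidate strains, records the lowest composite Z-score for each number of reassortment events (distinct strains minus one), and accepts the fewest events whose composite lies within 2√N of the overall minimum. The function names and the Z-scores for strains X and Y are hypothetical, and the search grows exponentially with the number of proteins.</p>

```python
import math
from itertools import product


def min_composite_by_events(z):
    """Lowest composite Z-score for each number of reassortment events.

    z: {protein: {strain: Z-score}}; every protein lists the same strains.
    A combination drawing proteins from k distinct strains has k - 1 events.
    """
    proteins = list(z)
    strains = list(z[proteins[0]])
    best = {}
    for choice in product(strains, repeat=len(proteins)):
        events = len(set(choice)) - 1
        composite = sum(z[p][s] for p, s in zip(proteins, choice))
        best[events] = min(composite, best.get(events, math.inf))
    return best


def reassortment_events(z):
    """Fewest events whose composite is within 2*sqrt(N) of the overall minimum."""
    best = min_composite_by_events(z)
    threshold = 2 * math.sqrt(len(z))  # two standard deviations (equation 5)
    overall = min(best.values())
    return min(e for e, c in best.items() if threshold > c - overall)


# Hypothetical Z-scores for four proteins against two candidate strains
z = {
    "HA": {"X": 20.0, "Y": 0.0},
    "NA": {"X": 18.0, "Y": 0.0},
    "NP": {"X": 0.0, "Y": 9.0},
    "M1": {"X": 0.0, "Y": 1.0},
}
print(min_composite_by_events(z), reassortment_events(z))
```

<p>With four proteins the threshold is 4 units; the best non-reassorted combination (10 units) exceeds the one-event minimum (0 units) by more than that, so one reassortment event is accepted.</p>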
</sec>
</sec>
</sec>
<sec>
<title>Results and discussion</title>
<sec>
<title>Application of FluShuffle and FluResort algorithms to analyze MS data of reassorted pandemic strain</title>
<p>The FluShuffle and FluResort algorithms were first tested with mass spectral data obtained from the digestion of a type A H1N1 strain produced from the reassortment of a 2009 H1N1 pandemic strain (A/California/07/2009) and a lab-modified H1N1 strain (A/Puerto Rico/08/1934). It was produced for a vaccine (PanVax 2009) against the 2009 H1N1 pandemic swine-originating influenza virus (SOIV) strains and retains the surface viral proteins, hemagglutinin and neuraminidase, of the pandemic strain to elicit an immune response against the native strain.</p>
<p>The FluShuffle algorithm was first used to perform a combined analysis on the high resolution MALDI mass spectra obtained from the respective whole virus digests of PanVax using trypsin and Glu-C endoproteinases (Figure
<xref ref-type="fig" rid="F2">2</xref>
). Monoisotopic mass values for 46 protonated peptide ions were identified in the mass spectrum resulting from the tryptic digest after signal-to-noise filtering and deisotoping (Figure
<xref ref-type="fig" rid="F2">2</xref>
a). Of these, 6 peptide ions were matched to each of the hemagglutinin (HA) and matrix M1 proteins (Table
<xref ref-type="table" rid="T1">1</xref>
) by FluShuffle. The nucleoprotein (NP) was matched to 13 ions whereas the neuraminidase (NA) protein matched 2 peptide ions. A further 2 peptide ions were matched to peptides from both M1 and NP that could not be distinguished due to their similar theoretical mass values.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>High resolution MALDI mass spectrum of the (a) tryptic and (b) Glu-C endoproteinase whole virus digest of the PanVax vaccine against the 2009 H1N1 influenza pandemic strains.</bold>
Peaks labelled Glu-C denote autolysis products.</p>
</caption>
<graphic xlink:href="1471-2105-13-208-2"></graphic>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Posterior probabilities for the major viral proteins detected in the whole virus digest mass spectra of the PanVax vaccine strain</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Viral antigen</bold>
</th>
<th align="center">
<bold>Strain</bold>
</th>
<th align="center">
<bold>Number of matched tryptic peptides</bold>
</th>
<th align="center">
<bold>Assignment confidence (%) trypsin digest only</bold>
</th>
<th align="center">
<bold>Number of matched Glu-C peptides</bold>
</th>
<th align="center">
<bold>Assignment confidence (%) Glu-C digest only</bold>
</th>
<th align="center">
<bold>Assignment confidence (%) combined digest</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">A/California/07/2009
<hr></hr>
</td>
<td align="center" valign="bottom">6
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
<td align="center" valign="bottom">9
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA
<hr></hr>
</td>
<td align="center" valign="bottom">A/California/07/2009
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">46
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934
<hr></hr>
</td>
<td align="center" valign="bottom">13
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">30
<hr></hr>
</td>
<td align="center" valign="bottom">100
<hr></hr>
</td>
</tr>
<tr>
<td align="left">M1</td>
<td align="center">A/Puerto Rico/08/1934</td>
<td align="center">6</td>
<td align="center">47</td>
<td align="center">3</td>
<td align="center">1</td>
<td align="center">66</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Values, shown as percentages, were estimated with FluShuffle and summed over the clade.</p>
</table-wrap-foot>
</table-wrap>
<p>Monoisotopic mass values for 77 ion peaks were identified in the mass spectrum resulting from whole virus digestion with endoproteinase Glu-C (Figure
<xref ref-type="fig" rid="F2">2</xref>
b). Most of the high intensity ions in the mass spectrum were identified as being derived from viral proteins. Segments of the HA and NP proteins were matched to 9 and 7 ions respectively (Table
<xref ref-type="table" rid="T1">1</xref>
). The NA and M1 proteins were matched to 3 ions each.</p>
<p>Other viral proteins were matched to fewer than 2 mass values in each spectrum, resulting in low confidence for each predicted identity. The low number of matches reflects the low copy numbers of the polymerase subunits, non-structural (NS) proteins and M2 protein within each virion. Smaller proteins such as M2 and NS2 are also less likely to be detected, as they contain fewer tryptic peptides for matching.</p>
<p>The predicted protein identities obtained from the mass spectrum by FluShuffle were visualised by edge colouration in the mid-point rooted phylogenetic tree for each protein. The HA protein was clearly subtyped as H1 (Figure
<xref ref-type="fig" rid="F3">3</xref>
) with its origin further localised, with 100% certainty, to the clade containing the A/California/07/2009 strain and its close relatives. This is consistent with the origin of HA in the PanVax strain. The same result was observed for the neuraminidase protein (see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S1).</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Phylogenetic tree for the hemagglutinin protein (H1 subtype) with colouration of its predicted identity within the PanVax strain.</bold>
Irrelevant clades have been collapsed for clarity. A scale bar is shown that represents distance as substitutions per site. The location of the expected strain origin (A/California/07/2009) is labelled and the sum of probabilities for its clade of close relatives is shown in brackets as a percentage. The location of the A/Puerto Rico/08/1934 strain is also labelled.</p>
</caption>
<graphic xlink:href="1471-2105-13-208-3"></graphic>
</fig>
<p>The nucleoprotein was clearly identified as originating from a type A strain (Figure
<xref ref-type="fig" rid="F4">4</xref>
). Its predicted identity was localised, with 100% certainty, to a clade containing close relatives of the A/Puerto Rico/08/1934 strain, consistent with the origin of the protein in the PanVax strain. A subclade containing close relatives of the expected A/Puerto Rico/08/1934 strain was identified with only 67% probability. The matrix M1 protein was also identified as type A (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Figure S2). However, further localisation was achieved with less confidence. Uncertainty in its identification is due in part to the high sequence conservation of the M1 protein [
<xref ref-type="bibr" rid="B29">29</xref>
]. In addition, fewer mass values were matched to M1 than to NP.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Phylogenetic tree for the nucleoprotein for influenza type A with colouration of its predicted identity within the PanVax strain.</bold>
The location of the expected origin (A/Puerto Rico/08/1934) is labelled and the sum of probabilities for its clade of close relatives is shown in brackets as a percentage. The location of the A/California/07/2009 strain is also labelled.</p>
</caption>
<graphic xlink:href="1471-2105-13-208-4"></graphic>
</fig>
<p>The analysis was repeated using differing combinations of the data and the probabilities of the expected identities were determined as shown in Table
<xref ref-type="table" rid="T1">1</xref>
. The posterior probabilities, shown as percentages, were estimated with FluShuffle and summed over the clade.</p>
<p>The FluResort algorithm was used to determine the possibility of reassortment based on the predicted protein identities. For simplicity, the analysis was limited to those proteins that were confidently identified (i.e. HA, NA, NP and M1). All vaccine strains were excluded to avoid a trivial match to the PanVax strain itself. The threshold value for reassortment with 4 proteins is 2√4 = 4 composite Z-score units (see equation 5). The composite Z-score without reassortment was 39 units greater than that of the fully reassorted combination (Table
<xref ref-type="table" rid="T2">2</xref>
). This is far greater than the threshold value, which indicates that the non-reassorted combination is a poor fit to the predicted identities. In contrast, no major differences were observed between the composite Z-scores for the fully reassorted combination and those for combinations with 1 or more reassortment events. This favours a single reassortment event, consistent with the expected nature of the PanVax strain.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Reassortment events in the predicted strain origins for the viral proteins of the PanVax vaccine strain</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Reassortment events</bold>
</th>
<th align="center">
<bold>Viral antigen</bold>
</th>
<th align="center">
<bold>Strain</bold>
</th>
<th align="center">
<bold>Z-score</bold>
</th>
<th align="center">
<bold>Composite</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">0
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">21.2
<hr></hr>
</td>
<td rowspan="4" align="center" valign="top">39.6
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NA
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">18.4
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">1
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">A/California/07/2009 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="4" align="center" valign="top">0.3
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NA
<hr></hr>
</td>
<td align="center" valign="bottom">A/California/07/2009 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0.3
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">A/California/07/2009 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="4" align="center" valign="top">0.0</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">A/Puerto Rico/08/1934 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NA
<hr></hr>
</td>
<td align="center" valign="bottom">A/Louisiana/05/2009 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="center"> </td>
<td align="center">NP</td>
<td align="center">A/Puerto Rico/08/1934 (H1N1)</td>
<td align="center">0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The lowest composite Z-score and the corresponding combination of strains is shown for each number of reassortment events.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Analysis of seasonal type A strain</title>
<p>A type A strain representative of the seasonal H1N1 strains responsible for annual epidemics in human populations during 2006–2008 was analyzed.</p>
<p>The FluShuffle algorithm was used to analyze the high resolution mass spectrum of the tryptic whole virus digest of the A/Solomon Islands/03/2006 strain (Figure
<xref ref-type="fig" rid="F5">5</xref>
). Monoisotopic mass values for 14 ion peaks were identified from the mass spectrum after filtering and deisotoping. Most high-intensity ions were matched by FluShuffle to theoretical masses of peptides from the proteolysis of the major viral proteins. Segments of the HA and M1 proteins were each matched to 3 ions, whereas peptides derived from the NP matched 5 ions (Table
<xref ref-type="table" rid="T3">3</xref>
).</p>
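<p>Matching observed monoisotopic masses to theoretical peptide masses can be illustrated with a minimal Python sketch using the 5 ppm tolerance given in the Methods. It assumes singly protonated ions ([M+H]+, consistent with the maximum charge state of +1) and a simple sorted-list search; the function name, peptide labels and mass values are hypothetical and do not come from FluShuffle or Table 3.</p>

```python
import bisect


def match_peaks(observed, theoretical, tol_ppm=5.0):
    """Match observed m/z values to theoretical [M+H]+ values within tol_ppm.

    observed:    observed monoisotopic m/z values.
    theoretical: (mz, label) tuples, e.g. from an in silico digest.
    Returns a list of (observed_mz, label, error_ppm) tuples.
    """
    theo = sorted(theoretical)
    mzs = [mz for mz, _ in theo]
    matches = []
    for obs in observed:
        window = obs * tol_ppm * 1e-6                # tolerance in m/z units
        lo = bisect.bisect_left(mzs, obs - window)   # first candidate in window
        hi = bisect.bisect_right(mzs, obs + window)  # one past the last candidate
        for mz, label in theo[lo:hi]:
            error_ppm = (obs - mz) / mz * 1e6
            matches.append((obs, label, round(error_ppm, 2)))
    return matches


theoretical = [(1054.5210, "HA peptide 1"), (1266.6402, "NP peptide 1"),
               (1477.7090, "M1 peptide 1")]
observed = [1054.5245, 1266.6410, 1800.9000]
for hit in match_peaks(observed, theoretical):
    print(hit)
```

<p>Binary search keeps each lookup logarithmic in the size of the peptide library, which matters when the library covers many strains.</p>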
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>High resolution MALDI mass spectrum of the tryptic whole virus digest of the A/Solomon Islands/03/2006 strain.</bold>
Peaks labelled trypsin denote autolysis products.</p>
</caption>
<graphic xlink:href="1471-2105-13-208-5"></graphic>
</fig>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>Reassortment events in the predicted identities for the seasonal type A/Solomon Islands/03/2006 strain</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Reassortment events</bold>
</th>
<th align="center">
<bold>Viral antigen</bold>
</th>
<th align="center">
<bold>Number of matched tryptic peptides</bold>
</th>
<th align="center">
<bold>Strain</bold>
</th>
<th align="center">
<bold>Z-score</bold>
</th>
<th align="center">
<bold>Composite</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">0
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">A/Albany/4835/1948 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">1.5
<hr></hr>
</td>
<td rowspan="3" align="center" valign="top">2.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">A/Albany/4835/1948 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">5
<hr></hr>
</td>
<td align="center" valign="bottom">A/Albany/4835/1948 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">1
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">A/England/494/2006 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="3" align="center" valign="top">0.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">A/Hemsbury/1948 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">5
<hr></hr>
</td>
<td align="center" valign="bottom">A/Hemsbury/1948 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0.4
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">A/England/494/2006 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="3" align="center" valign="top">0.0</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="center" valign="bottom">A/mallard/Ohio/66/1999 (H1N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="center">NP</td>
<td align="center">5</td>
<td align="center">A/New Jersey/1976 (H1N1)</td>
<td align="center">0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The lowest composite Z-score and the corresponding combination of strains is shown for each number of reassortment events.</p>
</table-wrap-foot>
</table-wrap>
<p>The HA protein was assigned with 100% confidence to a clade containing seasonal H1N1 strains from 2000 to 2009 (Figure
<xref ref-type="fig" rid="F6">6</xref>
). This is consistent with the identity of the A/Solomon Islands/03/2006 strain. However, further localisation could not be achieved with high confidence (i.e. over 90%). The subclade containing the close relatives of the expected strain was identified with only 1% confidence, whereas the highest-probability subclade, containing North American seasonal strains from 2006 to 2007, was identified with 34% confidence. HA sequences from these two subclades share 98% sequence identity. This high sequence conservation, together with the small number of matched peptides, prevented the HA from being assigned specifically to the A/Solomon Islands/03/2006 strain.</p>
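<p>The 98% figure refers to pairwise sequence identity. A minimal Python sketch of one common definition (identical residues divided by aligned positions where neither sequence has a gap) is shown below. Other definitions divide by the alignment length or by the shorter sequence, and which one underlies the value quoted here is not stated, so this choice is an assumption. The sequences are short hypothetical fragments, not HA sequences.</p>

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue                   # skip gapped columns
        compared += 1
        identical += a == b            # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(percent_identity("MKAILVVLLY", "MKAILVVLLH"))  # 90.0
print(percent_identity("MKA-LVVLLY", "MKAILVVLLY"))  # 100.0
```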
<fig id="F6" position="float">
<label>Figure 6</label>
<caption>
<p>
<bold>Phylogenetic tree for the hemagglutinin protein (H1 subtype) with colouration of its predicted identity within the type A/Solomon Islands/03/2006 strain.</bold>
The location of the expected identity is labelled and the sum of probabilities for its clade of close relatives is shown in brackets as a percentage. The clade of closely related sequences with the greatest sum probability is marked in bold. The clade containing seasonal H1N1 strains is also shown with its sum of probabilities.</p>
</caption>
<graphic xlink:href="1471-2105-13-208-6"></graphic>
</fig>
<p>The nucleoprotein and matrix M1 proteins were both identified as type A with 100% certainty. The nucleoprotein was further localised, with 90% confidence, to a clade of strains of mixed subtype that contains the expected A/Solomon Islands/03/2006 strain. However, the sub-clade containing the close relatives of the matrix M1 protein of the A/Solomon Islands/03/2006 strain was identified with only 10% certainty. Once again, different strains could not be distinguished because only 3 ions were matched to this protein and the M1 sequence is highly conserved across strains.</p>
<p>The predicted identities were analyzed by the FluResort algorithm to determine whether the A/Solomon Islands/03/2006 strain was reassorted. The reassortment threshold for 3 proteins is 2√3 ≈ 3.46, whereas the maximum decrease in the composite Z-score observed is 2.6 (Table
<xref ref-type="table" rid="T3">3</xref>
). This indicates that the strain arose without reassortment.</p>
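<p>The composites in Table 3 are consistent with the composite Z-score being the sum of the per-protein Z-scores for each combination (for example, 1.5 + 0.2 + 0.9 = 2.6 for the non-reassorted combination). Equation 5 is not reproduced in this section, so treating the composite as a simple sum is an assumption. Under that assumption, the comparison above can be reproduced with a short Python calculation; the variable names are illustrative only.</p>

```python
import math

# Per-protein Z-scores (HA, M1, NP) from Table 3, keyed by number of reassortment events
table3 = {
    0: [1.5, 0.2, 0.9],
    1: [0.0, 0.2, 0.4],
    2: [0.0, 0.0, 0.0],
}

# Assumed: composite Z-score = sum of the per-protein Z-scores
composites = {k: round(sum(z), 2) for k, z in table3.items()}
n_proteins = 3
threshold = 2 * math.sqrt(n_proteins)              # about 3.46
max_decrease = composites[0] - min(composites.values())

print(composites)                                  # {0: 2.6, 1: 0.6, 2: 0.0}
print(f"max decrease {max_decrease:.2f} vs threshold {threshold:.2f}")
print("reassorted" if max_decrease >= threshold else "not reassorted")
```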
</sec>
<sec>
<title>Analysis of seasonal type B strain</title>
<p>The B/Florida/07/2004 strain is a type B human influenza strain from the Yamagata 88-like lineage. A closely related strain was in circulation until 2009 and formed the basis of the seasonal influenza vaccine in 2008 to 2009 [
<xref ref-type="bibr" rid="B30">30</xref>
].</p>
<p>The FluShuffle algorithm was used to perform a combined analysis of the high resolution mass spectra recorded for whole virus digests of the B/Florida/07/2004 strain with the endoproteinases trypsin and Glu-C. Monoisotopic mass values for 25 and 14 peptide ion peaks were identified in the mass spectra of the tryptic and Glu-C whole virus digests, respectively (data not shown). Segments of the hemagglutinin protein were matched to 7 peptide ion mass values across the combined spectral data, whereas the nucleoprotein was matched to only 2 peptide ion mass values, both in the tryptic digest spectrum. The matrix M1 protein was also matched to 2 peptide ion mass values in this spectrum (Table
<xref ref-type="table" rid="T4">4</xref>
).</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption>
<p>Reassortment events in the predicted identities for the seasonal type B/Florida/07/2004 strain</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Reassortment events</bold>
</th>
<th align="center">
<bold>Viral antigen</bold>
</th>
<th align="center">
<bold>Number of matched proteolytic peptides</bold>
</th>
<th align="center">
<bold>Strain</bold>
</th>
<th align="center">
<bold>Z-score</bold>
</th>
<th align="center">
<bold>Composite</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">0
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">B/Cheongju/437/2008
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="3" align="center" valign="top">0.76
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">B/Cheongju/437/2008
<hr></hr>
</td>
<td align="center" valign="bottom">0.74
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">B/Cheongju/437/2008
<hr></hr>
</td>
<td align="center" valign="bottom">0.02
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">1
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">B/Cheongju/437/2008
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="3" align="center" valign="top">0.02
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">A/chicken/Taiwan/0705/99 (H6N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">NP
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">B/Cheongju/437/2008
<hr></hr>
</td>
<td align="center" valign="bottom">0.02
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">HA
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">B/California/15/2007
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td rowspan="3" align="center" valign="top">0.0</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">M1
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">A/chicken/Taiwan/0705/99 (H6N1)
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="center">NP</td>
<td align="center">2</td>
<td align="center">B/Mie/01/1993</td>
<td align="center">0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The lowest composite Z-score and the corresponding combination of strains is shown for each number of reassortment events.</p>
</table-wrap-foot>
</table-wrap>
<p>The hemagglutinin protein was identified as originating from the Yamagata 88-like lineage of the influenza type B strains with 100% certainty (Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
: Figure S3). The origin of the hemagglutinin protein was further localised, with 100% confidence, to a clade containing close relatives of the B/Florida/07/2004 strain. The nucleoprotein was identified as originating from a type B strain with 100% certainty, though without further localisation, while the matrix M1 protein was identified as influenza type B with a confidence of only 33%, owing to the few ions associated with it in the tryptic digest spectrum.</p>
<p>Despite the poorer quality of the mass spectral data for this strain, the FluResort algorithm was used to determine whether the B/Florida/07/2004 strain was reassorted, based on the identities of the HA, NP and M1 proteins. The reassortment threshold for 3 proteins is again 2√3 ≈ 3.46, whereas the maximum decrease in the composite Z-score in Table
<xref ref-type="table" rid="T4">4</xref>
is 0.76. This small decrease is consistent with a lack of reassortment in the generation of the B/Florida/07/2004 strain. The predicted non-reassorted strain, B/Cheongju/437/2008, has 99% sequence identity to B/Florida/07/2004 across the HA, M1 and NP proteins.</p>
</sec>
<sec>
<title>Testing of FluShuffle and FluResort algorithms with simulated mass spectral data</title>
<p>The performance of the FluShuffle algorithm was evaluated more extensively through the analysis of 500 mass spectral datasets for simulated whole virus digests in both the mixed-protein Markov chain Monte Carlo (MCMC) [
<xref ref-type="bibr" rid="B38">38</xref>
] and single-protein modes. The former resulted in significant decreases in the proportion of misidentified viral subtypes (data not shown). The proportion of correct identifications increased with increasing sequence coverage, since more mass spectral data improves the confidence of protein identification. The proportion was unchanged when the number of noise or background ion peaks was increased. This is due to the high resolution and mass accuracy of the MALDI FT-ICR instrument, with which ions of very similar mass-to-charge ratio are easily resolved, so that random noise peaks in the spectrum are unlikely to match the mass of a peptide derived from a viral protein.</p>
</sec>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Influenza strains</title>
<p>The PanVax H1N1 vaccine was donated by CSL Biotherapies, CSL Limited (Parkville, Victoria, Australia) and used without further purification. The vaccine contains the NYMC X-181 strain at an HA concentration of 30 μg/mL. The genomic segments encoding the HA, NA and PB1 proteins in the vaccine strain are derived from the pandemic H1N1 2009 strain A/California/07/2009. The other segments are derived from NYMC X-157, a lab-modified A/Puerto Rico/08/1934 strain. Accession numbers used for the expected viral protein identities were GenBank: ACP44189 (HA), ACT36688 (NA), ACF41835 (M1) and ABD77679 (NP).</p>
<p>Strains B/Florida/07/2004 and A/Solomon Islands/03/2006 were obtained from Advanced ImmunoChemicals Inc. (Long Beach, CA, USA) as inactivated virus preparations from egg allantoic fluid. Strain B/Florida/07/2004 is a type B virus from the Yamagata 88-like lineage. The accessions used for the expected protein identities were GenBank: ACF54213 (HA), ACF54217 (NA), ACF54214 (M1) and ACF54218 (NP). Strain A/Solomon Islands/03/2006 is a seasonal H1N1 virus. Accessions used for the expected viral protein identities were GenBank: ABU99109 (HA), ABU99068 (NA), ACD37437 (M1) and ACX46205 (NP).</p>
</sec>
<sec>
<title>Whole virus digest of influenza strains</title>
<p>A suspension corresponding to 3 μg of whole virus was concentrated to near dryness and resolubilized in 50 μL digestion buffer (50 mM NH
<sub>4</sub>
HCO
<sub>3</sub>
, 10% acetonitrile and 2 mM dithiothreitol, pH 7.8). Sequencing grade porcine trypsin (Promega Corporation, WI, USA) or Glu-C from
<italic>Staphylococcus aureus V8</italic>
(Roche Diagnostics GmbH, Mannheim, Germany) was added at 6–14 ng/μL, and the solution was incubated overnight at 37°C (trypsin) or 25°C (Glu-C).</p>
</sec>
<sec>
<title>High resolution MALDI FT-ICR mass spectrometry</title>
<p>Each digest solution was added to five volumes of MALDI matrix (5 mg/mL α-cyano-4-hydroxycinnamic acid, 50% acetonitrile, 0.1% TFA), spotted onto a MALDI sample plate (MTP AnchorChip™400/384 TF, Bruker Daltonics, Billerica, MA, USA) and air-dried at room temperature. MALDI FT-ICR mass spectra were recorded on a 7T Bruker APEX-Qe mass spectrometer (Bruker Daltonics, Billerica, MA, USA) in the positive ion mode at 35% laser power as previously described [
<xref ref-type="bibr" rid="B27">27</xref>
]. Raw peak lists were converted to MS1 format [
<xref ref-type="bibr" rid="B40">40</xref>
] and monoisotopic values were identified using Hardklör [
<xref ref-type="bibr" rid="B41">41</xref>
] with a maximum charge state of +1 and no centroiding.</p>
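<p>Hardklör fits theoretical isotope distributions, which is beyond a short example. The much simpler idea of collapsing each singly charged isotope cluster to its monoisotopic peak can, however, be sketched in Python. This greedy grouping by the 13C–12C mass difference (about 1.00336 Da) is a swapped-in simplification, not Hardklör's method; it ignores intensities, and the tolerance and peak values are hypothetical.</p>

```python
C13_SPACING = 1.003355  # approximate 13C - 12C mass difference (Da)


def monoisotopic_peaks(mz_values, tol=0.005):
    """Greedy deisotoping of singly charged ions (charge +1 only).

    Peaks are scanned from low to high m/z. Each unassigned peak is taken as a
    monoisotopic peak, and later peaks lying at successive +C13_SPACING steps
    (within tol) are assigned to its isotope cluster. Intensities are ignored,
    unlike model-based tools such as Hardklör.
    """
    peaks = sorted(mz_values)
    assigned = set()
    mono = []
    for i, mz in enumerate(peaks):
        if i in assigned:
            continue
        mono.append(mz)                  # lowest peak of a new cluster
        next_mz = mz + C13_SPACING
        for j in range(i + 1, len(peaks)):
            if j in assigned:
                continue
            mz_j = peaks[j]
            if mz_j > next_mz + tol:
                break                    # gap in the isotope series: cluster ends
            if abs(mz_j - next_mz) > tol:
                continue                 # unrelated peak before the next isotope
            assigned.add(j)              # next isotope peak of this cluster
            next_mz += C13_SPACING
    return mono


print(monoisotopic_peaks([1266.640, 1054.521, 1055.524, 1267.643, 1056.528]))
# [1054.521, 1266.64]
```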
</sec>
<sec>
<title>Viral protein identification from mass spectra</title>
<p>Peptide library files and the annotation library file were produced by PepGen. FluShuffle was used to identify viral proteins from the experimental peak lists of monoisotopic mass values. Peak lists were analyzed allowing for variable methionine oxidation, variable pyroglutamate formation, a maximum of 1 missed cleavage and a mass tolerance of 5 ppm. Matches from both the trypsin and endoproteinase Glu-C digests were incorporated simultaneously into the calculation of the posterior probability during Gibbs sampling, so that the information from multiple digests was integrated into a single prediction of the viral protein identity.</p>
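<p>The peptide libraries rely on in silico digestion. A minimal Python sketch of a tryptic digest with up to one missed cleavage, reporting [M+H]+ monoisotopic masses, is given below. It uses the common rule that trypsin cleaves after K or R except before P and standard monoisotopic residue masses; it is not PepGen, the variable methionine oxidation and pyroglutamate options are omitted, and the input sequence is a hypothetical fragment.</p>

```python
MONO = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (termini)
PROTON = 1.00728   # charge +1


def tryptic_sites(seq):
    """Indices after which trypsin cleaves: after K/R, not before P."""
    return [i for i, aa in enumerate(seq[:-1])
            if aa in "KR" and seq[i + 1] != "P"]


def digest(seq, missed=1):
    """(peptide, missed_cleavages, [M+H]+) for up to `missed` missed cleavages."""
    bounds = [0] + [i + 1 for i in tryptic_sites(seq)] + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        for extra in range(missed + 1):
            end = start + 1 + extra
            if end >= len(bounds):
                break                    # ran past the C-terminus
            pep = seq[bounds[start]:bounds[end]]
            mh = sum(MONO[aa] for aa in pep) + WATER + PROTON
            peptides.append((pep, extra, round(mh, 4)))
    return peptides


for pep, mc, mh in digest("MKAIPRGLSKEFR"):
    print(pep, mc, mh)
```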
<p>Predicted identities were plotted onto phylogenetic trees that were visualised using the Archaeopteryx software. Phylogenetic trees were produced with FastTree 2.0 [
<xref ref-type="bibr" rid="B42">42</xref>
] from non-redundant viral protein sequences obtained from NCBI and aligned with MUSCLE [
<xref ref-type="bibr" rid="B43">43</xref>
]. Trees were mid-point rooted to improve the interpretability of branch distances. FluResort was used to establish whether or not the virus was reassorted. Laboratory-produced strains were excluded from the analysis to avoid the trivial detection of artificially reassorted strains. A minimum difference of 2√N in composite Z-scores, where N is the number of proteins included, was required to define reassortment.</p>
</sec>
<sec>
<title>Simulation of large numbers of datasets to test algorithms with FluSim</title>
<p>Simulated peak lists were constructed for analysis by FluShuffle to evaluate its performance in predicting proteins of known identity. FluSim generates random peak lists from the monoisotopic mass values of proteolytic peptide ions generated
<italic>in silico</italic>
from viral protein and contaminant protein sequences. Using FluSim, 500 random peak lists were generated from simulated tryptic digests of viral protein sequences with 5–20% sequence coverage, 20% spurious noise peaks, 0.1% contaminant coverage, variable methionine oxidation, variable pyroglutamate formation, a maximum of 1 missed cleavage and a mass tolerance of 5 ppm.</p>
<p>One accession was randomly selected for each protein, and each accession was associated with the set of peptide masses resulting from its proteolytic digestion. A subset of these peptide ion masses was then picked at random for inclusion in the peak list such that the specified sequence coverage was achieved. Noise peaks were then added to the peak list at random positions across the mass acquisition range. The peptide set for contaminants was also collated, and contaminant masses were added according to the specified contaminant coverage. A random mass error was added to each
<italic>m/z</italic>
value within a specified mass tolerance.</p>
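<p>The peak-list generation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FluSim itself: sequence coverage is measured as the fraction of residues in the chosen peptides (so the peptides are assumed not to overlap, i.e. no missed cleavages), the noise fraction is taken relative to the number of true peaks, contaminant peaks are omitted for brevity, and the function name, default mass range and example masses are hypothetical.</p>

```python
import random


def simulate_peak_list(peptides, seq_len, coverage, noise_frac=0.2,
                       mass_range=(700.0, 3500.0), tol_ppm=5.0, seed=None):
    """Simulate a peak list for one protein from its in silico peptides.

    peptides:   (sequence, [M+H]+ m/z) tuples for the selected accession,
                assumed non-overlapping (no missed cleavages).
    seq_len:    protein length, used to measure sequence coverage.
    coverage:   target fraction of residues covered by the chosen peptides.
    noise_frac: noise peaks added, as a fraction of the number of true peaks.
    """
    rng = random.Random(seed)
    pool = list(peptides)
    rng.shuffle(pool)                        # random selection order
    chosen, covered = [], 0
    for pep, mz in pool:
        if covered / seq_len >= coverage:
            break                            # target coverage reached
        chosen.append(mz)
        covered += len(pep)
    n_noise = round(noise_frac * len(chosen))
    noise = [rng.uniform(*mass_range) for _ in range(n_noise)]
    # perturb every m/z by a random error of at most tol_ppm
    peaks = [mz * (1 + rng.uniform(-tol_ppm, tol_ppm) * 1e-6)
             for mz in chosen + noise]
    return sorted(peaks)


# [M+H]+ values computed from standard monoisotopic residue masses
peps = [("MK", 278.1533), ("AIPR", 456.2929), ("GLSK", 404.2504), ("EFR", 451.2300)]
print(simulate_peak_list(peps, seq_len=13, coverage=1.0,
                         mass_range=(200.0, 600.0), seed=1))
```

<p>Using a seeded <monospace>random.Random</monospace> instance keeps each simulated dataset reproducible, which is convenient when many peak lists are generated for benchmarking.</p>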
<p>FluShuffle was then used to predict the protein identities as described above. A correct identification is defined as a predicted identity that lies less than 0.05 substitutions per site from the expected identity.</p>
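<p>The correctness criterion translates directly into code. A minimal Python sketch, assuming that the tree distance between the predicted and expected identities (in substitutions per site) is already available for each simulated dataset; the function name and the distances below are hypothetical and are not results from this study.</p>

```python
def proportion_correct(distances, cutoff=0.05):
    """Fraction of predictions lying less than `cutoff` substitutions per site
    from the expected identity.

    distances: tree distance between predicted and expected identity,
               one value per simulated dataset.
    """
    if not distances:
        raise ValueError("no datasets")
    correct = sum(1 for d in distances if cutoff > d)  # strictly below the cutoff
    return correct / len(distances)


example = [0.0, 0.01, 0.049, 0.05, 0.12]  # hypothetical distances
print(proportion_correct(example))        # 0.6
```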
</sec>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>The FluShuffle and FluResort algorithms correctly identified the reassorted nature of the PanVax strain and the identity of the viral proteins that comprise it. Both of the seasonal strains studied were found not to be reassorted and the clade containing the strain under investigation was identified with 100% confidence in terms of the hemagglutinin protein in both cases, and with less confidence for other proteins where fewer peptides were detected and high sequence conservation exists among strains. In the case of highly related strains, with similar protein sequences, peptide segments that span regions of sequence difference must be detected in the mass spectrum in order for strains to be differentiated from one another. As peptides spanning the entire sequence are usually not detected, particularly in the mass spectra of multiple viral proteins from whole virus digests, the identification of a single strain or a set of strains within a clade may not be possible.</p>
<p>Nonetheless, the algorithms significantly improve the capability of the proteotyping approach to identify reassorted viruses that pose the greatest pandemic risk. The FluShuffle algorithm extends the capabilities of the proteotyping approach, beyond the determination of viral type, subtype or lineage, by allowing the identification of the strain origin of each protein. FluShuffle can also perform a combined analysis of data from multiple proteolytic digests that outperforms the single protein approach common to protein mass mapping or fingerprinting algorithms [
<xref ref-type="bibr" rid="B39">39</xref>
]. FluResort identifies reassortment more rapidly than other algorithms since it determines whether a single virus has reassorted from existing known strains rather than identifying all the reassortment events in the evolutionary history of a viral strain.</p>
</sec>
<sec>
<title>Availability and requirements</title>
<p>
<bold>Project name:</bold>
FluShuffle and FluResort</p>
<p>
<bold>Project home page:</bold>
<ext-link ext-link-type="uri" xlink:href="http://sydney.edu.au/science/molecular_bioscience/downard/flushuffle.html">http://sydney.edu.au/science/molecular_bioscience/downard/flushuffle.html</ext-link>
</p>
<p>
<bold>Operating system:</bold>
Web-based, platform independent</p>
<p>
<bold>Programming language:</bold>
C++</p>
<p>
<bold>License:</bold>
Free access for non-commercial use.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors have no competing interests.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>KMD conceived the project, oversaw its design, coordination and progression and drafted the manuscript. ATLL and JWHW contributed to the writing of sections of the manuscript. ATLL designed, developed and tested the algorithms with the advice and participation of JWHW. ATLL prepared the virus digests while KMD carried out the mass spectrometry experiments. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>Figure S1.</bold>
Phylogenetic tree for the neuraminidase protein.</p>
</caption>
<media xlink:href="1471-2105-13-208-S1.tiff" mimetype="image" mime-subtype="tiff">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2</title>
<p>
<bold>Figure S2.</bold>
Phylogenetic tree for the matrix M1 protein.</p>
</caption>
<media xlink:href="1471-2105-13-208-S2.tiff" mimetype="image" mime-subtype="tiff">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3</title>
<p>
<bold>Figure S3.</bold>
Phylogenetic tree for the hemagglutinin protein.</p>
</caption>
<media xlink:href="1471-2105-13-208-S3.tiff" mimetype="image" mime-subtype="tiff">
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgments</title>
<p>This work was supported by an Australian Research Council Discovery Project grant (DP120101167) awarded to KMD and JWHW.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="book">
<person-group person-group-type="editor">Wilschut JC, McElhaney JE, Palache AM</person-group>
<source>Influenza rapid reference</source>
<year>2006</year>
<edition>2</edition>
<publisher-name>Netherlands: Elsevier</publisher-name>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="book">
<name>
<surname>Van-Tam</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sellwood</surname>
<given-names>C</given-names>
</name>
<source>Introduction to pandemic influenza</source>
<year>2010</year>
<publisher-name>Wallingford, UK: C.A.B. International</publisher-name>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Nelson</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>EC</given-names>
</name>
<article-title>The evolution of epidemic influenza</article-title>
<source>Nature Rev Genetics</source>
<year>2007</year>
<volume>8</volume>
<fpage>196</fpage>
<lpage>205</lpage>
<pub-id pub-id-type="doi">10.1038/nrg2053</pub-id>
<pub-id pub-id-type="pmid">17262054</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Zambon</surname>
<given-names>MC</given-names>
</name>
<article-title>The pathogenesis of influenza in humans</article-title>
<source>Rev Med Virol</source>
<year>2001</year>
<volume>11</volume>
<fpage>227</fpage>
<lpage>241</lpage>
<pub-id pub-id-type="doi">10.1002/rmv.319</pub-id>
<pub-id pub-id-type="pmid">11479929</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Nguyen-Van-Tam</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Hampson</surname>
<given-names>AW</given-names>
</name>
<article-title>The epidemiology and clinical impact of pandemic influenza</article-title>
<source>Vaccine</source>
<year>2003</year>
<volume>21</volume>
<fpage>1762</fpage>
<lpage>1768</lpage>
<pub-id pub-id-type="pmid">12686091</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Schweiger</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Bruns</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Meixenberger</surname>
<given-names>M</given-names>
</name>
<article-title>Reassortment between human A(H3N2) viruses is an important evolutionary mechanism</article-title>
<source>Vaccine</source>
<year>2006</year>
<volume>24</volume>
<fpage>6683</fpage>
<lpage>6690</lpage>
<pub-id pub-id-type="doi">10.1016/j.vaccine.2006.05.105</pub-id>
<pub-id pub-id-type="pmid">17030498</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Nelson</surname>
<given-names>MI</given-names>
</name>
<name>
<surname>Viboud</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Simonsen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Bennett</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Griesemer</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>St George</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Spiro</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Sengamalay</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Ghedin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Taubenberger</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>EC</given-names>
</name>
<article-title>Multiple reassortment events in the evolutionary history of H1N1 influenza A virus since 1918</article-title>
<source>PLoS Pathog</source>
<year>2008</year>
<volume>4</volume>
<fpage>e1000012</fpage>
<pub-id pub-id-type="doi">10.1371/journal.ppat.1000012</pub-id>
<pub-id pub-id-type="pmid">18463694</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Yassine</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Gourapura</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Saif</surname>
<given-names>YM</given-names>
</name>
<article-title>Interspecies and intraspecies transmission of influenza A viruses: viral, host and environmental factors</article-title>
<source>Anim Health Res Rev</source>
<year>2010</year>
<volume>11</volume>
<fpage>53</fpage>
<lpage>72</lpage>
<pub-id pub-id-type="doi">10.1017/S1466252310000137</pub-id>
<pub-id pub-id-type="pmid">20591213</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Kilbourne</surname>
<given-names>ED</given-names>
</name>
<article-title>Influenza pandemics of the 20th century</article-title>
<source>Emerg Infect Dis</source>
<year>2006</year>
<volume>12</volume>
<fpage>9</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.3201/eid1201.051254</pub-id>
<pub-id pub-id-type="pmid">16494710</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Schäfer</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Kawaoka</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bean</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Süss</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Senne</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Webster</surname>
<given-names>RG</given-names>
</name>
<article-title>Origin of the pandemic 1957 H2 influenza A virus and the persistence of its possible progenitors in the avian reservoir</article-title>
<source>Virology</source>
<year>1993</year>
<volume>194</volume>
<fpage>781</fpage>
<lpage>788</lpage>
<pub-id pub-id-type="doi">10.1006/viro.1993.1319</pub-id>
<pub-id pub-id-type="pmid">7684877</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Fang</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Min Jou</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Huylebroeck</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Devos</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Fiers</surname>
<given-names>W</given-names>
</name>
<article-title>Complete structure of A/duck/Ukraine/63 influenza hemagglutinin gene: animal virus as progenitor of human H3 Hong Kong 1968 influenza hemagglutinin</article-title>
<source>Cell</source>
<year>1981</year>
<volume>25</volume>
<fpage>315</fpage>
<lpage>323</lpage>
<pub-id pub-id-type="doi">10.1016/0092-8674(81)90049-0</pub-id>
<pub-id pub-id-type="pmid">6169439</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Smith</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Vijaykrishna</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bahl</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lycett</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Worobey</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Pybus</surname>
<given-names>OG</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Raghwani</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bhatt</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Peiris</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Rambaut</surname>
<given-names>A</given-names>
</name>
<article-title>Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic</article-title>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<fpage>1122</fpage>
<lpage>1125</lpage>
<pub-id pub-id-type="doi">10.1038/nature08182</pub-id>
<pub-id pub-id-type="pmid">19516283</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Webster</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Bean</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Gorman</surname>
<given-names>OT</given-names>
</name>
<name>
<surname>Chambers</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Kawaoka</surname>
<given-names>Y</given-names>
</name>
<article-title>Evolution and ecology of influenza A viruses</article-title>
<source>Microbiol Rev</source>
<year>1992</year>
<volume>56</volume>
<fpage>152</fpage>
<lpage>179</lpage>
<pub-id pub-id-type="pmid">1579108</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Taubenberger</surname>
<given-names>JK</given-names>
</name>
<article-title>Methods for molecular surveillance of influenza</article-title>
<source>Expert Rev Anti Infect Ther</source>
<year>2010</year>
<volume>8</volume>
<fpage>517</fpage>
<lpage>527</lpage>
<pub-id pub-id-type="doi">10.1586/eri.10.24</pub-id>
<pub-id pub-id-type="pmid">20455681</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Thompson</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>TJ</given-names>
</name>
<article-title>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</article-title>
<source>Nucleic Acids Res</source>
<year>1994</year>
<volume>22</volume>
<fpage>4673</fpage>
<lpage>4680</lpage>
<pub-id pub-id-type="doi">10.1093/nar/22.22.4673</pub-id>
<pub-id pub-id-type="pmid">7984417</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Djikeng</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Spiro</surname>
<given-names>D</given-names>
</name>
<article-title>Advancing full length genome sequencing for human RNA viral pathogens</article-title>
<source>Future Virol</source>
<year>2009</year>
<volume>4</volume>
<fpage>47</fpage>
<lpage>53</lpage>
<pub-id pub-id-type="doi">10.2217/17460794.4.1.47</pub-id>
<pub-id pub-id-type="pmid">19884976</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>van Elden</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Nijhuis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schipper</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schuurman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>van Loon</surname>
<given-names>AM</given-names>
</name>
<article-title>Simultaneous detection of influenza viruses A and B using real-time quantitative PCR</article-title>
<source>J Clin Microbiol</source>
<year>2001</year>
<volume>39</volume>
<fpage>196</fpage>
<lpage>200</lpage>
<pub-id pub-id-type="doi">10.1128/JCM.39.1.196-200.2001</pub-id>
<pub-id pub-id-type="pmid">11136770</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Coiras</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Pérez-Breña</surname>
<given-names>P</given-names>
</name>
<name>
<surname>García</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Casas</surname>
<given-names>I</given-names>
</name>
<article-title>Simultaneous detection of influenza A, B, and C viruses, respiratory syncytial virus, and adenoviruses in clinical samples by multiplex reverse transcription nested-PCR assay</article-title>
<source>J Med Virol</source>
<year>2003</year>
<volume>69</volume>
<fpage>132</fpage>
<lpage>144</lpage>
<pub-id pub-id-type="doi">10.1002/jmv.10255</pub-id>
<pub-id pub-id-type="pmid">12436489</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Farris</surname>
<given-names>JS</given-names>
</name>
<article-title>Estimating phylogenetic trees from distance matrices</article-title>
<source>Am Nat</source>
<year>1972</year>
<volume>106</volume>
<fpage>645</fpage>
<lpage>668</lpage>
<pub-id pub-id-type="doi">10.1086/282802</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Yurovsky</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Moret</surname>
<given-names>BM</given-names>
</name>
<article-title>FluReF, an automated flu virus reassortment finder based on phylogenetic trees</article-title>
<source>BMC Genomics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl 2</issue>
<fpage>S3</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-12-S2-S3</pub-id>
<pub-id pub-id-type="pmid">21989112</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Rabadan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Krasnitz</surname>
<given-names>M</given-names>
</name>
<article-title>Non-random reassortment in human influenza A viruses</article-title>
<source>Influenza Other Respi Viruses</source>
<year>2008</year>
<volume>2</volume>
<fpage>9</fpage>
<lpage>22</lpage>
<pub-id pub-id-type="doi">10.1111/j.1750-2659.2007.00030.x</pub-id>
<pub-id pub-id-type="pmid">19453489</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="book">
<name>
<surname>Nagarajan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
<source>Uncovering genomic reassortments among influenza strains by enumerating maximal bicliques</source>
<year>2008</year>
<publisher-name>IEEE International Conference on Bioinformatics and Biomedicine</publisher-name>
<fpage>223</fpage>
<lpage>230</lpage>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Wan</surname>
<given-names>XF</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Holton</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Desmone</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Shyu</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Emch</surname>
<given-names>ME</given-names>
</name>
<article-title>Computational identification of reassortments in avian influenza viruses</article-title>
<source>Avian Dis</source>
<year>2007</year>
<volume>51</volume>
<fpage>434</fpage>
<lpage>439</lpage>
<pub-id pub-id-type="doi">10.1637/7625-042706R1.1</pub-id>
<pub-id pub-id-type="pmid">17494602</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Suzuki</surname>
<given-names>Y</given-names>
</name>
<article-title>A phylogenetic approach to detecting reassortments in viruses with segmented genomes</article-title>
<source>Gene</source>
<year>2010</year>
<volume>464</volume>
<fpage>11</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="doi">10.1016/j.gene.2010.05.002</pub-id>
<pub-id pub-id-type="pmid">20546849</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Bokhari</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Janies</surname>
<given-names>DA</given-names>
</name>
<article-title>Reassortment networks for investigating the evolution of segmented viruses</article-title>
<source>IEEE/ACM Trans Comput Biol Bioinform</source>
<year>2010</year>
<volume>7</volume>
<fpage>288</fpage>
<lpage>298</lpage>
<pub-id pub-id-type="pmid">20431148</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Gupta</surname>
<given-names>RS</given-names>
</name>
<article-title>Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes</article-title>
<source>Microbiol Mol Biol Rev</source>
<year>1998</year>
<volume>62</volume>
<fpage>1435</fpage>
<lpage>1491</lpage>
<pub-id pub-id-type="pmid">9841678</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Subtyping of the influenza virus by high resolution mass spectrometry</article-title>
<source>Anal Chem</source>
<year>2009</year>
<volume>81</volume>
<fpage>3500</fpage>
<lpage>3506</lpage>
<pub-id pub-id-type="doi">10.1021/ac900026f</pub-id>
<pub-id pub-id-type="pmid">19402721</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Signature peptides of influenza nucleoprotein for the typing and subtyping of the virus by high resolution mass spectrometry</article-title>
<source>Analyst</source>
<year>2009</year>
<volume>134</volume>
<fpage>2253</fpage>
<lpage>2261</lpage>
<pub-id pub-id-type="doi">10.1039/b912234f</pub-id>
<pub-id pub-id-type="pmid">19838412</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Typing of human and animal strains of influenza virus with conserved signature peptides of matrix M1 protein by high resolution mass spectrometry</article-title>
<source>J Virol Methods</source>
<year>2010</year>
<volume>165</volume>
<fpage>178</fpage>
<lpage>185</lpage>
<pub-id pub-id-type="doi">10.1016/j.jviromet.2010.01.015</pub-id>
<pub-id pub-id-type="pmid">20117137</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Rapid typing and subtyping of vaccine strains of the influenza virus with high resolution mass spectrometry</article-title>
<source>Eur J Mass Spectrom</source>
<year>2010</year>
<volume>16</volume>
<fpage>321</fpage>
<lpage>329</lpage>
<pub-id pub-id-type="doi">10.1255/ejms.1056</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Proteotyping to establish the lineage of type A H1N1 and type B human influenza virus</article-title>
<source>J Virol Methods</source>
<year>2010</year>
<volume>171</volume>
<fpage>117</fpage>
<lpage>122</lpage>
<pub-id pub-id-type="pmid">20970456</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Wong</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>FluTyper - an algorithm for automated typing and subtyping of the influenza virus from high resolution mass spectral data</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<fpage>266</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-266</pub-id>
<pub-id pub-id-type="pmid">20482883</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<etal></etal>
<article-title>Rapid differentiation of seasonal and pandemic H1N1 influenza through proteotyping of viral neuraminidase with mass spectrometry</article-title>
<source>Anal Chem</source>
<year>2010</year>
<volume>82</volume>
<fpage>4584</fpage>
<lpage>4590</lpage>
<pub-id pub-id-type="doi">10.1021/ac100594j</pub-id>
<pub-id pub-id-type="pmid">20443622</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Ha</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Schwahn</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Proteotyping to establish gene origin within reassortant influenza viruses</article-title>
<source>PLoS One</source>
<year>2011</year>
<volume>6</volume>
<fpage>e15771</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0015771</pub-id>
<pub-id pub-id-type="pmid">21305059</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Ha</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Downard</surname>
<given-names>KM</given-names>
</name>
<article-title>Evolution of H5N1 influenza virus through proteotyping of hemagglutinin with high resolution mass spectrometry</article-title>
<source>Analyst</source>
<year>2011</year>
<volume>136</volume>
<fpage>3259</fpage>
<lpage>3267</lpage>
<pub-id pub-id-type="doi">10.1039/c1an15354d</pub-id>
<pub-id pub-id-type="pmid">21717003</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<name>
<surname>Bao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bolotov</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dernovoy</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kiryutin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zaslavsky</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Tatusova</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ostell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>D</given-names>
</name>
<article-title>The influenza virus resource at the National Center for Biotechnology Information</article-title>
<source>J Virol</source>
<year>2008</year>
<volume>82</volume>
<fpage>596</fpage>
<lpage>601</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.02005-07</pub-id>
<pub-id pub-id-type="pmid">17942553</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>He</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>RZ</given-names>
</name>
<name>
<surname>Tam</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>W</given-names>
</name>
<article-title>Optimization-based peptide mass fingerprinting for protein mixture identification</article-title>
<source>J Comput Biol</source>
<year>2010</year>
<volume>17</volume>
<fpage>221</fpage>
<lpage>235</lpage>
<pub-id pub-id-type="doi">10.1089/cmb.2009.0160</pub-id>
<pub-id pub-id-type="pmid">20377442</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Flegal</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Haran</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>GL</given-names>
</name>
<article-title>Markov chain Monte Carlo: can we trust the third significant figure?</article-title>
<source>Stat Sci</source>
<year>2008</year>
<volume>23</volume>
<fpage>250</fpage>
<lpage>260</lpage>
<pub-id pub-id-type="doi">10.1214/08-STS257</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Chait</surname>
<given-names>BT</given-names>
</name>
<article-title>ProFound: an expert system for protein identification using mass spectrometric peptide mapping information</article-title>
<source>Anal Chem</source>
<year>2000</year>
<volume>72</volume>
<fpage>2482</fpage>
<lpage>2489</lpage>
<pub-id pub-id-type="doi">10.1021/ac991363o</pub-id>
<pub-id pub-id-type="pmid">10857624</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>McDonald</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Tabb</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Sadygov</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>MacCoss</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Venable</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Graumann</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Cociorva</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Yates</surname>
<given-names>JR</given-names>
<suffix>III</suffix>
</name>
<article-title>MS1, MS2, and SQT - three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications</article-title>
<source>Rapid Commun Mass Spectrom</source>
<year>2004</year>
<volume>18</volume>
<fpage>2162</fpage>
<lpage>2168</lpage>
<pub-id pub-id-type="doi">10.1002/rcm.1603</pub-id>
<pub-id pub-id-type="pmid">15317041</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Hoopmann</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Finney</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>MacCoss</surname>
<given-names>MJ</given-names>
</name>
<article-title>High-speed data reduction, feature detection, and MS/MS spectrum quality assessment of shotgun proteomics data sets using high-resolution mass spectrometry</article-title>
<source>Anal Chem</source>
<year>2007</year>
<volume>79</volume>
<fpage>5620</fpage>
<lpage>5632</lpage>
<pub-id pub-id-type="doi">10.1021/ac0700833</pub-id>
<pub-id pub-id-type="pmid">17580982</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Price</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dehal</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Arkin</surname>
<given-names>A</given-names>
</name>
<article-title>FastTree 2 - approximately maximum-likelihood trees for large alignments</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<fpage>e9490</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0009490</pub-id>
<pub-id pub-id-type="pmid">20224823</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Edgar</surname>
<given-names>RC</given-names>
</name>
<article-title>MUSCLE: multiple sequence alignment with high accuracy and high throughput</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>1792</fpage>
<lpage>1797</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkh340</pub-id>
<pub-id pub-id-type="pmid">15034147</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000519 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000519 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3505172
   |texte=   FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:22906155" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a H2N2V1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021