H2N2 Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Low-dimensional clustering detects incipient dominant influenza strain clusters

Internal identifier: 000C25 (Pmc/Corpus); previous: 000C24; next: 000C26

Authors: Jiankui He; Michael W. Deem

Source:

RBID : PMC:2978544

Abstract

Influenza has been circulating in the human population and has caused three pandemics in the last century (1918 H1N1, 1957 H2N2 and 1968 H3N2). The 2009 A(H1N1) outbreak was classified by the World Health Organization as the fourth pandemic. Influenza has a high evolution rate, which makes vaccine design challenging. Here we consider an approach for early detection of new dominant strains. By clustering the 2009 A(H1N1) sequence data, we found two main clusters. We then define a metric to detect the emergence of dominant strains. We show on historical H3N2 data that this method is able to identify a cluster around an incipient dominant strain before it becomes dominant. For example, for H3N2 as of 30 March 2009, the method detects the cluster for the new A/British Columbia/RV1222/2009 strain. This strain detection tool would appear to be useful for annual influenza vaccine selection.


Url:
DOI: 10.1093/protein/gzq078
PubMed: 21036781
PubMed Central: 2978544

Links to Exploration step

PMC:2978544

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Low-dimensional clustering detects incipient dominant influenza strain clusters</title>
<author>
<name sortKey="He, Jiankui" sort="He, Jiankui" uniqKey="He J" first="Jiankui" last="He">Jiankui He</name>
<affiliation>
<nlm:aff id="af1">
<addr-line>Department of Physics & Astronomy</addr-line>
,
<institution>Rice University</institution>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Deem, Michael W" sort="Deem, Michael W" uniqKey="Deem M" first="Michael W." last="Deem">Michael W. Deem</name>
<affiliation>
<nlm:aff id="af1">
<addr-line>Department of Physics & Astronomy</addr-line>
,
<institution>Rice University</institution>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="af2">
<addr-line>Department of Bioengineering</addr-line>
,
<institution>Rice University</institution>
,
<addr-line>Houston, TX</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21036781</idno>
<idno type="pmc">2978544</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2978544</idno>
<idno type="RBID">PMC:2978544</idno>
<idno type="doi">10.1093/protein/gzq078</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000C25</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000C25</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Low-dimensional clustering detects incipient dominant influenza strain clusters</title>
<author>
<name sortKey="He, Jiankui" sort="He, Jiankui" uniqKey="He J" first="Jiankui" last="He">Jiankui He</name>
<affiliation>
<nlm:aff id="af1">
<addr-line>Department of Physics & Astronomy</addr-line>
,
<institution>Rice University</institution>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Deem, Michael W" sort="Deem, Michael W" uniqKey="Deem M" first="Michael W." last="Deem">Michael W. Deem</name>
<affiliation>
<nlm:aff id="af1">
<addr-line>Department of Physics & Astronomy</addr-line>
,
<institution>Rice University</institution>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="af2">
<addr-line>Department of Bioengineering</addr-line>
,
<institution>Rice University</institution>
,
<addr-line>Houston, TX</addr-line>
,
<country>USA</country>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Protein Engineering, Design and Selection</title>
<idno type="ISSN">1741-0126</idno>
<idno type="eISSN">1741-0134</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Influenza has been circulating in the human population and has caused three pandemics in the last century (1918 H1N1, 1957 H2N2 and 1968 H3N2). The 2009 A(H1N1) outbreak was classified by the World Health Organization as the fourth pandemic. Influenza has a high evolution rate, which makes vaccine design challenging. Here we consider an approach for early detection of new dominant strains. By clustering the 2009 A(H1N1) sequence data, we found two main clusters. We then define a metric to detect the emergence of dominant strains. We show on historical H3N2 data that this method is able to identify a cluster around an incipient dominant strain before it becomes dominant. For example, for H3N2 as of 30 March 2009, the method detects the cluster for the new A/British Columbia/RV1222/2009 strain. This strain detection tool would appear to be useful for annual influenza vaccine selection.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Bao, Y" uniqKey="Bao Y">Y. Bao</name>
</author>
<author>
<name sortKey="Bolotov, P" uniqKey="Bolotov P">P. Bolotov</name>
</author>
<author>
<name sortKey="Dernovoy, D" uniqKey="Dernovoy D">D. Dernovoy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cox, N" uniqKey="Cox N">N. Cox</name>
</author>
<author>
<name sortKey="Bender, C" uniqKey="Bender C">C. Bender</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deem, M" uniqKey="Deem M">M. Deem</name>
</author>
<author>
<name sortKey="Pan, K" uniqKey="Pan K">K. Pan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Domingo, E" uniqKey="Domingo E">E. Domingo</name>
</author>
<author>
<name sortKey="Holland, J" uniqKey="Holland J">J. Holland</name>
</author>
<author>
<name sortKey="Biebricher, C" uniqKey="Biebricher C">C. Biebricher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drake, J W" uniqKey="Drake J">J.W. Drake</name>
</author>
<author>
<name sortKey="Holland, J J" uniqKey="Holland J">J.J. Holland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Everitt, B S" uniqKey="Everitt B">B.S. Everitt</name>
</author>
<author>
<name sortKey="Landau, S" uniqKey="Landau S">S. Landau</name>
</author>
<author>
<name sortKey="Leese, M" uniqKey="Leese M">M. Leese</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ferguson, N" uniqKey="Ferguson N">N. Ferguson</name>
</author>
<author>
<name sortKey="Galvani, A" uniqKey="Galvani A">A. Galvani</name>
</author>
<author>
<name sortKey="Bush, R" uniqKey="Bush R">R. Bush</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fitch, W M" uniqKey="Fitch W">W.M. Fitch</name>
</author>
<author>
<name sortKey="Bush, R M" uniqKey="Bush R">R.M. Bush</name>
</author>
<author>
<name sortKey="Bender, C A" uniqKey="Bender C">C.A. Bender</name>
</author>
<author>
<name sortKey="Cox, N J" uniqKey="Cox N">N.J. Cox</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fraser, C" uniqKey="Fraser C">C. Fraser</name>
</author>
<author>
<name sortKey="Donnelly, C A" uniqKey="Donnelly C">C.A. Donnelly</name>
</author>
<author>
<name sortKey="Cauchemez, S" uniqKey="Cauchemez S">S. Cauchemez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garten, R J" uniqKey="Garten R">R.J. Garten</name>
</author>
<author>
<name sortKey="Davis, C T" uniqKey="Davis C">C.T. Davis</name>
</author>
<author>
<name sortKey="Russell, C A" uniqKey="Russell C">C.A. Russell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghedin, E" uniqKey="Ghedin E">E. Ghedin</name>
</author>
<author>
<name sortKey="Sengamalay, N A" uniqKey="Sengamalay N">N.A. Sengamalay</name>
</author>
<author>
<name sortKey="Shumway, M" uniqKey="Shumway M">M. Shumway</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gupta, S" uniqKey="Gupta S">S. Gupta</name>
</author>
<author>
<name sortKey="Ferguson, N" uniqKey="Ferguson N">N. Ferguson</name>
</author>
<author>
<name sortKey="Anderson, R" uniqKey="Anderson R">R. Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gupta, V" uniqKey="Gupta V">V. Gupta</name>
</author>
<author>
<name sortKey="Earl, D J" uniqKey="Earl D">D.J. Earl</name>
</author>
<author>
<name sortKey="Deem, M" uniqKey="Deem M">M. Deem</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hak, E" uniqKey="Hak E">E. Hak</name>
</author>
<author>
<name sortKey="Nordin, J" uniqKey="Nordin J">J. Nordin</name>
</author>
<author>
<name sortKey="Wei, F" uniqKey="Wei F">F. Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lapedes, A" uniqKey="Lapedes A">A. Lapedes</name>
</author>
<author>
<name sortKey="Farber, R" uniqKey="Farber R">R. Farber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, M I" uniqKey="Nelson M">M.I. Nelson</name>
</author>
<author>
<name sortKey="Holmes, E C" uniqKey="Holmes E">E.C. Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pan, K" uniqKey="Pan K">K. Pan</name>
</author>
<author>
<name sortKey="Subieta, K" uniqKey="Subieta K">K. Subieta</name>
</author>
<author>
<name sortKey="Deem, M" uniqKey="Deem M">M. Deem</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Plotkin, J B" uniqKey="Plotkin J">J.B. Plotkin</name>
</author>
<author>
<name sortKey="Dushoff, J" uniqKey="Dushoff J">J. Dushoff</name>
</author>
<author>
<name sortKey="Levin, S A" uniqKey="Levin S">S.A. Levin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Russell, C A" uniqKey="Russell C">C.A. Russell</name>
</author>
<author>
<name sortKey="Jones, T C" uniqKey="Jones T">T.C. Jones</name>
</author>
<author>
<name sortKey="Barr, I G" uniqKey="Barr I">I.G. Barr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Russell, C A" uniqKey="Russell C">C.A. Russell</name>
</author>
<author>
<name sortKey="Jones, T C" uniqKey="Jones T">T.C. Jones</name>
</author>
<author>
<name sortKey="Barr, I G" uniqKey="Barr I">I.G. Barr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Santamaria, C" uniqKey="Santamaria C">C. Santamaria</name>
</author>
<author>
<name sortKey="Urue, A" uniqKey="Urue A">A. Urue</name>
</author>
<author>
<name sortKey="Videla, C" uniqKey="Videla C">C. Videla</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Skowronski, D" uniqKey="Skowronski D">D. Skowronski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, D J" uniqKey="Smith D">D.J. Smith</name>
</author>
<author>
<name sortKey="Lapedes, A S" uniqKey="Lapedes A">A.S. Lapedes</name>
</author>
<author>
<name sortKey="De Jong, J C" uniqKey="De Jong J">J.C. de Jong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, G" uniqKey="Smith G">G. Smith</name>
</author>
<author>
<name sortKey="Vijaykrishna, D" uniqKey="Vijaykrishna D">D. Vijaykrishna</name>
</author>
<author>
<name sortKey="Bahl, J" uniqKey="Bahl J">J. Bahl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turner, J L" uniqKey="Turner J">J.L. Turner</name>
</author>
<author>
<name sortKey="Fielding, J E" uniqKey="Fielding J">J.E. Fielding</name>
</author>
<author>
<name sortKey="Clothier, H J" uniqKey="Clothier H">H.J. Clothier</name>
</author>
<author>
<name sortKey="Kelly, H A" uniqKey="Kelly H">H.A. Kelly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Webster, R G" uniqKey="Webster R">R.G. Webster</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, H" uniqKey="Zhou H">H. Zhou</name>
</author>
<author>
<name sortKey="Pophale, R S" uniqKey="Pophale R">R.S. Pophale</name>
</author>
<author>
<name sortKey="Deem, M W" uniqKey="Deem M">M.W. Deem</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Protein Eng Des Sel</journal-id>
<journal-id journal-id-type="publisher-id">proeng</journal-id>
<journal-id journal-id-type="hwp">proeng</journal-id>
<journal-title-group>
<journal-title>Protein Engineering, Design and Selection</journal-title>
</journal-title-group>
<issn pub-type="ppub">1741-0126</issn>
<issn pub-type="epub">1741-0134</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">21036781</article-id>
<article-id pub-id-type="pmc">2978544</article-id>
<article-id pub-id-type="doi">10.1093/protein/gzq078</article-id>
<article-id pub-id-type="publisher-id">gzq078</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Low-dimensional clustering detects incipient dominant influenza strain clusters</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>He</surname>
<given-names>Jiankui</given-names>
</name>
<xref ref-type="aff" rid="af1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Deem</surname>
<given-names>Michael W.</given-names>
</name>
<xref ref-type="aff" rid="af1">1</xref>
<xref ref-type="aff" rid="af2">2</xref>
<xref ref-type="corresp" rid="cor3">3</xref>
</contrib>
</contrib-group>
<aff id="af1">
<label>1</label>
<addr-line>Department of Physics & Astronomy</addr-line>
,
<institution>Rice University</institution>
</aff>
<aff id="af2">
<label>2</label>
<addr-line>Department of Bioengineering</addr-line>
,
<institution>Rice University</institution>
,
<addr-line>Houston, TX</addr-line>
,
<country>USA</country>
</aff>
<author-notes>
<corresp id="cor3">
<label>3</label>
To whom correspondence should be addressed. E-mail:
<email>mwdeem@rice.edu</email>
</corresp>
<fn fn-type="con">
<p>Edited by Devarajan Thirumalai</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<month>12</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>10</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>29</day>
<month>10</month>
<year>2010</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>23</volume>
<issue>12</issue>
<fpage>935</fpage>
<lpage>946</lpage>
<history>
<date date-type="received">
<day>1</day>
<month>9</month>
<year>2010</year>
</date>
<date date-type="rev-recd">
<day>1</day>
<month>9</month>
<year>2010</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>9</month>
<year>2010</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2010. Published by Oxford University Press.</copyright-statement>
<copyright-year>2010</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/2.5/">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.5">http://creativecommons.org/licenses/by-nc/2.5</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="gzq078.pdf"></self-uri>
<abstract>
<p>Influenza has been circulating in the human population and has caused three pandemics in the last century (1918 H1N1, 1957 H2N2 and 1968 H3N2). The 2009 A(H1N1) outbreak was classified by the World Health Organization as the fourth pandemic. Influenza has a high evolution rate, which makes vaccine design challenging. Here we consider an approach for early detection of new dominant strains. By clustering the 2009 A(H1N1) sequence data, we found two main clusters. We then define a metric to detect the emergence of dominant strains. We show on historical H3N2 data that this method is able to identify a cluster around an incipient dominant strain before it becomes dominant. For example, for H3N2 as of 30 March 2009, the method detects the cluster for the new A/British Columbia/RV1222/2009 strain. This strain detection tool would appear to be useful for annual influenza vaccine selection.</p>
</abstract>
<kwd-group>
<kwd>clustering</kwd>
<kwd>H1N1</kwd>
<kwd>H3N2</kwd>
<kwd>influenza</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>The recent outbreak of 2009 A(H1N1) drew immediate international attention (
<xref ref-type="bibr" rid="GZQ078C3">Deem and Pan, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C10">Fraser
<italic>et al.</italic>
, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C11">Garten
<italic>et al.</italic>
, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C27">Smith
<italic>et al.</italic>
, 2009</xref>
). This new 2009 A(H1N1) virus contains a combination of gene segments from swine and human influenza viruses (
<xref ref-type="bibr" rid="GZQ078C10">Fraser
<italic>et al.</italic>
, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C11">Garten
<italic>et al.</italic>
, 2009</xref>
). Confirmed infections reached 270 000 globally as of September 2009 (
<xref ref-type="bibr" rid="GZQ078C46">World Health Organization, 2009b</xref>
). The novel 2009 A(H1N1) strain was declared a pandemic strain by the World Health Organization (WHO) in 2009 (
<xref ref-type="bibr" rid="GZQ078C48">World Health Organization, 2009d</xref>
), and was the epidemic strain in the 2009 Northern Hemisphere winter.</p>
<p>Influenza viruses are hyper-mutating viruses. It has been estimated that the nucleotide mutation rate per genome per replication is approximately 0.76 (
<xref ref-type="bibr" rid="GZQ078C5">Drake and Holland, 1999</xref>
). Influenza viruses escape the human immune system by continual antigenic drift and shift (
<xref ref-type="bibr" rid="GZQ078C9">Fitch
<italic>et al.</italic>
, 1997</xref>
;
<xref ref-type="bibr" rid="GZQ078C13">Gupta
<italic>et al.</italic>
, 1998</xref>
;
<xref ref-type="bibr" rid="GZQ078C29">Webster, 1998</xref>
;
<xref ref-type="bibr" rid="GZQ078C8">Ferguson
<italic>et al.</italic>
, 2003</xref>
;
<xref ref-type="bibr" rid="GZQ078C12">Ghedin
<italic>et al.</italic>
, 2005</xref>
;
<xref ref-type="bibr" rid="GZQ078C17">Nelson and Holmes, 2007</xref>
). The quasispecies nature of influenza viruses makes the strain structure complex (
<xref ref-type="bibr" rid="GZQ078C4">Domingo
<italic>et al.</italic>
, 2002</xref>
). Usually, one or a few dominant influenza strains circulate in the population in each flu season. The flu vaccine is most effective when it matches the dominant circulating strain (
<xref ref-type="bibr" rid="GZQ078C15">Hak
<italic>et al.</italic>
, 2002</xref>
;
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
). The degree to which immunity induced by a vaccine protects against a different viral strain is determined by the antigenic distance between the vaccine and the virus. Because the antigenic regions of the influenza virus evolve, the composition of the flu vaccine is typically modified annually (
<xref ref-type="bibr" rid="GZQ078C21">Russell
<italic>et al.</italic>
, 2008a</xref>
). However, since the influenza strains used in the flu vaccine are decided 6 months before the flu season, a mismatch between the vaccine strain and the dominant circulating strain may occur if the virus evolves significantly. Such a situation arose for the H3N2 virus in the 2009–2010 flu season, when A/British Columbia/RV1222/2009 emerged in the early spring (
<xref ref-type="bibr" rid="GZQ078C24">Seasonal influenza, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C25">Skowronski, 2009</xref>
). Accurate early prediction of the dominant circulating strain is therefore an essential task in influenza research.</p>
<p>There are several ways to estimate flu vaccine effectiveness.
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
(2006)</xref>
proposed
<italic>p</italic>
<sub>epitope</sub>
as a measure of antigenic distance between influenza A vaccine and circulating strains. The hemagglutinin protein has five epitopes. The dominant epitope for a particular circulating strain in a particular season was taken as that which had the largest fractional change in amino acid sequence relative to the vaccine strain. The value of
<italic>p</italic>
<sub>epitope</sub>
is defined as the ratio of the number of amino acid differences in the dominant epitope to the total number of amino acids in the dominant epitope. The antigenic distance between the vaccine strain and the circulating strain is quantified by
<italic>p</italic>
<sub>epitope</sub>
. Through a meta-analysis of historical vaccine efficacy data from over 50 publications,
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
(2006)</xref>
showed that the
<italic>p</italic>
<sub>epitope</sub>
between vaccine strain and circulating strain correlates well with the vaccine efficacy, with
<italic>R</italic>
<sup>2</sup>
> 0.8.</p>
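The definition above can be written compactly as follows. This is a sketch in notation that does not appear in the source: the index k over the five epitopes, the count n_k of amino acid differences in epitope k, and the length N_k of epitope k are assumed symbols, and the numbers in the worked example are purely hypothetical.

```latex
% Assumed notation: k indexes the five hemagglutinin epitopes,
% n_k = number of amino acid differences between vaccine and circulating strain in epitope k,
% N_k = total number of amino acids in epitope k.
\[
  p_{\mathrm{epitope}} \;=\; \max_{k \in \{1,\dots,5\}} \frac{n_k}{N_k}
\]
% Hypothetical worked example: if the dominant epitope has N_k = 20 residues
% and n_k = 3 of them differ, then p_epitope = 3/20 = 0.15.
```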
<p>Understanding the evolution of influenza viruses has benefited from phylogenetic reconstructions of the hemagglutinin protein evolution (
<xref ref-type="bibr" rid="GZQ078C8">Ferguson
<italic>et al.</italic>
, 2003</xref>
;
<xref ref-type="bibr" rid="GZQ078C22">Russell
<italic>et al.</italic>
, 2008b</xref>
). In an alternative approach,
<xref ref-type="bibr" rid="GZQ078C16">Lapedes and Farber (2001)</xref>
, followed by
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
(2004)</xref>
, applied a technique called multidimensional scaling to study antigenic evolution of influenza.
<xref ref-type="bibr" rid="GZQ078C20">Plotkin
<italic>et al.</italic>
(2002)</xref>
clustered hemagglutinin protein sequences using the single-linkage clustering algorithm and found that influenza viruses group into clusters.</p>
<p>Here, we present a low-dimensional clustering method that can detect the cluster containing an incipient dominant strain for an upcoming flu season before the strain becomes dominant. The method builds upon the dimensional projection technique used by
<xref ref-type="bibr" rid="GZQ078C16">Lapedes and Farber (2001)</xref>
and
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
(2004)</xref>
to characterize hemagglutination inhibition (HI) data. Importantly, the present method requires only sequence data, unlike the approach of
<xref ref-type="bibr" rid="GZQ078C16">Lapedes and Farber (2001)</xref>
and
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
(2004)</xref>
, which requires HI assay data from ferrets. In this paper, we first study the evolution of 2009 A(H1N1) using an evolutionary path map, which leads to a suggestion for the H1N1 vaccine strain. Then, we introduce the low-dimensional protein sequence clustering method and propose an influenza vaccine selection procedure based on it. The procedure is demonstrated and tested in detail using historical data. We evaluate how well the method predicts the dominant H3N2 strain in an upcoming flu season, using only data from before that season, for seasons since 1996, and compare the results with those from existing methods over the same period. In the Discussion section, we examine the relationship between the protein sequence clustering method and previous approaches, as well as the false positive rate and other challenges.</p>
</sec>
<sec sec-type="results" id="s2">
<title>Results</title>
<sec id="s2a">
<title>Evolutionary path of 2009 A(H1N1) influenza</title>
<p>We first construct the directional evolutionary path for the 2009 A(H1N1) influenza. We use data with high resolution in sequence, time and geographic location to construct this evolutionary relationship.</p>
<p>Since its first detection, the 2009 A(H1N1) virus has been extensively sequenced (
<xref ref-type="bibr" rid="GZQ078C10">Fraser
<italic>et al.</italic>
, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C11">Garten
<italic>et al.</italic>
, 2009</xref>
). By 1 May 2009, the number of confirmed cases reported by WHO was 333 (
<xref ref-type="bibr" rid="GZQ078C46">World Health Organization, 2009b</xref>
). At the same time, 312 sequenced hemagglutinin (HA) proteins were available in the NCBI Influenza Resources Database (
<xref ref-type="bibr" rid="GZQ078C1">Bao
<italic>et al.</italic>
, 2008</xref>
); that is, most of the confirmed cases at that time had been sequenced. On 1 July 2009, the ratio of sequenced HA proteins to WHO-confirmed cases was 1039/77201 (
<xref ref-type="bibr" rid="GZQ078C46">World Health Organization, 2009b</xref>
), a ratio that is still much larger than that for seasonal flu. In addition, the Influenza Resources Database contains the date of collection of each 2009 A(H1N1) virus strain. We reconstruct the evolutionary history of swine flu viruses with the following procedure. If strain B arose by mutation from strain A, we term strain A the ‘founder’ and strain B ‘F1’. We align the HA proteins of all 2009 A(H1N1) strains. Then, for each strain, we find its founder strain based on the following four criteria: (i) the founder strain should appear earlier than the strain, as judged by collection date; (ii) the founder strain should have only one amino acid difference in the HA1 protein relative to the F1 strain; (iii) the founder should also have the nucleotide sequence most similar to that of the F1 strain; and (iv) the founder strain should have a large number of identical copies circulating in the human population, as approximated by the number of different strains with identical HA sequences in the Influenza Resources Database. By applying these four criteria to 2009 A(H1N1) influenza, we construct the directional evolutionary path map, as shown in Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
. Two clusters are visible: one around A/New York/19/2009 (#28) and another around A/Texas/05/2009 (#12). Most new strains are from the Northern hemisphere, and strains from the Southern hemisphere are mainly located at the edge of the map, such as strains #96, #120 and #126. That the Southern hemisphere strains appear at the boundary of the figure provides a self-consistency check of the validity of the assumptions entering the construction of this figure. Geographically, many founder-to-F1 links run from the USA and Mexico to other countries, but links from other countries to the USA and Mexico, or between countries other than the USA and Mexico, are rare (see Materials and methods). We also found that strains with more F1 strains in Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
are more frequently seen in the human population. For example, in the Influenza Resources Database, we found 153 strains to be identical with A/New York/19/2009, which has 29 F1 strains, and 120 strains to be identical with A/Texas/05/2009, which has 24 F1 strains. We can see in Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
that A/Texas/05/2009 lies furthest upstream in the map, connected downstream to most of the other strains by direct or two-step links. This result agrees with the US Food and Drug Administration (
<xref ref-type="bibr" rid="GZQ078C7">FDA, 2009</xref>
) recommendation of A/Texas/05/2009 as a vaccination strain. The alternative vaccine strain A/California/7/2009 (#7) has fewer F1 strains and it is not located at the center of the network.
<fig id="GZQ078F1" position="float">
<label>Fig. 1</label>
<caption>
<p>The evolutionary path of 2009 A(H1N1) influenza. Strain #1: A/California/05/2009. Strain #2: A/California/04/2009. Strain #7: A/California/07/2009. Strain #12: A/Texas/05/2009. Strain #28: A/New York/19/2009. For complete strain names, see
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data</ext-link>
. Strains from the Northern and Southern hemisphere are shown as red dots and blue dots, respectively. One branch represents one substitution in the amino acid sequence.</p>
</caption>
<graphic xlink:href="gzq07801"></graphic>
</fig>
</p>
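The four founder criteria described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the names Strain, hamming and find_founder are hypothetical, and the order in which criteria (iii) and (iv) are combined (nucleotide similarity first, copy number as a tie-breaker) is an assumption that the text does not specify. Criterion (ii) is applied here to whatever aligned protein string is supplied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strain:
    name: str
    collected: date   # collection date of the isolate
    protein: str      # aligned amino acid sequence
    nucleotide: str   # aligned nucleotide sequence
    copies: int       # number of database isolates with an identical HA sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def find_founder(strain: Strain, pool: Sequence[Strain]) -> Optional[Strain]:
    """Pick a founder for `strain` from `pool`, or None if no candidate qualifies."""
    candidates = [
        c for c in pool
        if strain.collected > c.collected                # (i) founder was collected earlier
        and hamming(c.protein, strain.protein) == 1      # (ii) exactly one amino acid difference
    ]
    if not candidates:
        return None
    # (iii) smallest nucleotide distance; (iv) more identical copies breaks ties (assumed order).
    return min(
        candidates,
        key=lambda c: (hamming(c.nucleotide, strain.nucleotide), -c.copies),
    )
```

Calling find_founder once per strain and drawing an edge from each returned founder to the strain would give a directed map of the kind shown in Fig. 1.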
</sec>
<sec id="s2b">
<title>Low-dimensional clustering</title>
<p>We use a low-dimensional clustering method to visualize the antigenic distance matrix of the viruses. We use a statistical tool called ‘multidimensional scaling’ (
<xref ref-type="bibr" rid="GZQ078C6">Everitt
<italic>et al.</italic>
, 2001</xref>
). This method was used by
<xref ref-type="bibr" rid="GZQ078C16">Lapedes and Farber (2001)</xref>
and
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
(2004)</xref>
to project ferret HI assay data to low dimensions. The influenza viral surface glycoprotein hemagglutinin is a primary target of the protective immune response. Here, we project the hemagglutinin protein sequence data, rather than animal model data, to low dimensions. An HA1 protein of influenza, with 329 residues, can be regarded as a point in a 329-dimensional sequence space. Multidimensional scaling is applied to project this 329-dimensional space onto two dimensions, so that it can be plotted and visualized. First, we perform a multiple alignment of the HA1 proteins. Then, the distance between any two proteins is calculated as
<disp-formula id="GZQ078M1">
<label>(1)</label>
<graphic xlink:href="gzq078eq1"></graphic>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="gzq078ileq1.jpg"></inline-graphic>
</inline-formula>
is the amino acid of protein
<italic>i</italic>
at position
<italic>m</italic>
. The term
<inline-formula>
<inline-graphic xlink:href="gzq078ileq2.jpg"></inline-graphic>
</inline-formula>
is 1 if amino acids of protein
<inline-formula>
<inline-graphic xlink:href="gzq078ileq3.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq4.jpg"></inline-graphic>
</inline-formula>
at position
<italic>m</italic>
are the same. Otherwise, it is 0. For the 2009 H1N1 viruses, we consider the entire HA protein, and
<italic>N</italic>
= 566. For H3N2 viruses, we consider only the HA1 protein, and
<italic>N</italic>
= 329, because the entire HA proteins are not completely sequenced in many cases. Thus,
<inline-formula>
<inline-graphic xlink:href="gzq078ileq5.jpg"></inline-graphic>
</inline-formula>
is the number of amino acid differences between HA proteins normalized by length. The multidimensional scaling produces a protein distance map, for example, Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
b. In this map, each data point represents a flu strain isolate. The Euclidean distance between two points in the map approximates the protein distance in Equation (1) between these two flu strains (see Materials and methods for details of this distance approximation procedure). Two closely located points imply two strains with similar HA protein sequences.
<fig id="GZQ078F2" position="float">
<label>Fig. 2</label>
<caption>
<p>(
<bold>a</bold>
) Kernel density estimation for the protein distance map of 2009 A(H1N1) influenza as of 5 December 2009. (
<bold>b</bold>
) The protein distance map of 2009 A(H1N1) influenza. The vertical and horizontal axes of both figures represent protein distance as defined in Equation (1). A 0.0018 unit of protein distance equals one substitution in the HA protein sequence of H1N1. The height and colors in (a) both represent the density of isolates.</p>
</caption>
<graphic xlink:href="gzq07802"></graphic>
</fig>
</p>
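<p>The distance calculation and projection described above can be sketched as follows. This is a minimal illustration, assuming already-aligned sequences of equal length and using classical (Torgerson) multidimensional scaling, which may differ from the specific scaling procedure given in Materials and methods. The function names and the toy sequences are hypothetical.</p>

```python
import numpy as np


def protein_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise protein distances."""
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = protein_distance(seqs[i], seqs[j])
    return d


def classical_mds(d, dims=2):
    """Classical MDS (an assumed choice): embed a distance matrix in `dims` dimensions."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]        # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy aligned sequences (hypothetical, not real HA data).
toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHAKV", "TCDEFGHAKV", "ACDEFGHIKL"]
coords = classical_mds(distance_matrix(toy))
print(coords.shape)
```

<p>Each row of the returned array is the two-dimensional position of one isolate. Identical sequences are placed at the same point, so a strain with many identical copies appears as a dense spot in the map.</p>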
<p>We apply the low-dimensional clustering method to study 2009 A(H1N1). We plot the protein distance map in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
b. Both A/Texas/05/2009 and A/New York/19/2009 are located near the center of the cluster, in good agreement with the observation from Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
that they are the founder strains for many F1 strains. To detect the clusters in the protein distance map, we use a statistical method known as kernel density estimation (
<xref ref-type="bibr" rid="GZQ078C6">Everitt
<italic>et al.</italic>
, 2001</xref>
). Kernel density estimation is a non-parametric method to estimate the probability density function from which the data are drawn. The kernel density figure is produced from the protein distance map, and it shows the density of influenza strains in sequence space. We plot the kernel density as a three-dimensional shaded surface. For example, the kernel density surface in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
a is produced from Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
b. The
<italic>x</italic>
and
<italic>y</italic>
axes in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
a are the same as those in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
b and are protein distance coordinates. The
<italic>z</italic>
dimension measures the density of flu strains around point (
<italic>x, y</italic>
). We use the surface height and the colors to represent
<italic>z</italic>
values, and the color is proportional to surface height. A peak in the kernel density in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
a indicates a cluster of related flu strains in the protein distance map in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
b.</p>
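<p>The step from a protein distance map to a density surface can be sketched as follows. This is a minimal illustration, assuming an isotropic Gaussian kernel evaluated on a regular grid and clusters counted as connected grid regions whose density exceeds a cutoff. The bandwidth, the grid size, the cutoff and the synthetic points are all hypothetical choices and are not taken from the paper.</p>

```python
import numpy as np


def gaussian_kde_grid(points, bandwidth, grid_size=50):
    """Evaluate a 2-D Gaussian kernel density estimate on a regular grid."""
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0) - 3 * bandwidth
    hi = points.max(axis=0) + 3 * bandwidth
    xs = np.linspace(lo[0], hi[0], grid_size)
    ys = np.linspace(lo[1], hi[1], grid_size)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    sq_dist = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-sq_dist / (2 * bandwidth ** 2)).sum(axis=1)
    density /= len(points) * 2 * np.pi * bandwidth ** 2
    return xs, ys, density.reshape(grid_size, grid_size)


def count_clusters(density, cutoff):
    """Count 4-connected grid regions whose density exceeds the cutoff."""
    above = density > cutoff
    rows, cols = above.shape
    seen = np.zeros((rows, cols), dtype=bool)
    clusters = 0
    for r in range(rows):
        for c in range(cols):
            if not above[r, c] or seen[r, c]:
                continue
            clusters += 1
            seen[r, c] = True
            stack = [(r, c)]
            while stack:  # flood fill over one region
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = nr in range(rows) and nc in range(cols)
                    if inside and above[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        stack.append((nr, nc))
    return clusters


# Synthetic map: a large cluster and a smaller, well-separated one.
rng = np.random.default_rng(0)
major = rng.normal([0.0, 0.0], 0.0005, size=(30, 2))
minor = rng.normal([0.02, 0.0], 0.0005, size=(10, 2))
xs, ys, density = gaussian_kde_grid(np.vstack([major, minor]), bandwidth=0.003)
print(count_clusters(density, cutoff=0.1 * density.max()))
```

<p>The cutoff controls how small a cluster can be and still be reported. A cluster with few isolates has a low peak, so a cutoff set too high would hide an emerging cluster.</p>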
<p>There are two significant clusters in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
a, as two peaks are observed. The cluster on the left side contains A/Texas/05/2009. Another cluster on the right side contains A/New York/19/2009. The 2009 A(H1N1) virus has evolved slowly to date. The greatest
<italic>p</italic>
<sub>epitope</sub>
antigenic distance between A/Texas/05/2009 and all sequenced strains is measured to be less than 0.08. Values of
<italic>p</italic>
<sub>epitope</sub>
less than 0.45 for H1N1 indicate positive expected vaccine efficacy (
<xref ref-type="bibr" rid="GZQ078C19">Pan
<italic>et al.</italic>
, 2009</xref>
), and so a vaccine is expected to be efficacious. All of the amino acids in all five epitopes of a strain of A/Texas/05/2009 and a strain of A/New York/19/2009 are the same. Multidimensional scaling predicts that A/Texas/05/2009 will be the dominant strain in the 2009–2010 season, and that A/Texas/05/2009 is a suitable strain for vaccination. Our focus is on the expected vaccine effectiveness, as it can be judged from antisera HI assay or sequence data alone. We do not consider other aspects such as growth in hen's eggs or other manufacturing constraints. Laboratory growth and passage data are needed to address these aspects.</p>
</sec>
<sec id="s2c">
<title>H3N2 virus evolution for 38 years</title>
<p>We construct the protein distance map to determine the evolution of influenza A(H3N2) virus from 1969 to 2007. Sequences of HA1 proteins were downloaded from the Influenza Virus Resources database (
<xref ref-type="bibr" rid="GZQ078C1">Bao
<italic>et al.</italic>
, 2008</xref>
). We use the multidimensional scaling method (
<xref ref-type="bibr" rid="GZQ078C16">Lapedes and Farber, 2001</xref>
) to generate the protein distance map and corresponding kernel density estimation in Fig. 
<xref ref-type="fig" rid="GZQ078F3">3</xref>
.
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
(2004)</xref>
produced a similar graph using ferret antisera HI assay data. The figure presented here has a higher resolution, and more clusters are observed, because protein sequence data are more abundant and accurate than antisera HI assay data. The evolution of influenza tends to group strains into clusters. In Fig. 
<xref ref-type="fig" rid="GZQ078F3">3</xref>
, we identified 14 major clusters by setting a cutoff value of kernel density for the 38 years from 1969 to 2007. The average duration of a cluster is therefore 2.7 years, which is also the approximate duration of a vaccine. We marked each cluster by the first vaccine strain in the cluster. There are apparent gaps between clusters. The antigenic distance between two strains in two separate clusters is larger than the distances within the same cluster. The influenza virus evolves within one cluster before jumping to another cluster. These dynamics occur because small antigenic drift by one or a few sequential mutations does not allow the virus to completely escape from cross-immunity induced by vaccine protection or prior exposure.
<fig id="GZQ078F3" position="float">
<label>Fig. 3</label>
<caption>
<p>(a) The protein distance map and (b) corresponding kernel density estimation of influenza from 1968 to 2007. The vertical and horizontal axes of both figures represent protein distance as defined in Equation (1). A 0.0030 unit of protein distance equals one substitution in the HA1 protein sequence of H3N2. The colors in (a) represent the time of collection of the isolates. The colors and height in (b) represent the density of isolates. Each cluster is named after the first vaccine strain in the cluster. HK68: Hong Kong/1/68, EN72: England/42/72, VT75: Victoria/3/75, TX77: Texas/1/77, BK79: Bangkok/1/79, PP82: Philippines/2/82, SC87: Sichuan/2/87, BJ89: Beijing/32/92, SD93: Shandong/9/93, JB94: Johannesburg/33/94, WH95: Wuhan/359/95, SN97: Sydney/5/97, PM99: Panama/2007/99, FJ02: Fujian/411/2002.</p>
</caption>
<graphic xlink:href="gzq07803"></graphic>
</fig>
</p>
<p>For vaccine design, when the viruses evolve as a quasispecies in the same cluster, the vaccine that is targeted to the cluster provides protection. This protection decreases with antigenic distance. When the viruses jump to a new cluster by antigenic drift or shift, one would want to update the vaccine to provide protection against strains in the new cluster. In Fig. 
<xref ref-type="fig" rid="GZQ078F3">3</xref>
a, the arrows point to the exact position of vaccine strains. It can be seen that the positions of vaccine strains are near the center of clusters. It can be shown mathematically that choosing the consensus strain of a cluster as vaccine strain minimizes the
<italic>p</italic>
<sub>epitope</sub>
antigenic distance between vaccine strain and cluster strains, and thus maximizes expected vaccine efficacy (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
).</p>
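<p>The consensus strain of a cluster, taken here as the most abundant residue at each aligned position, can be computed with a short routine. This is a minimal sketch assuming equal-length aligned sequences. The function name and the toy sequences are hypothetical, and ties are broken in favor of the residue seen first in the input, which is an arbitrary choice.</p>

```python
from collections import Counter


def consensus(aligned_seqs):
    """Most abundant residue at each position of an alignment."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) iterates over alignment columns
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))


print(consensus(["ACD", "ACE", "TCE"]))  # prints ACE
```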
</sec>
<sec id="s2d">
<title>Influenza vaccine strain selection</title>
<p>We now use the low-dimensional sequence clustering method in an effort to detect a new flu strain before it becomes dominant. A question of interest in influenza research is whether we can predict which strain will be dominant in the next flu season based on the information we have at present. The WHO meets every February to recommend the influenza strains to be used in the vaccine for the next flu season in the Northern hemisphere. The vaccine is expected to have high efficacy if the chosen strain is dominant in the next flu season. The recommendation is especially challenging to make when the dominant strain of the next flu season has not yet become dominant by February of that year. For example, in mid-March 2009, a new H3N2 strain appeared (
<xref ref-type="bibr" rid="GZQ078C24">Seasonal influenza, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C25">Skowronski, 2009</xref>
), which infected a significant fraction of the population in the Southern hemisphere.</p>
<p>The currently accepted influenza vaccine strain selection procedure is as follows (
<xref ref-type="bibr" rid="GZQ078C21">Russell
<italic>et al.</italic>
, 2008a</xref>
). Isolate samples are collected by the WHO GISN and are characterized antigenically using the HI assay. About 10% of samples are also sequenced in the HA1 domain of the HA gene. Antigenic maps are constructed from the HI assay data using the dimensional projection technique. Examination of HI data is not dependent on analysis using dimensional projection; rather, the primary HI data may carry the most weight. If the vaccine does not match the current circulating strains, the vaccine is updated to contain one representative of the circulating strains. The emerging variant strains are identified. If the antigenically distinct emerging variants are judged to be the dominant strains in the upcoming season, the vaccine is updated to include one representative of the emerging variants. The key issue and major difficulty is how to judge whether emerging variants will be the dominant variants in the next season. If a fourfold difference in antisera HI titer between the vaccine strain and the emerging strains is observed, the emerging strains are judged to be the dominant strains in the upcoming season, and an updated vaccine is recommended to include the emerging strains (
<xref ref-type="bibr" rid="GZQ078C21">Russell
<italic>et al.</italic>
, 2008a</xref>
).</p>
<p>Here, we propose a modified vaccine selection process based on clustering detection. First, we apply multidimensional scaling to make a protein distance map from HA1 sequences, instead of constructing an antigenic map from HI assay data. Then, we use kernel density estimation to determine the clusters of strains. If the vaccine does not match the current circulating cluster, the vaccine is updated to contain the current circulating strain. If the vaccine matches the current circulating cluster, but an emerging cluster is judged likely to be the major cluster in the upcoming season, the vaccine is updated to contain the consensus strain of the emerging cluster. We judge whether a cluster is an emerging dominant cluster by two criteria. The first criterion is that this cluster can be detected by kernel density estimation, and is separate from the cluster that contains the current circulating strain or vaccine strain. A cluster that can be detected by kernel density estimation usually contains a central strain that has multiple identical copies and some F1 strains that are closely related to the central strain. An example is the cluster of A/Texas/05/2009(H1N1) in Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
. A/Texas/05/2009(H1N1) is the central strain, which has 120 strains with identical HA protein sequences in the Influenza Virus Resource database (
<xref ref-type="bibr" rid="GZQ078C1">Bao
<italic>et al.</italic>
, 2008</xref>
). A/Texas/05/2009(H1N1) also has 29 F1 strains that differ from it by one amino acid. So, A/Texas/05/2009(H1N1) and the surrounding strains form a cluster, as we detected in Fig. 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
by kernel density estimation.</p>
<p>The second criterion is that the current vaccine strain does not match the consensus strain of the cluster and is estimated to provide low protection against strains in the cluster. That is, the new strain must be sufficiently different that an immune response stimulated by the current vaccine is not expected to be effective. The consensus strain is a protein sequence that shows which residues are most abundant in the multiple sequence alignment at each position. The efficacy of the current vaccine against the new cluster can be estimated from ferret antisera HI assay data. However, the antisera data have low resolution and an imperfect correlation with vaccine effectiveness in humans (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
;
<xref ref-type="bibr" rid="GZQ078C50">Zhou
<italic>et al.</italic>
, 2010</xref>
). Instead, we use
<italic>p</italic>
<sub>epitope</sub>
, which is calculated as the fraction of mutated residues in the dominant epitope, to estimate vaccine efficacy. This measure has a more robust correlation with vaccine effectiveness in humans than do ferret HI data (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
). When the
<italic>p</italic>
<sub>epitope</sub>
between the current vaccine strain and the consensus strain of the new cluster is larger than 0.19, expected vaccine efficacy decreases to 0 for H3N2 influenza, and the current vaccine cannot be expected to provide protection from new strains. As the examples below show, our method can detect an incipient dominant strain at a very early stage, and the method appears to require about 10 sequences in the new cluster for detection.</p>
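<p>The second criterion can be sketched in code. This is a minimal illustration that takes the dominant epitope to be the epitope with the largest fraction of differing residues, so that the measure is the maximum of that fraction over all epitopes. The epitope site lists below are toy placeholders, not the real H3N2 epitope positions, and the threshold is left as a parameter because the text gives 0.19 as the point at which expected efficacy reaches zero for H3N2 rather than as a fixed decision cutoff.</p>

```python
def p_epitope(seq_a, seq_b, epitopes):
    """Largest per-epitope fraction of differing residues.

    `epitopes` maps an epitope name to a list of 0-based alignment
    positions; the site lists used below are hypothetical.
    """
    best = 0.0
    for sites in epitopes.values():
        diffs = sum(seq_a[i] != seq_b[i] for i in sites)
        best = max(best, diffs / len(sites))
    return best


def meets_second_criterion(vaccine_seq, consensus_seq, epitopes, threshold):
    """True if the vaccine is expected to protect poorly against the cluster."""
    return p_epitope(vaccine_seq, consensus_seq, epitopes) > threshold


# Toy example with two made-up epitopes on a six-residue alignment.
toy_epitopes = {"A": [0, 1, 2, 3], "B": [4, 5]}
print(p_epitope("AAAAAA", "AAATAT", toy_epitopes))  # prints 0.5
```

<p>In the toy example, epitope A has one difference in four sites and epitope B has one difference in two sites, so epitope B is dominant and the value is 0.5.</p>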
</sec>
<sec id="s2e">
<title>Demonstration of low-dimensional sequence clustering method</title>
<p>We demonstrate the method by detecting the A/Fujian/411/2002(H3N2) strain. A/Panama/2007/1999 had been the vaccine strain for four flu seasons between 2000 and 2004 in the Northern hemisphere.</p>
<p>The vaccine strain was replaced by A/Fujian/411/2002(H3N2) in the 2004–2005 flu season, as described in Table 
<xref ref-type="table" rid="GZQ078TB1">I</xref>
. The vaccine strain in the 2003–2004 season was A/Panama/2007/1999, while the dominant circulating strain became A/Fujian/411/2002(H3N2). This mismatch resulted in a large decrease in vaccine efficacy in the 2003–2004 flu season (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
). The vaccine efficacy is estimated to be only 12% (
<xref ref-type="bibr" rid="GZQ078C51">MMWR Morb, 2004</xref>
). We test whether our method can detect A/Fujian/411/2002(H3N2) as an incipient dominant strain before it actually became dominant. We use only virus sequence data from before 1 October 2003, and no virus data collected in the 2003–2004 season. Therefore, our prediction and results are made without any knowledge of what happened in the 2003–2004 season. We plot the protein distance map of the 2001–2002 flu season in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
d. To detect the clusters, we plot the kernel density in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
b for the data in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
d. There are two separate significant clusters. The one with the largest kernel density on the left contains the current dominant strain A/Panama/2007/1999 and the widespread A/Moscow/10/1999 strain. The smaller one on the right is a new cluster, which contains A/Fujian/411/2002. Using the data as of 30 September 2002, we seek to determine whether the new cluster on the right in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
b and d will be the next dominant strain after A/Panama/2007/1999. We determine whether this cluster fulfills the two criteria above. First, this new cluster can be significantly detected by kernel density estimation. This cluster is separate from the current dominant strain, as we can see in the figure. Second, we calculated the average
<italic>p</italic>
<sub>epitope</sub>
of the new cluster on the right with regard to A/Moscow/10/1999, A/Panama/2007/1999 and A/Fujian/411/2002 to be 0.214, 0.1214 and 0.083, respectively. This means that the current vaccine, which contains A/Moscow/10/1999, is expected to provide little protection against viruses in the new cluster. The new cluster therefore fulfills the second criterion. Thus, we predict, based on the data as of 30 September 2002, that the cluster on the right in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
d will be the next dominant cluster. This prediction was made on data collected 1 year earlier than when the A/Fujian/411/2002 became dominant in the 2003–2004 season. To further support our prediction, in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
c, we plot the protein distance map from 1 October 2002 to 1 February 2003, right before the WHO selected the vaccine strain for the 2003–2004 season. To detect the clusters, we plot the kernel density in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
a for the data in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
c. There are two separate major clusters observed in the kernel density estimation in Fig. 
<xref ref-type="fig" rid="GZQ078F4">4</xref>
a. The left cluster contains the current dominant strain A/Panama/2007/1999 and also A/Moscow/10/1999. The right cluster contains A/Fujian/411/2002. We calculated the average
<italic>p</italic>
<sub>epitope</sub>
of the right new cluster with regard to A/Moscow/10/1999, A/Panama/2007/1999 and A/Fujian/411/2002 to be 0.2725, 0.1811 and 0.0367, respectively. This result further supports the prediction that the new cluster will become dominant, and that A/Fujian/411/2002, which is the most frequent strain in the new cluster, will be, or is very close to, the next dominant strain. This suggestion precedes the vaccine component switch by 1–2 years, as shown in Table 
<xref ref-type="table" rid="GZQ078TB1">I</xref>
.
<table-wrap id="GZQ078TB1" position="float">
<label>Table I.</label>
<caption>
<p>Summary of results</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="left" span="1"></col>
<col align="left" span="1"></col>
<col align="left" span="1"></col>
<col align="left" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Flu season</th>
<th align="left" rowspan="1" colspan="1">Vaccine strain from WHO (
<xref ref-type="bibr" rid="GZQ078C47">World Health Organization, 2009c</xref>
)</th>
<th align="left" rowspan="1" colspan="1">Our prediction</th>
<th align="left" rowspan="1" colspan="1">Circulating H3N2 strain</th>
<th align="left" rowspan="1" colspan="1">Circulating subtype</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">1996–1997</td>
<td rowspan="1" colspan="1">Wuhan/359/95</td>
<td rowspan="1" colspan="1">Wuhan/359/95</td>
<td rowspan="1" colspan="1">Wuhan/359/95</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1997–1998</td>
<td rowspan="1" colspan="1">Wuhan/359/95</td>
<td rowspan="1" colspan="1">Wuhan/359/95</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1998–1999</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1999–2000</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">Sydney/5/97</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2000–2001</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">N/A</td>
<td rowspan="1" colspan="1">H1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2001–2002</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2002–2003</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">Fujian/411/2002</td>
<td rowspan="1" colspan="1">N/A</td>
<td rowspan="1" colspan="1">H1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2003–2004</td>
<td rowspan="1" colspan="1">Panama/2007/1999</td>
<td rowspan="1" colspan="1">Fujian/411/2002</td>
<td rowspan="1" colspan="1">Fujian/411/2002</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2004–2005</td>
<td rowspan="1" colspan="1">Fujian/411/2002</td>
<td rowspan="1" colspan="1">Fujian/411/2002</td>
<td rowspan="1" colspan="1">Fujian/411/2002</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2005–2006</td>
<td rowspan="1" colspan="1">California/7/2004</td>
<td rowspan="1" colspan="1">California/7/2004</td>
<td rowspan="1" colspan="1">California/7/2004</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2006–2007</td>
<td rowspan="1" colspan="1">Wisconsin/67/2005</td>
<td rowspan="1" colspan="1">Wisconsin/67/2005</td>
<td rowspan="1" colspan="1">Wisconsin/67/2005</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2007–2008</td>
<td rowspan="1" colspan="1">Wisconsin/67/2005</td>
<td rowspan="1" colspan="1">Wisconsin/67/2005</td>
<td rowspan="1" colspan="1">N/A</td>
<td rowspan="1" colspan="1">H1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2008–2009</td>
<td rowspan="1" colspan="1">Brisbane/10/2007</td>
<td rowspan="1" colspan="1">Brisbane/10/2007</td>
<td rowspan="1" colspan="1">Brisbane/10/2007</td>
<td rowspan="1" colspan="1">H3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2009–2010</td>
<td rowspan="1" colspan="1">Brisbane/10/2007</td>
<td rowspan="1" colspan="1">BritishColumbia/RV1222/09</td>
<td rowspan="1" colspan="1">BritishColumbia/RV1222/09</td>
<td rowspan="1" colspan="1">H1</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2010–2011</td>
<td rowspan="1" colspan="1">Perth/16/2009</td>
<td rowspan="1" colspan="1">BritishColumbia/RV1222/09</td>
<td rowspan="1" colspan="1">N/A</td>
<td rowspan="1" colspan="1">N/A</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>This table includes the H3N2 vaccine strains, our prediction of dominant strains, the reported dominant circulating H3N2 strains (
<xref ref-type="bibr" rid="GZQ078C30">World Health Organization, 1995</xref>
,
<xref ref-type="bibr" rid="GZQ078C31">1996</xref>
,
<xref ref-type="bibr" rid="GZQ078C32">1997</xref>
,
<xref ref-type="bibr" rid="GZQ078C33">1998</xref>
,
<xref ref-type="bibr" rid="GZQ078C34">1999</xref>
,
<xref ref-type="bibr" rid="GZQ078C35">2000</xref>
,
<xref ref-type="bibr" rid="GZQ078C36">2001</xref>
,
<xref ref-type="bibr" rid="GZQ078C37">2002</xref>
,
<xref ref-type="bibr" rid="GZQ078C38">2003</xref>
,
<xref ref-type="bibr" rid="GZQ078C39">2004</xref>
,
<xref ref-type="bibr" rid="GZQ078C40">2005a</xref>
,
<xref ref-type="bibr" rid="GZQ078C42">2006</xref>
,
<xref ref-type="bibr" rid="GZQ078C43">2007</xref>
,
<xref ref-type="bibr" rid="GZQ078C44">2008</xref>
,
<xref ref-type="bibr" rid="GZQ078C45">2009a</xref>
,
<xref ref-type="bibr" rid="GZQ078C49">2010</xref>
) and the circulating subtypes in the northern hemisphere (
<xref ref-type="bibr" rid="GZQ078C30">World Health Organization, 1995</xref>
,
<xref ref-type="bibr" rid="GZQ078C31">1996</xref>
,
<xref ref-type="bibr" rid="GZQ078C32">1997</xref>
,
<xref ref-type="bibr" rid="GZQ078C33">1998</xref>
,
<xref ref-type="bibr" rid="GZQ078C34">1999</xref>
,
<xref ref-type="bibr" rid="GZQ078C35">2000</xref>
,
<xref ref-type="bibr" rid="GZQ078C36">2001</xref>
,
<xref ref-type="bibr" rid="GZQ078C37">2002</xref>
,
<xref ref-type="bibr" rid="GZQ078C38">2003</xref>
,
<xref ref-type="bibr" rid="GZQ078C39">2004</xref>
,
<xref ref-type="bibr" rid="GZQ078C40">2005a</xref>
,
<xref ref-type="bibr" rid="GZQ078C42">2006</xref>
,
<xref ref-type="bibr" rid="GZQ078C43">2007</xref>
,
<xref ref-type="bibr" rid="GZQ078C44">2008</xref>
,
<xref ref-type="bibr" rid="GZQ078C45">2009a</xref>
,
<xref ref-type="bibr" rid="GZQ078C49">2010</xref>
). Circulating H3N2 strains are absent if the dominant subtype is H1 or influenza B. The reported dominant H3N2 strains and circulating subtypes data are from WHO Weekly Epidemiological Record (
<uri xlink:type="simple" xlink:href="http://www.who.int/wer/en/">http://www.who.int/wer/en/</uri>
).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="GZQ078F4" position="float">
<label>Fig. 4</label>
<caption>
<p>(
<bold>a</bold>
) Kernel density estimation and (
<bold>c</bold>
) protein distance map for H3N2 viruses between 1 October 2002 and 1 February 2003. (
<bold>b</bold>
) Kernel density estimation and (
<bold>d</bold>
) protein distance map for H3N2 viruses between 1 October 2001 and 9 September 2002. We plot a dotted line to separate the two clusters. The vertical and horizontal axes of all figures represent protein distance as defined in Equation (1). A 0.0030 unit of protein distance equals one substitution of the HA1 protein sequence of H3N2.</p>
</caption>
<graphic xlink:href="gzq07804"></graphic>
</fig>
</p>
</sec>
<sec id="s2f">
<title>Prediction for H3N2 influenza in 2009–2010</title>
<p>By applying our method to the 2008–2009 flu season, we predict that the dominant H3N2 strain in the 2009–2010 flu season may switch. Based on the flu activity in the 2008–2009 flu season, the WHO made the recommendation in February 2009 that A/Brisbane/10/2007(H3N2) should be used as the vaccine (
<xref ref-type="bibr" rid="GZQ078C47">World Health Organization, 2009c</xref>
). However, a new strain evolved just after the recommendation was published. The British Columbia Center for Disease Control detected a new virus strain (
<xref ref-type="bibr" rid="GZQ078C24">Seasonal influenza, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C25">Skowronski, 2009</xref>
) with 3 mutations in antigenic sites (two in epitope B and one in epitope D). Since this new strain is relatively far from the vaccine strain, with
<italic>p</italic>
<sub>epitope</sub>
= 0.095, vaccine efficacy is expected to decrease to 20% (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
;
<xref ref-type="bibr" rid="GZQ078C3">Deem and Pan, 2009</xref>
). However, since the mutations in this new strain ‘do not fulfill the criteria proposed by Cox as corresponding to meaningful antigenic drift’ (
<xref ref-type="bibr" rid="GZQ078C2">Cox and Bender, 1995</xref>
;
<xref ref-type="bibr" rid="GZQ078C25">Skowronski, 2009</xref>
), and this strain still remained the minority of H3N2 viruses in July 2009, health authorities were not certain that this new strain would replace the current dominant strain in the 2009–2010 flu season. We use our method to investigate whether this new strain will be the next dominant strain. We construct the protein distance map as shown in Fig. 
<xref ref-type="fig" rid="GZQ078F5">5</xref>
(c). We plot the kernel density estimation in Fig. 
<xref ref-type="fig" rid="GZQ078F5">5</xref>
(a) for data in Fig. 
<xref ref-type="fig" rid="GZQ078F5">5</xref>
(c). Using the data up to 14 June 2009, we see two major clusters in Fig. 
<xref ref-type="fig" rid="GZQ078F5">5</xref>
(a). The larger one on the right contains the current dominant strain A/Brisbane/10/2007, and the left one is a new cluster which contains A/British Columbia/RV1222/2009. It is apparent that this new cluster is separate from the current dominant cluster. Thus, this cluster fulfills the first criterion. We calculated the average of
<italic>p</italic>
<sub>epitope</sub>
of strains in the left new cluster with regard to A/Brisbane/10/2007 and A/British Columbia/RV1222/2009 to be 0.103 and 0.042, respectively. The vaccine that contains A/Brisbane/10/2007 has an expected efficacy of 20% against the virus strains in the new cluster. Thus, this new cluster satisfies both criteria, and so we predict that this cluster, which contains A/British Columbia/RV1222/2009, will be the dominant cluster in the 2009–2010 season. The earliest time at which we can make this prediction is 30 March 2009. In Fig. 
<xref ref-type="fig" rid="GZQ078F5">5</xref>
d and b, we already see this new cluster on the left side of the figure, although, since there are only about 10 sequences in the new cluster, its kernel density is smaller than that of the dominant cluster. This strain was mentioned as a concern on 5 May 2009, although by conventional methods the strain was not considered a potentially new dominant strain in July 2009 (
<xref ref-type="bibr" rid="GZQ078C25">Skowronski, 2009</xref>
). With the method of the present paper, this new cluster is detected earlier, using the data as of 30 March 2009.
<fig id="GZQ078F5" position="float">
<label>Fig. 5</label>
<caption>
<p>(
<bold>a</bold>
) Kernel density estimation and (
<bold>c</bold>
) protein distance map for H3N2 viruses from 1 October 2008 to 14 June 2009. (
<bold>b</bold>
) Kernel density estimation and (
<bold>d</bold>
) protein distance map for H3N2 viruses between 1 October 2008 and 30 March 2009. The vertical and horizontal axes of all figures represent protein distance as defined in Equation (1). A 0.0030 unit of protein distance equals one substitution of the HA1 protein sequence of H3N2.</p>
</caption>
<graphic xlink:href="gzq07805"></graphic>
</fig>
</p>
</sec>
<sec id="s2g">
<title>Comparison with previous results</title>
<p>Here we present a historical test of the method. For each flu season in the Northern hemisphere from 1996, we use only the H3N2 sequence data until 1 February, before the WHO published its vaccine recommendation. We use low-dimensional clustering to predict the dominant strain. The conventional method as used by the WHO is phylogenetic analysis combined with ferret antisera HI assay. In Table 
<xref ref-type="table" rid="GZQ078TB1">I</xref>
, we compare our method with the conventional method. In the most recent 14 flu seasons, influenza subtype H3 was dominant in 10. The WHO H3N2 vaccine component matches the circulating strains in eight seasons. Our predictions match the circulating strains in nine seasons. In the 1997–1998 season, a novel flu strain, Sydney/5/97, was found in June 1997. Because no similar strains were collected before 1 February, neither of the two methods could predict it. In the 2003–2004 season, our method predicts Fujian/411/2002 as the dominant strain, while phylogenetic analysis combined with ferret antisera HI assay did not. For the other eight seasons dominated by influenza subtype H3, the predictions of both methods matched the dominant circulating strain. The 2009–2010 influenza season was dominated by H1N1. But data from local outbreaks of H3N2 infections (
<xref ref-type="bibr" rid="GZQ078C24">Seasonal influenza, 2009</xref>
;
<xref ref-type="bibr" rid="GZQ078C25">Skowronski, 2009</xref>
) showed that the dominant H3N2 strain was A/British Columbia/RV1222/2009, as predicted in Table 
<xref ref-type="table" rid="GZQ078TB1">I</xref>
, rather than the vaccine strain A/Brisbane/10/2007. For the 2010–2011 season, we recommend A/British Columbia/RV1222/2009 as a vaccine strain, and the WHO recommended A/Perth/16/2009. These two strains are in the same cluster and antigenically similar with a small
<italic>p</italic>
<sub>epitope</sub>
= 0.048. Although these two strains are slightly different, the vaccine is expected to be effective.</p>
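<p>The season counts quoted above can be checked directly against Table I. The sketch below transcribes the rows for the 14 seasons from 1996–1997 to 2009–2010 and counts, among the seasons in which subtype H3 was dominant, how often each selection matched the circulating H3N2 strain. The tuple layout and the function name are illustrative choices.</p>

```python
# (season, WHO vaccine strain, clustering prediction,
#  circulating H3N2 strain, dominant subtype), transcribed from Table I
TABLE_I = [
    ("1996–1997", "Wuhan/359/95", "Wuhan/359/95", "Wuhan/359/95", "H3"),
    ("1997–1998", "Wuhan/359/95", "Wuhan/359/95", "Sydney/5/97", "H3"),
    ("1998–1999", "Sydney/5/97", "Sydney/5/97", "Sydney/5/97", "H3"),
    ("1999–2000", "Sydney/5/97", "Sydney/5/97", "Sydney/5/97", "H3"),
    ("2000–2001", "Panama/2007/1999", "Panama/2007/1999", "N/A", "H1"),
    ("2001–2002", "Panama/2007/1999", "Panama/2007/1999", "Panama/2007/1999", "H3"),
    ("2002–2003", "Panama/2007/1999", "Fujian/411/2002", "N/A", "H1"),
    ("2003–2004", "Panama/2007/1999", "Fujian/411/2002", "Fujian/411/2002", "H3"),
    ("2004–2005", "Fujian/411/2002", "Fujian/411/2002", "Fujian/411/2002", "H3"),
    ("2005–2006", "California/7/2004", "California/7/2004", "California/7/2004", "H3"),
    ("2006–2007", "Wisconsin/67/2005", "Wisconsin/67/2005", "Wisconsin/67/2005", "H3"),
    ("2007–2008", "Wisconsin/67/2005", "Wisconsin/67/2005", "N/A", "H1"),
    ("2008–2009", "Brisbane/10/2007", "Brisbane/10/2007", "Brisbane/10/2007", "H3"),
    ("2009–2010", "Brisbane/10/2007", "BritishColumbia/RV1222/09",
     "BritishColumbia/RV1222/09", "H1"),
]


def tally(rows):
    """Return (H3-dominant seasons, WHO matches, clustering matches)."""
    h3_rows = [r for r in rows if r[4] == "H3"]
    who_matches = sum(r[1] == r[3] for r in h3_rows)
    our_matches = sum(r[2] == r[3] for r in h3_rows)
    return len(h3_rows), who_matches, our_matches


print(tally(TABLE_I))  # prints (10, 8, 9)
```

<p>The two seasons in which the WHO selection did not match are 1997–1998 and 2003–2004. The clustering prediction differs from the WHO selection only in the latter.</p>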
</sec>
<sec id="s2h">
<title>Detecting A/Wellington/1/2004 in the 2004 flu season in the Southern hemisphere</title>
<p>Low-dimensional clustering can also be applied to influenza in the Southern hemisphere. As an example, we test our method on the 2004 flu season. The H3N2 vaccine strain recommended by the WHO for the 2004 flu season in the Southern hemisphere was A/Fujian/411/2002. Data from the surveillance network suggested that the circulating dominant flu strain in the 2004 season in the Southern hemisphere was A/Fujian/411/2002, and a late surge of A/Wellington/1/2004 was also observed. For example, in Argentina, a study showed that about 50% of infections were closely related to A/Fujian/411/2002 and another 50% were closely related to A/Wellington/1/2004 (
<xref ref-type="bibr" rid="GZQ078C23">Santamaria
<italic>et al.</italic>
, 2008</xref>
). In New Zealand, the dominant flu strain was A/Fujian/411/2002, which caused 78% of flu infections (
<xref ref-type="bibr" rid="GZQ078C52">Virology, 2004</xref>
), and a late season surge of A/Wellington/1/2004 was also reported (
<xref ref-type="bibr" rid="GZQ078C18">Northern hemisphere, 2004</xref>
). Therefore, the vaccine recommended by the WHO matched the dominant strain and would be expected to be effective in the 2004 season in the Southern hemisphere.</p>
<p>We here use the low-dimensional clustering method to detect the A/Wellington/1/2004 strain, which is not the major dominant strain but caused significant infections in the 2004 flu season. We plot the protein distance and kernel density estimation for the H3N2 viruses in Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
d and b. We use only the data available as of 1 February 2004, 3 months prior to the 2004 Southern hemisphere flu season, which usually runs from May to September. We observe two clusters. The major cluster on the left side of Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
d is A/Fujian/411/2002-like, which was the vaccine strain in the 2004 season. There is a new cluster on the right side of Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
d, which contains A/Wellington/1/2004. The
<italic>p</italic>
<sub>epitope</sub>
of A/Wellington/1/2004 with regard to A/Fujian/411/2002 is 0.118. Therefore, we predict that A/Wellington/1/2004 will infect a large fraction of the population, and the A/Fujian/411/2002 vaccine is expected to provide only partial protection against the A/Wellington/1/2004 virus. However, since A/Wellington/1/2004 appeared just before the 2004 flu season, it did not have sufficient time to spread and become the dominant strain in that season. From our observation, it usually takes about 8 months or longer for a new strain to become dominant after its appearance in a new cluster. Therefore, the predominant flu strain in the 2004 season is expected to be A/Fujian/411/2002 based on the data as of 1 February 2004. This result agrees with the observed dominant flu strain in the 2004 flu season.
<fig id="GZQ078F6" position="float">
<label>Fig. 6</label>
<caption>
<p>(
<bold>a</bold>
) Kernel density estimation for the protein distance map for H3N2 viruses between 1 October 2003 and 30 September 2004. (
<bold>b</bold>
) Kernel density estimation for the protein distance map for H3N2 viruses between 1 October 2003 and 1 February 2004. (
<bold>c</bold>
) Protein distance map for H3N2 viruses between 1 October 2003 and 30 September 2004. We plot a dotted line to separate the two clusters. (
<bold>d</bold>
) Protein distance map for H3N2 viruses between 1 October 2003 and 1 February 2004. The vertical and horizontal axes of all figures represent protein distance. A 0.0030 unit of protein distance equals one mutation of the HA1 protein sequence of H3N2.</p>
</caption>
<graphic xlink:href="gzq07806"></graphic>
</fig>
</p>
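<p>As an illustration of how a kernel density estimate separates a large cluster from a small new one on the protein distance map, the following Python sketch evaluates a two-dimensional Gaussian kernel density at chosen points. The Gaussian kernel, the bandwidth of 0.003 (about one HA1 mutation in protein distance units) and the coordinates are all assumptions made for this example; this section does not state the kernel or bandwidth used for Fig. 6.</p>
<preformat>
```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a function estimating the density at (x, y) with an isotropic Gaussian kernel."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        return norm * total

    return density

# Hypothetical map coordinates: a large cluster on the left, a small new one on the right.
points = [(-0.010, 0.000), (-0.011, 0.001), (-0.009, -0.001), (-0.010, 0.002),
          (0.012, 0.000), (0.013, 0.001)]
density = gaussian_kde_2d(points, bandwidth=0.003)  # bandwidth is an assumed value

left_peak = density(-0.010, 0.0005)   # inside the large cluster
right_peak = density(0.0125, 0.0005)  # inside the small cluster
valley = density(0.001, 0.0)          # between the two clusters
```
</preformat>
<p>In this toy example, the density is highest in the large cluster, lower in the small cluster and close to zero between them, which is the pattern that allows two clusters to be told apart on a density plot.</p>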
</sec>
<sec id="s2i">
<title>Detecting A/California/7/2004 as a future dominant strain</title>
<p>As a further example of applying the low-dimensional clustering method to influenza in the Southern hemisphere, we test the method on the 2005 flu season. The recommended H3N2 vaccine strain in the 2005 flu season in the Southern hemisphere was A/Wellington/1/2004. Data from HI assay tests and surveillance suggest that the dominant H3N2 strain in the 2005 season was A/California/7/2004. In HI tests with postinfection ferret sera, the majority of influenza A(H3N2) viruses from February 2005 to October 2005 were closely related to A/California/7/2004, as reported by WHO on 7 October 2005 (
<xref ref-type="bibr" rid="GZQ078C41">World Health Organization, 2005</xref>
b). Surveillance data from Victoria, Australia, show that 45% of influenza A infections were A/California/7/2004-like (H3), 11% were A/Wellington/1/2004 (H3) and 44% were A/New Caledonia/20/99-like (H1), as collected in the 2005 flu season (
<xref ref-type="bibr" rid="GZQ078C28">Turner
<italic>et al.</italic>
, 2006</xref>
). Surveillance data from New Zealand also show that the dominant H3N2 strain in the 2005 flu season was A/California/7/2004 (
<xref ref-type="bibr" rid="GZQ078C53">Influenza Weekly, 2005</xref>
).</p>
<p>We plot the protein distance for the H3N2 viruses in the 2003–2004 flu season in Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
c. We use only the data as of 30 September 2004, earlier than the October 2004 date when the WHO published the influenza vaccine recommendation for the Southern hemisphere. We plot the kernel density estimation in Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
a for the data in Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
c. There are three major clusters in Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
a. The one on the left is the current dominant cluster, which consists mostly of A/Fujian/411/2002-like viruses. There is a middle cluster centered on A/Wellington/1/2004. The one on the right contains A/California/7/2004. Both the A/California/7/2004 cluster and the A/Wellington/1/2004 cluster are antigenically distinct from A/Fujian/411/2002.</p>
<p>When the protein distance map and kernel density estimation as of 1 February 2004 are plotted in Fig. 
<xref ref-type="fig" rid="GZQ078F6">6</xref>
d and b, we still see the A/Wellington/1/2004 cluster. With these earlier data, the A/California/7/2004 cluster is not yet observed. Thus, the A/California/7/2004 cluster is a newly appearing cluster, and we consider it to be the emerging strain. The new cluster that contains A/California/7/2004 is separate from the current dominant cluster. We calculated the average
<italic>p</italic>
<sub>epitope</sub>
of the new cluster that contains A/California/7/2004 with regard to A/Fujian/411/2002 to be 0.112. The new cluster therefore fulfills both criteria for an incipient dominant strain cluster. We thus predict, based on the information as of 30 September 2004, that A/California/7/2004 will be the next dominant strain after A/Fujian/411/2002 in the Southern hemisphere. We further predict from these data that A/California/7/2004 will be the dominant strain in the following flu season in the Northern hemisphere. These predictions agree with the observed dominant strain in the 2005 flu season.</p>
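<p>The epitope distance quoted above can be illustrated with a short Python sketch. It assumes the usual definition from Gupta et al. (2006): for each epitope, count the fraction of its sites that differ between two aligned sequences, and take the largest fraction over all epitopes. The two epitopes, their site numbers and the 12-residue sequences below are hypothetical toy values, not the real H3N2 epitope maps.</p>
<preformat>
```python
def p_epitope(seq_a, seq_b, epitopes):
    """Largest per-epitope fraction of differing residues between two aligned sequences."""
    best = 0.0
    for sites in epitopes.values():
        diffs = sum(1 for pos in sites if seq_a[pos - 1] != seq_b[pos - 1])
        best = max(best, diffs / len(sites))
    return best

# Hypothetical epitopes given as 1-based alignment positions (toy values only).
toy_epitopes = {"A": [1, 2, 3, 4], "B": [6, 7, 8, 9, 10]}
vaccine = "KTNGSQLPAEDR"
variant = "KTDGSQLPSEDR"  # differs from the vaccine at positions 3 and 9

value = p_epitope(vaccine, variant, toy_epitopes)
```
</preformat>
<p>Here, epitope A has one change in four sites and epitope B has one change in five sites, so the sketch returns 0.25. An average over a cluster, as used in the text, would repeat this calculation for every strain in the cluster and take the mean.</p>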
</sec>
</sec>
<sec sec-type="discussion" id="s3">
<title>Discussion</title>
<p>The evolution of influenza virus is driven by cell receptor distributions, non-specific innate host defense mechanisms, cross-immunity (
<xref ref-type="bibr" rid="GZQ078C13">Gupta
<italic>et al.</italic>
, 1998</xref>
;
<xref ref-type="bibr" rid="GZQ078C8">Ferguson
<italic>et al.</italic>
, 2003</xref>
) and other contributions to viral fitness. In this paper, we focussed on HA protein evolution under antibody selection pressure. The degree to which the immunity induced by one strain protects against another strain depends on their antigenic distance (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
). Because the human immune response to viral infection is not completely cross-protective, natural selection favors amino acid variants of the HA protein that allow the virus to evade immunity, infect more hosts and proliferate. Mutant strains surround the dominant strain and group into a cluster rather than evolve in a defined direction (
<xref ref-type="bibr" rid="GZQ078C20">Plotkin
<italic>et al.</italic>
, 2002</xref>
;
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
, 2004</xref>
). After the virus has circulated in the population for one or more years, effective vaccines and cross-immunity of the population drive the evolution of influenza by mutation and reassortment. This evolution increases the immune-escape component of the fitness of new strains, and eventually causes a new epidemic. These new immune-escape strains will form a new cluster, and the old clusters will die out, thus starting a new cycle. This process of creating new clusters is what our method detects. The low-dimensional clustering can be applied not only to genetic sequences but also to distances calculated from inhibition assays of antibodies and antigens, as first shown by
<xref ref-type="bibr" rid="GZQ078C16">Lapedes and Farber (2001)</xref>
and
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
(2004)</xref>
. The inhibition assay provides an approximation of antigenic distance and is broadly used as a marker for vaccine efficacy. The inhibition assay suffers from low resolution of data, which multidimensional scaling improves, and is less able to predict the vaccine efficacy than the
<italic>p</italic>
<sub>epitope</sub>
method (
<xref ref-type="bibr" rid="GZQ078C14">Gupta
<italic>et al.</italic>
, 2006</xref>
). The genetic sequences used here are a direct description of the evolution of the pathogen and the antigenic distance of influenza. To aid vaccine selection, the low-dimensional clustering on genetic sequences appears informative.</p>
<p>Challenges may arise in the application of the method described here. If two or more new clusters appear in one season, additional information is needed to decide which cluster should be chosen for the vaccine. Fortunately, it has been shown that the evolution of influenza is typically in one direction (
<xref ref-type="bibr" rid="GZQ078C8">Ferguson
<italic>et al.</italic>
, 2003</xref>
;
<xref ref-type="bibr" rid="GZQ078C26">Smith
<italic>et al.</italic>
, 2004</xref>
). It is rare to have two or more new clusters in the protein distance map in one season. As experience with the low-dimensional sequence clustering is gained, it may be that cluster structure will allow more precise prediction of vaccine efficacy. Despite these issues, the method described here can assist the design of vaccines, and it provides a new tool to analyze influenza viral dynamics. We did not see any false positive results in Table 
<xref ref-type="table" rid="GZQ078TB1">I</xref>
.</p>
<p>The current WHO method works quite well in many years. The method discussed here appears to offer an additional tool which may provide additional utility.</p>
</sec>
<sec sec-type="materials|methods" id="s4">
<title>Materials and methods</title>
<sec id="s4a">
<title>Data sources</title>
<p>Influenza hemagglutinin A(H3N2) sequences before 1 October 2008, and A(H1N1) sequences as of 5 December 2009, were downloaded from NCBI Influenza Virus Resources (
<xref ref-type="bibr" rid="GZQ078C1">Bao
<italic>et al.</italic>
, 2008</xref>
). All hemagglutinin sequences used in our study are filtered by removing identical sequences. Thus, each group of identical sequences in the data set is represented by the oldest sequence in the group. This approach reduces the number of sequences by keeping only the unique sequences in the data set. The hemagglutinin proteins of 2009 A(H1N1) used in our work are listed in
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data, Table S3</ext-link>
. The numerical labels in Figs 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
and
<xref ref-type="fig" rid="GZQ078F2">2</xref>
are the same as the labels in the first column of
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data, Table S3</ext-link>
. Influenza A(H3N2) sequences after 1 October 2008 were downloaded from the GISAID database; see
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data, Table S6</ext-link>
. GISAID has the latest H3N2 sequence data.</p>
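<p>The filtering step described above, in which each group of identical sequences is represented by its oldest member, can be sketched in Python as follows. The record layout (name, collection date, sequence) and the three toy records are assumptions made for this example.</p>
<preformat>
```python
from datetime import date

def collapse_identical(records):
    """Keep one record per distinct sequence, choosing the earliest collection date."""
    oldest = {}
    for name, collected, seq in records:
        kept = oldest.get(seq)
        if kept is None or kept[1] > collected:
            oldest[seq] = (name, collected, seq)
    # Return the surviving records in chronological order.
    return sorted(oldest.values(), key=lambda rec: rec[1])

# Hypothetical records: strain_1 and strain_2 share the same sequence.
records = [
    ("strain_1", date(2009, 4, 20), "MKTIIAL"),
    ("strain_2", date(2009, 4, 9), "MKTIIAL"),
    ("strain_3", date(2009, 5, 2), "MKTIVAL"),
]
unique = collapse_identical(records)
names = [rec[0] for rec in unique]
```
</preformat>
<p>In this example, strain_2 is kept in place of strain_1 because it was collected earlier, and strain_3 is kept because its sequence is distinct.</p>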
</sec>
<sec id="s4b">
<title>Geographical spread pattern of 2009 A(H1N1)</title>
<p>It is believed that the 2009 A(H1N1) virus most likely originated in Mexico (
<xref ref-type="bibr" rid="GZQ078C10">Fraser
<italic>et al.</italic>
, 2009</xref>
). It first spread to the neighboring USA and then to other countries. We display this geographical spread pattern in Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
. We take the founder–F1 relationship from Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
, and assume the virus spreads from the location of the founder to the location of the F1. We consider three regions: the USA, Mexico and all other countries. Then we count the cases of spreading from one region to another region. In
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data, Table S2</ext-link>
, we show that we observed many more paths of spreading from the USA to other countries than from other countries to the USA. The major path of spreading is from the USA to other countries. This result indicates that our directional evolutionary map of Fig. 
<xref ref-type="fig" rid="GZQ078F1">1</xref>
is in good agreement with the pattern of geographical spread.</p>
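<p>The counting procedure above can be sketched in Python. Each founder–F1 pair is reduced to the regions of its two locations, and only transitions between different regions are counted. The list of pairs is a made-up example, not the data of Supplementary Table S2.</p>
<preformat>
```python
from collections import Counter

def region_of(country):
    """Map a country to one of the three regions used in the text."""
    if country in ("USA", "Mexico"):
        return country
    return "Other"

def count_paths(founder_f1_pairs):
    """Count founder-to-F1 transitions between different regions."""
    counts = Counter()
    for founder_country, f1_country in founder_f1_pairs:
        src, dst = region_of(founder_country), region_of(f1_country)
        if src != dst:
            counts[(src, dst)] += 1
    return counts

# Hypothetical (founder location, F1 location) pairs.
pairs = [("Mexico", "USA"), ("USA", "Canada"), ("USA", "Japan"),
         ("USA", "USA"), ("Spain", "USA"), ("USA", "Mexico")]
paths = count_paths(pairs)
```
</preformat>
<p>With these toy pairs, the path from the USA to other countries is counted twice and the reverse path once; the pair that stays within the USA is ignored.</p>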
</sec>
<sec id="s4c">
<title>Multidimensional scaling</title>
<p>The goal of multidimensional scaling is to represent the distance of proteins by a Euclidean distance in coordinate space. We calculate the distance between proteins
<inline-formula>
<inline-graphic xlink:href="gzq078ileq6.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq7.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="gzq078ileq8.jpg"></inline-graphic>
</inline-formula>
, by the number of amino acid residue differences divided by the total number of amino acid residues, as defined by Equation (1) in the main text. To do multidimensional scaling, we start with the distance of the proteins. The object of multidimensional scaling is to find the two, or
<italic>p</italic>
in general, directions that best preserve the distances
<inline-formula>
<inline-graphic xlink:href="gzq078ileq9.jpg"></inline-graphic>
</inline-formula>
between the
<italic>N</italic>
proteins
<disp-formula id="GZQ078M2">
<label>(2)</label>
<graphic xlink:href="gzq078eq2"></graphic>
</disp-formula>
Here,
<inline-formula>
<inline-graphic xlink:href="gzq078ileq10.jpg"></inline-graphic>
</inline-formula>
is the Euclidean distance between proteins
<inline-formula>
<inline-graphic xlink:href="gzq078ileq11.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq12.jpg"></inline-graphic>
</inline-formula>
in the projected space, and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq13.jpg"></inline-graphic>
</inline-formula>
is the vector norm. The algorithm is as follows. Let the matrix
<inline-formula>
<inline-graphic xlink:href="gzq078ileq14.jpg"></inline-graphic>
</inline-formula>
, where
<inline-formula>
<inline-graphic xlink:href="gzq078ileq15.jpg"></inline-graphic>
</inline-formula>
. The eigenvalues of
<italic>A</italic>
are
<inline-formula>
<inline-graphic xlink:href="gzq078ileq16.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq17.jpg"></inline-graphic>
</inline-formula>
. Let
<inline-formula>
<inline-graphic xlink:href="gzq078ileq18.jpg"></inline-graphic>
</inline-formula>
be the eigenvector of
<inline-formula>
<inline-graphic xlink:href="gzq078ileq19.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq20.jpg"></inline-graphic>
</inline-formula>
be the eigenvector of
<inline-formula>
<inline-graphic xlink:href="gzq078ileq21.jpg"></inline-graphic>
</inline-formula>
. Let
<inline-formula>
<inline-graphic xlink:href="gzq078ileq22.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq23.jpg"></inline-graphic>
</inline-formula>
. The two coordinates in Figs 
<xref ref-type="fig" rid="GZQ078F2">2</xref>
<xref ref-type="fig" rid="GZQ078F3"></xref>
<xref ref-type="fig" rid="GZQ078F4"></xref>
<xref ref-type="fig" rid="GZQ078F5"></xref>
<xref ref-type="fig" rid="GZQ078F6">6</xref>
are
<inline-formula>
<inline-graphic xlink:href="gzq078ileq24.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq25.jpg"></inline-graphic>
</inline-formula>
. The
<italic>x</italic>
-axis in the protein distance map corresponds to the eigenvector with the largest eigenvalue. We take the H3N2 2008–2009 season as an example. In Fig. 
<xref ref-type="fig" rid="GZQ078F5">5</xref>
c, we observe two clusters. One cluster is on the right side of the figure, with positive
<italic>x</italic>
values, and the other has negative
<italic>x</italic>
values. We define the consensus sequence of a group of flu strains by taking the most frequent amino acid at each position. We calculate the consensus sequences for the strains in the cluster on the right and for those in the cluster on the left side of the figure. We found that the amino acids at four positions (76, 160, 172 and 203) differ between these two consensus H3N2 sequences; see
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data, Table S1</ext-link>
. Interestingly, the Shannon entropies calculated from all 2008–2009 season sequences at these four positions (0.43, 0.67, 0.59 and 0.50) are the largest, which means that the diversity at these four positions is the largest.</p>
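<p>The consensus and entropy calculations described above can be sketched in Python. The consensus takes the most frequent residue in each alignment column, and the Shannon entropy of a column is computed from the residue frequencies. The natural logarithm is an assumption, because the base is not stated in this section, and the four-residue sequences are toy data.</p>
<preformat>
```python
import math
from collections import Counter

def consensus(sequences):
    """Most frequent residue at each aligned position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sequences))

def shannon_entropy(column):
    """Shannon entropy of one alignment column (natural log, an assumed base)."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())

# Hypothetical aligned sequences for the two clusters.
left = ["AKTS", "AKTS", "ARTS"]
right = ["AKNG", "AKNG", "AKNS"]

cons_left, cons_right = consensus(left), consensus(right)
# 1-based positions at which the two consensus sequences differ.
differing = [i + 1 for i, (a, b) in enumerate(zip(cons_left, cons_right)) if a != b]
# Per-position entropy over all sequences of the season.
entropies = [shannon_entropy(col) for col in zip(*(left + right))]
```
</preformat>
<p>In this toy alignment, the two consensus sequences differ at positions 3 and 4, and these are also the two positions with the largest entropy, mirroring the observation made for the 2008–2009 season.</p>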
<p>There is software available to run the multidimensional scaling. We use the Matlab function ‘CMDSCALE’ to generate an
<inline-formula>
<inline-graphic xlink:href="gzq078ileq26.jpg"></inline-graphic>
</inline-formula>
configuration matrix
<italic>Y</italic>
. Rows of
<italic>Y</italic>
are the coordinates of
<italic>N</italic>
points in
<italic>p</italic>
-dimensional space. ‘CMDSCALE’ also returns a vector
<italic>E</italic>
containing the sorted eigenvalues of what is often referred to as the ‘scalar product matrix,’ which, in the simplest case, is equal to
<inline-formula>
<inline-graphic xlink:href="gzq078ileq27.jpg"></inline-graphic>
</inline-formula>
. If only two or three of the largest eigenvalues
<italic>E</italic>
are much larger than others, then the matrix
<italic>D</italic>
based on the corresponding columns of
<italic>Y</italic>
nearly reproduces the original distance matrix d. We used the influenza H3N2 in the 2001–2002 season as an example. The five largest of all 180 eigenvalues are 0.0361, 0.0032, 0.0024, 0.0020 and 0.0016. The first two largest eigenvalues contribute 70% to the sum of all 180 eigenvalues, which indicates
<italic>p</italic>
= 2. Then, we plot the
<italic>N</italic>
points in a two-dimensional graph. Each point represents a protein. The Euclidean distance between any two points
<inline-formula>
<inline-graphic xlink:href="gzq078ileq28.jpg"></inline-graphic>
</inline-formula>
on the graph should be equal to or close to the distance of these two proteins. That is,
<inline-formula>
<inline-graphic xlink:href="gzq078ileq29.jpg"></inline-graphic>
</inline-formula>
. As an example, in
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data, Fig. S1</ext-link>
, we show that
<inline-formula>
<inline-graphic xlink:href="gzq078ileq30.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="gzq078ileq31.jpg"></inline-graphic>
</inline-formula>
have a strong linear relationship. A short MATLAB program for multidimensional scaling is as follows.
<list list-type="simple">
<list-item>
<p>% Multidimensional scaling.</p>
</list-item>
<list-item>
<p>% alignment.aln is a sequence multialignment file</p>
</list-item>
<list-item>
<p>% generated by software ClustalW.</p>
</list-item>
<list-item>
<p>clear</p>
</list-item>
<list-item>
<p>Sequences = multialignread('alignment.aln');</p>
</list-item>
<list-item>
<p>distances = seqpdist(Sequences,'Method','p-distance');</p>
</list-item>
<list-item>
<p>Y = cmdscale(distances);</p>
</list-item>
<list-item>
<p>scatter(Y(:,1),Y(:,2));</p>
</list-item>
</list>
</p>
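<p>For readers without MATLAB, the same steps can be sketched in plain Python. This is not the ‘CMDSCALE’ implementation: it computes the p-distance matrix, double-centers the squared distances, and finds the two leading eigenpairs by power iteration with deflation, a simple technique swapped in here for illustration. It assumes that the leading eigenvalues by magnitude are positive, which holds for the toy sequences below but is not guaranteed in general, since p-distances need not be exactly Euclidean.</p>
<preformat>
```python
import math

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def double_center(d):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix d."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]

def top_eigenpairs(b, k, iters=500):
    """Leading k eigenpairs of a symmetric matrix by power iteration with deflation."""
    n = len(b)
    m = [r[:] for r in b]
    pairs = []
    for idx in range(k):
        v = [1.0 + 0.1 * ((i * (idx + 2)) % 7) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def classical_mds(seqs, k=2):
    """Coordinates of each sequence in k dimensions from pairwise p-distances."""
    n = len(seqs)
    d = [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for axis, (lam, vec) in enumerate(top_eigenpairs(double_center(d), k)):
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = scale * vec[i]
    return coords

# Toy alignment: the last three sequences differ from the first three at four sites.
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAACA",
        "CCCCAAAAAA", "CCCCAAAAAC", "CCCCAAAACA"]
coords = classical_mds(seqs)
```
</preformat>
<p>In this toy example, the first coordinate has one sign for the first three sequences and the opposite sign for the last three, so the two groups fall on opposite sides of the map. For real data sets with hundreds of sequences, a library eigensolver would be the more practical choice.</p>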
</sec>
<sec id="s4d">
<title>Biases in the data</title>
<p>There are two biases in the sequence data. First, more isolates have been sequenced in recent years. Generally speaking, more sequences make the vaccine selection based on low-dimensional clustering methods more reliable. That is why we compared low-dimensional clustering methods with WHO results only since 1996 in Table 
<xref ref-type="table" rid="GZQ078TB1">I</xref>
. To avoid this bias when generating the figure of the 40-year evolutionary history of influenza (Fig. 
<xref ref-type="fig" rid="GZQ078F3">3</xref>
), we choose 20 random isolates for each season, even though the database contains more sequences in recent years. Second, most isolates are collected in the USA. We found that many isolates collected in the USA are identical, because of the high sampling rate in the USA. To reduce this bias, we collapse redundant strains, keeping only distinct strains.</p>
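<p>The per-season subsampling described above can be sketched in Python. Isolates are grouped by season and at most 20 are drawn at random from each group. The season labels, isolate names and fixed random seed are assumptions made so that the example is repeatable.</p>
<preformat>
```python
import random
from collections import defaultdict

def sample_per_season(isolates, per_season=20, seed=0):
    """Draw up to per_season isolates at random from each season."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    by_season = defaultdict(list)
    for season, name in isolates:
        by_season[season].append(name)
    picked = {}
    for season, names in sorted(by_season.items()):
        picked[season] = rng.sample(names, min(per_season, len(names)))
    return picked

# Hypothetical data: few isolates in an early season, many in a recent one.
isolates = [("1975-1976", "iso_%d" % i) for i in range(5)]
isolates += [("2007-2008", "iso_%d" % i) for i in range(5, 105)]
subset = sample_per_season(isolates)
```
</preformat>
<p>Seasons with fewer than 20 isolates keep all of them, while heavily sampled seasons are reduced to 20, so that recent seasons do not dominate the map.</p>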
</sec>
</sec>
<sec id="s5">
<title>Supplementary data</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://peds.oxfordjournals.org/cgi/content/full/gzq078/DC1">Supplementary data are available at
<italic>PEDS</italic>
online</ext-link>
.</p>
</sec>
<sec id="s6">
<title>Funding</title>
<p>This research was supported by
<funding-source>DARPA</funding-source>
under the
<funding-source>FunBio program</funding-source>
.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_23_12_935__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_gzq078_gzq078supp.pdf"></media>
</supplementary-material>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="GZQ078C1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bolotov</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Dernovoy</surname>
<given-names>D.</given-names>
</name>
<etal></etal>
</person-group>
<source>J. Virol.</source>
<year>2008</year>
<volume>82</volume>
<fpage>596</fpage>
<lpage>601</lpage>
<pub-id pub-id-type="pmid">17942553</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cox</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Bender</surname>
<given-names>C.</given-names>
</name>
</person-group>
<source>Semin. Virol.</source>
<year>1995</year>
<volume>6</volume>
<fpage>359</fpage>
<lpage>370</lpage>
</element-citation>
</ref>
<ref id="GZQ078C3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Deem</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pan</surname>
<given-names>K.</given-names>
</name>
</person-group>
<source>Protein Eng. Des. Sel.</source>
<year>2009</year>
<volume>22</volume>
<fpage>543</fpage>
<lpage>546</lpage>
<pub-id pub-id-type="pmid">19578121</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C4">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Domingo</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Holland</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Biebricher</surname>
<given-names>C.</given-names>
</name>
</person-group>
<source>Quasispecies and RNA Virus Evolution: Principles and consequences</source>
<year>2002</year>
<publisher-loc>Austin, TX</publisher-loc>
<publisher-name>Landes</publisher-name>
</element-citation>
</ref>
<ref id="GZQ078C5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drake</surname>
<given-names>J.W.</given-names>
</name>
<name>
<surname>Holland</surname>
<given-names>J.J.</given-names>
</name>
</person-group>
<source>Proc. Natl Acad. Sci. USA</source>
<year>1999</year>
<volume>96</volume>
<fpage>13910</fpage>
<lpage>13913</lpage>
<pub-id pub-id-type="pmid">10570172</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C6">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Everitt</surname>
<given-names>B.S.</given-names>
</name>
<name>
<surname>Landau</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Leese</surname>
<given-names>M.</given-names>
</name>
</person-group>
<source>Cluster Analysis</source>
<year>2001</year>
<publisher-name>Oxford University Press</publisher-name>
</element-citation>
</ref>
<ref id="GZQ078C7">
<element-citation publication-type="other">
<collab>FDA</collab>
<article-title>Regulatory considerations regarding the use of novel influenza A (H1N1) virus vaccines</article-title>
<year>2009</year>
<comment>23 July</comment>
</element-citation>
</ref>
<ref id="GZQ078C8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ferguson</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Galvani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bush</surname>
<given-names>R.</given-names>
</name>
</person-group>
<source>Nature</source>
<year>2003</year>
<volume>422</volume>
<fpage>428</fpage>
<pub-id pub-id-type="pmid">12660783</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fitch</surname>
<given-names>W.M.</given-names>
</name>
<name>
<surname>Bush</surname>
<given-names>R.M.</given-names>
</name>
<name>
<surname>Bender</surname>
<given-names>C.A.</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>N.J.</given-names>
</name>
</person-group>
<source>Proc. Natl Acad. Sci. USA</source>
<year>1997</year>
<volume>94</volume>
<fpage>7712</fpage>
<lpage>7718</lpage>
<pub-id pub-id-type="pmid">9223253</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fraser</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Donnelly</surname>
<given-names>C.A.</given-names>
</name>
<name>
<surname>Cauchemez</surname>
<given-names>S.</given-names>
</name>
<etal></etal>
</person-group>
<source>Science</source>
<year>2009</year>
<volume>324</volume>
<fpage>1557</fpage>
<lpage>1561</lpage>
<pub-id pub-id-type="pmid">19433588</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garten</surname>
<given-names>R.J.</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>C.T.</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>C.A.</given-names>
</name>
<etal></etal>
</person-group>
<source>Science</source>
<year>2009</year>
<volume>325</volume>
<fpage>197</fpage>
<lpage>201</lpage>
<pub-id pub-id-type="pmid">19465683</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C12">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghedin</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Sengamalay</surname>
<given-names>N.A.</given-names>
</name>
<name>
<surname>Shumway</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
<source>Nature</source>
<year>2005</year>
<volume>437</volume>
<fpage>1162</fpage>
<lpage>1166</lpage>
<pub-id pub-id-type="pmid">16208317</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ferguson</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>R.</given-names>
</name>
</person-group>
<source>Science</source>
<year>1998</year>
<volume>280</volume>
<fpage>912</fpage>
<pub-id pub-id-type="pmid">9572737</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Earl</surname>
<given-names>D.J.</given-names>
</name>
<name>
<surname>Deem</surname>
<given-names>M.</given-names>
</name>
</person-group>
<source>Vaccine</source>
<year>2006</year>
<volume>24</volume>
<fpage>3881</fpage>
<lpage>3888</lpage>
<pub-id pub-id-type="pmid">16460844</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hak</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Nordin</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>F.</given-names>
</name>
<etal></etal>
</person-group>
<source>Clin. Infect. Dis.</source>
<year>2002</year>
<volume>35</volume>
<fpage>370</fpage>
<lpage>377</lpage>
<pub-id pub-id-type="pmid">12145718</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lapedes</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Farber</surname>
<given-names>R.</given-names>
</name>
</person-group>
<source>J. Theor. Biol.</source>
<year>2001</year>
<volume>212</volume>
<fpage>57</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="pmid">11527445</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nelson</surname>
<given-names>M.I.</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>E.C.</given-names>
</name>
</person-group>
<source>Nat. Rev. Genet.</source>
<year>2007</year>
<volume>8</volume>
<fpage>196</fpage>
<lpage>205</lpage>
<pub-id pub-id-type="pmid">17262054</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C18">
<element-citation publication-type="other">
<article-title>Northern hemisphere: Risk of A/Wellington/1/2004(H3N2)-like virus</article-title>
<comment>ProMed. 2004, October 24. Available from
<uri xlink:type="simple" xlink:href="http://www.promedmail.org">http://www.promedmail.org</uri>
archive no. 20041024.2879</comment>
</element-citation>
</ref>
<ref id="GZQ078C19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Subieta</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Deem</surname>
<given-names>M.</given-names>
</name>
</person-group>
<source>Protein Eng. Deg. Sel.</source>
<comment>in press</comment>
</element-citation>
</ref>
<ref id="GZQ078C20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Plotkin</surname>
<given-names>J.B.</given-names>
</name>
<name>
<surname>Dushoff</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Levin</surname>
<given-names>S.A.</given-names>
</name>
</person-group>
<source>Proc. Natl Acad. Sci. USA</source>
<year>2002</year>
<volume>99</volume>
<fpage>6263</fpage>
<lpage>6268</lpage>
<pub-id pub-id-type="pmid">11972025</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Russell</surname>
<given-names>C.A.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>T.C.</given-names>
</name>
<name>
<surname>Barr</surname>
<given-names>I.G.</given-names>
</name>
<etal></etal>
</person-group>
<source>Vaccine</source>
<year>2008a</year>
<volume>26</volume>
<fpage>D31</fpage>
<lpage>D34</lpage>
<pub-id pub-id-type="pmid">19230156</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Russell</surname>
<given-names>C.A.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>T.C.</given-names>
</name>
<name>
<surname>Barr</surname>
<given-names>I.G.</given-names>
</name>
<etal></etal>
</person-group>
<source>Science</source>
<year>2008b</year>
<volume>320</volume>
<fpage>340</fpage>
<lpage>346</lpage>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/doi:10.1126/science.1154137">doi:10.1126/science.1154137</ext-link>
</comment>
<pub-id pub-id-type="pmid">18420927</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santamaria</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Urue</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Videla</surname>
<given-names>C.</given-names>
</name>
<etal></etal>
</person-group>
<source>Influenza and Other Respiratory Viruses</source>
<year>2008</year>
<volume>2</volume>
<fpage>131</fpage>
<lpage>134</lpage>
<pub-id pub-id-type="pmid">19453464</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C24">
<element-citation publication-type="other">
<article-title>Seasonal influenza (H3N2) virus - potential vaccine mismatch</article-title>
<comment>ProMed. 2009, July 24. Available from
<uri xlink:type="simple" xlink:href="http://www.promedmail.org">http://www.promedmail.org</uri>
archive no. 20090724.2623</comment>
</element-citation>
</ref>
<ref id="GZQ078C25">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Skowronski</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>Influenza A (H1N1) - worldwide (11): Coincident H3N2 variation</article-title>
<comment>ProMed. 2009, 5 May. Available from
<uri xlink:type="simple" xlink:href="http://www.promedmail.org">http://www.promedmail.org</uri>
archive no. 20090505.1679</comment>
</element-citation>
</ref>
<ref id="GZQ078C26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>D.J.</given-names>
</name>
<name>
<surname>Lapedes</surname>
<given-names>A.S.</given-names>
</name>
<name>
<surname>de Jong</surname>
<given-names>J.C.</given-names>
</name>
<etal></etal>
</person-group>
<source>Science</source>
<year>2004</year>
<volume>305</volume>
<fpage>371</fpage>
<lpage>376</lpage>
<pub-id pub-id-type="pmid">15218094</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Vijaykrishna</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bahl</surname>
<given-names>J.</given-names>
</name>
<etal></etal>
</person-group>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<fpage>1122</fpage>
<pub-id pub-id-type="pmid">19516283</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turner</surname>
<given-names>J.L.</given-names>
</name>
<name>
<surname>Fielding</surname>
<given-names>J.E.</given-names>
</name>
<name>
<surname>Clothier</surname>
<given-names>H.J.</given-names>
</name>
<name>
<surname>Kelly</surname>
<given-names>H.A.</given-names>
</name>
</person-group>
<source>Commun. Dis. Intell.</source>
<year>2006</year>
<volume>30</volume>
<fpage>137</fpage>
<pub-id pub-id-type="pmid">16637243</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Webster</surname>
<given-names>R.G.</given-names>
</name>
</person-group>
<source>Emerging Infect. Dis.</source>
<year>1998</year>
<volume>4</volume>
<fpage>436</fpage>
<lpage>441</lpage>
<pub-id pub-id-type="pmid">9716966</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C30">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>1995</year>
<volume>70</volume>
<fpage>53</fpage>
<pub-id pub-id-type="pmid">7537515</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C31">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>1996</year>
<volume>71</volume>
<fpage>57</fpage>
<pub-id pub-id-type="pmid">8695344</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C32">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>1997</year>
<volume>72</volume>
<fpage>57</fpage>
<pub-id pub-id-type="pmid">9057482</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C33">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>1998</year>
<volume>73</volume>
<fpage>56</fpage>
<pub-id pub-id-type="pmid">9523511</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C34">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>1999</year>
<volume>74</volume>
<fpage>57</fpage>
<pub-id pub-id-type="pmid">10079754</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C35">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2000</year>
<volume>75</volume>
<fpage>61</fpage>
</element-citation>
</ref>
<ref id="GZQ078C36">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2001</year>
<volume>76</volume>
<fpage>57</fpage>
<pub-id pub-id-type="pmid">11236647</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C37">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2002</year>
<volume>77</volume>
<fpage>57</fpage>
</element-citation>
</ref>
<ref id="GZQ078C38">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2003</year>
<volume>78</volume>
<fpage>57</fpage>
</element-citation>
</ref>
<ref id="GZQ078C39">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2004</year>
<volume>79</volume>
<fpage>85</fpage>
<pub-id pub-id-type="pmid">15038064</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C40">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2005a</year>
<volume>80</volume>
<fpage>65</fpage>
</element-citation>
</ref>
<ref id="GZQ078C41">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2005b</year>
<volume>80</volume>
<fpage>341</fpage>
</element-citation>
</ref>
<ref id="GZQ078C42">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2006</year>
<volume>81</volume>
<fpage>81</fpage>
<pub-id pub-id-type="pmid">16671217</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C43">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2007</year>
<volume>82</volume>
<fpage>69</fpage>
<pub-id pub-id-type="pmid">17333570</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C44">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2008</year>
<volume>83</volume>
<fpage>77</fpage>
<pub-id pub-id-type="pmid">18309578</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C45">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2009a</year>
<volume>84</volume>
<fpage>65</fpage>
</element-citation>
</ref>
<ref id="GZQ078C46">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<article-title>Pandemic (H1N1) 2009 – update 64</article-title>
<year>2009b</year>
<comment>cited 2010, 27 January Available from</comment>
</element-citation>
</ref>
<ref id="GZQ078C47">
<element-citation publication-type="other">
<collab>World Health Organization</collab>
<article-title>Recommendations for influenza viruses</article-title>
<year>2009c</year>
<comment>Available from
<uri xlink:type="simple" xlink:href="http://www.who.int/csr/disease/influenza/vaccinerecommendations/en/index.html">http://www.who.int/csr/disease/influenza/vaccinerecommendations/en/index.html</uri>
</comment>
</element-citation>
</ref>
<ref id="GZQ078C48">
<element-citation publication-type="other">
<collab>World Health Organization</collab>
<article-title>World now at the start of 2009 influenza pandemic</article-title>
<year>2009d</year>
<comment>4 June Available from
<uri xlink:type="simple" xlink:href="http://www.who.int/mediacentre/news/statements/2009/">http://www.who.int/mediacentre/news/statements/2009/</uri>
</comment>
</element-citation>
</ref>
<ref id="GZQ078C49">
<element-citation publication-type="journal">
<collab>World Health Organization</collab>
<source>Wkly Epidemiol. Rec.</source>
<year>2010</year>
<volume>85</volume>
<fpage>81</fpage>
<pub-id pub-id-type="pmid">20210260</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C50">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Pophale</surname>
<given-names>R.S.</given-names>
</name>
<name>
<surname>Deem</surname>
<given-names>M.W.</given-names>
</name>
</person-group>
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Tao</surname>
<given-names>Y.J.</given-names>
</name>
</person-group>
<source>Influenza: Molecular Virology</source>
<year>2010</year>
<publisher-name>Caister Academic Press</publisher-name>
<comment>Chapter 10</comment>
</element-citation>
</ref>
<ref id="GZQ078C51">
<element-citation publication-type="journal">
<source>MMWR Morb Mortal Wkly Rep</source>
<year>2004</year>
<volume>53</volume>
<fpage>8</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="pmid">14724559</pub-id>
</element-citation>
</ref>
<ref id="GZQ078C52">
<element-citation publication-type="other">
<article-title>Virology quarterly report July–September 2004</article-title>
<year>2004</year>
</element-citation>
</ref>
<ref id="GZQ078C53">
<element-citation publication-type="journal">
<source>Influenza Weekly Update</source>
<year>2005</year>
<fpage>22</fpage>
<lpage>38</lpage>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C25 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000C25 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2978544
   |texte=   Low-dimensional clustering detects incipient dominant influenza strain clusters
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:21036781" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a H2N2V1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021