SARS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

SARS-CoV Genome Polymorphism: A Bioinformatics Study

Internal identifier: 001027 (Pmc/Corpus); previous: 001026; next: 001028

Authors: Gordana M. Pavlović-Lažetić; Nenad S. Mitić; Andrija M. Tomović; Mirjana D. Pavlović; Miloš V. Beljanski

Source:

RBID: PMC:5172477

Abstract

A dataset of 103 SARS-CoV isolates (from 101 human patients and 2 palm civets) was investigated with respect to different aspects of genome polymorphism and isolate classification. The number and distribution of single nucleotide variations (SNVs), insertions, and deletions relative to a “profile” (a sequence containing the most represented letter at each position) were determined and discussed. The distribution of substitution categories per codon position, as well as of synonymous and non-synonymous substitutions in the coding regions of annotated isolates, was determined, along with amino acid (a.a.) property changes. A similar analysis was performed for the spike (S) protein in all isolates (in 55 of them, the S protein was predicted for the first time). The Ka/Ks ratio confirmed that the S gene was subject to Darwinian selection during virus transmission from animals to humans. Isolates in the dataset were classified according to genome polymorphism and genotype. Classification by genome polymorphism yields two groups, one with a small number of SNVs and another with a large number of SNVs, with up to four subgroups with respect to insertions and deletions. We identified three basic nine-locus genotypes, TTTT/TTCGG, CGCC/TTCAT, and TGCC/TTCGT, with four subgenotypes. Both proposed classifications are consistent with new insights into the possible epidemiological spread of the virus in space and time.
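The abstract reports a Ka/Ks ratio for the S gene. As an illustration only, the sketch below computes Ka/Ks (dN/dS) for two aligned, in-frame coding sequences. It uses Nei–Gojobori style site and difference counting with a Jukes-Cantor correction. The reference list includes Nei and Gojobori, but this excerpt does not say whether the authors used exactly this variant. Details such as counting changes to stop codons as non-synonymous are therefore assumptions. The names `syn_sites`, `codon_differences`, `jukes_cantor`, and `ka_ks` are hypothetical. By convention, a ratio above 1 is read as evidence of positive (Darwinian) selection.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, "*" marks stops.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites of one sense codon: (synonymous single-base
    neighbours) / 3. Changes to stop codons count as non-synonymous."""
    aa = CODE[codon]
    syn = sum(1 for pos in range(3) for b in BASES
              if b != codon[pos]
              and CODE[codon[:pos] + b + codon[pos + 1:]] == aa)
    return syn / 3


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences between two sense codons,
    averaged over all mutational pathways that avoid stop codons."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(diff):
        cur, syn, non = c1, 0, 0
        for p in order:
            nxt = cur[:p] + c2[p] + cur[p + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[cur]:
                syn += 1
            else:
                non += 1
            cur = nxt
        else:  # pathway completed without a stop codon
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        return 0.0, 0.0
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance, or None when p is too large."""
    if p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Ka/Ks for two aligned, in-frame coding sequences.

    Codons with gaps or ambiguous letters, and stop codons, are skipped.
    Returns None when the ratio is undefined.
    """
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) // 3 * 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + (3 - s)
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        return None
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    if ka is None or ks is None or ks == 0:
        return None
    return ka / ks
```

For the S gene, this function would be called on, for example, a civet and a human S sequence taken from the same alignment, after trimming to the reading frame.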


Url:
DOI: 10.1016/S1672-0229(05)03004-4
PubMed: 16144519
PubMed Central: 5172477


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SARS-CoV Genome Polymorphism: A Bioinformatics Study</title>
<author>
<name sortKey="Pavlovi Lazeti, Gordana M" sort="Pavlovi Lazeti, Gordana M" uniqKey="Pavlovi Lazeti G" first="Gordana M." last="Pavlović-Lažetić">Gordana M. Pavlović-Lažetić</name>
<affiliation>
<nlm:aff id="aff0005">Faculty of Mathematics, University of Belgrade, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Miti, Nenad S" sort="Miti, Nenad S" uniqKey="Miti N" first="Nenad S." last="Mitić">Nenad S. Mitić</name>
<affiliation>
<nlm:aff id="aff0005">Faculty of Mathematics, University of Belgrade, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tomovi, Andrija M" sort="Tomovi, Andrija M" uniqKey="Tomovi A" first="Andrija M." last="Tomović">Andrija M. Tomović</name>
<affiliation>
<nlm:aff id="aff0010">Friedrich Miescher Institute for Biomedical Research, CH-4058 Basel, Switzerland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pavlovi, Mirjana D" sort="Pavlovi, Mirjana D" uniqKey="Pavlovi M" first="Mirjana D." last="Pavlović">Mirjana D. Pavlović</name>
<affiliation>
<nlm:aff id="aff0015">Institute of General and Physical Chemistry, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Beljanski, Milos V" sort="Beljanski, Milos V" uniqKey="Beljanski M" first="Miloš V." last="Beljanski">Miloš V. Beljanski</name>
<affiliation>
<nlm:aff id="aff0015">Institute of General and Physical Chemistry, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">16144519</idno>
<idno type="pmc">5172477</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5172477</idno>
<idno type="RBID">PMC:5172477</idno>
<idno type="doi">10.1016/S1672-0229(05)03004-4</idno>
<date when="2005">2005</date>
<idno type="wicri:Area/Pmc/Corpus">001027</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001027</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">SARS-CoV Genome Polymorphism: A Bioinformatics Study</title>
<author>
<name sortKey="Pavlovi Lazeti, Gordana M" sort="Pavlovi Lazeti, Gordana M" uniqKey="Pavlovi Lazeti G" first="Gordana M." last="Pavlović-Lažetić">Gordana M. Pavlović-Lažetić</name>
<affiliation>
<nlm:aff id="aff0005">Faculty of Mathematics, University of Belgrade, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Miti, Nenad S" sort="Miti, Nenad S" uniqKey="Miti N" first="Nenad S." last="Mitić">Nenad S. Mitić</name>
<affiliation>
<nlm:aff id="aff0005">Faculty of Mathematics, University of Belgrade, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tomovi, Andrija M" sort="Tomovi, Andrija M" uniqKey="Tomovi A" first="Andrija M." last="Tomović">Andrija M. Tomović</name>
<affiliation>
<nlm:aff id="aff0010">Friedrich Miescher Institute for Biomedical Research, CH-4058 Basel, Switzerland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pavlovi, Mirjana D" sort="Pavlovi, Mirjana D" uniqKey="Pavlovi M" first="Mirjana D." last="Pavlović">Mirjana D. Pavlović</name>
<affiliation>
<nlm:aff id="aff0015">Institute of General and Physical Chemistry, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Beljanski, Milos V" sort="Beljanski, Milos V" uniqKey="Beljanski M" first="Miloš V." last="Beljanski">Miloš V. Beljanski</name>
<affiliation>
<nlm:aff id="aff0015">Institute of General and Physical Chemistry, 11001 Belgrade, Serbia and Montenegro</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Genomics, Proteomics &amp; Bioinformatics</title>
<idno type="ISSN">1672-0229</idno>
<idno type="eISSN">2210-3244</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>A dataset of 103 SARS-CoV isolates (from 101 human patients and 2 palm civets) was investigated with respect to different aspects of
<bold>genome</bold>
polymorphism and isolate classification. The number and distribution of single nucleotide variations (SNVs), insertions, and deletions relative to a “profile” (a sequence containing the most represented letter at each position) were determined and discussed. The distribution of substitution categories per codon position, as well as of synonymous and non-synonymous substitutions in the coding regions of annotated isolates, was determined, along with amino acid (a.a.) property changes. A similar analysis was performed for the spike (S) protein in all isolates (in 55 of them, the S protein was predicted for the first time). The Ka/Ks ratio confirmed that the S gene was subject to Darwinian selection during virus transmission from animals to humans. Isolates in the dataset were classified according to genome polymorphism and genotype. Classification by genome polymorphism yields two groups, one with a small number of SNVs and another with a large number of SNVs, with up to four subgroups with respect to insertions and deletions. We identified three basic nine-locus genotypes, TTTT/TTCGG, CGCC/TTCAT, and TGCC/TTCGT, with four subgenotypes. Both proposed classifications are consistent with new insights into the possible epidemiological spread of the virus in space and time.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Peiris, J S" uniqKey="Peiris J">J.S. Peiris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fouchier, R A" uniqKey="Fouchier R">R.A. Fouchier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rota, P A" uniqKey="Rota P">P.A. Rota</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marra, M A" uniqKey="Marra M">M.A. Marra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guan, Y" uniqKey="Guan Y">Y. Guan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stavrinides, J" uniqKey="Stavrinides J">J. Stavrinides</name>
</author>
<author>
<name sortKey="Guttman, D S" uniqKey="Guttman D">D.S. Guttman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Song, H D" uniqKey="Song H">H.D. Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, J F" uniqKey="He J">J.F. He</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stadler, K" uniqKey="Stadler K">K. Stadler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chiu, R W" uniqKey="Chiu R">R.W. Chiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vega, V B" uniqKey="Vega V">V.B. Vega</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ziebuhr, J" uniqKey="Ziebuhr J">J. Ziebuhr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Groneberg, D A" uniqKey="Groneberg D">D.A. Groneberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tan, Y J" uniqKey="Tan Y">Y.J. Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Babcock, G J" uniqKey="Babcock G">G.J. Babcock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xiao, X" uniqKey="Xiao X">X. Xiao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wong, S K" uniqKey="Wong S">S.K. Wong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, J C" uniqKey="Zhao J">J.C. Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, T" uniqKey="Zhou T">T. Zhou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ren, Y" uniqKey="Ren Y">Y. Ren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, Y" uniqKey="He Y">Y. He</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hua, R" uniqKey="Hua R">R. Hua</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, L" uniqKey="Lu L">L. Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keng, C T" uniqKey="Keng C">C.T. Keng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sui, J" uniqKey="Sui J">J. Sui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H. Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Den Brink, E N" uniqKey="Van Den Brink E">E.N. van den Brink</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chou, C F" uniqKey="Chou C">C.F. Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenough, T C" uniqKey="Greenough T">T.C. Greenough</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, S" uniqKey="Wang S">S. Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pyrc, K" uniqKey="Pyrc K">K. Pyrc</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bi, S" uniqKey="Bi S">S. Bi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mooney, S D" uniqKey="Mooney S">S.D. Mooney</name>
</author>
<author>
<name sortKey="Klein, T E" uniqKey="Klein T">T.E. Klein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hu, L D" uniqKey="Hu L">L.D. Hu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yeh, S H" uniqKey="Yeh S">S.H. Yeh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pavlovic Lazetic, G M" uniqKey="Pavlovic Lazetic G">G.M. Pavlovic-Lazetic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ruan, Y J" uniqKey="Ruan Y">Y.J. Ruan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chim, S S" uniqKey="Chim S">S.S. Chim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z G" uniqKey="Wang Z">Z.G. Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lan, Y C" uniqKey="Lan Y">Y.C. Lan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thompson, J D" uniqKey="Thompson J">J.D. Thompson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cover, T M" uniqKey="Cover T">T.M. Cover</name>
</author>
<author>
<name sortKey="Thomas, J A" uniqKey="Thomas J">J.A. Thomas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Choi, J H" uniqKey="Choi J">J.H. Choi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rozas, J" uniqKey="Rozas J">J. Rozas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nei, M" uniqKey="Nei M">M. Nei</name>
</author>
<author>
<name sortKey="Gojovori, T" uniqKey="Gojovori T">T. Gojobori</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Genomics Proteomics Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Genomics Proteomics Bioinformatics</journal-id>
<journal-title-group>
<journal-title>Genomics, Proteomics &amp; Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1672-0229</issn>
<issn pub-type="epub">2210-3244</issn>
<publisher>
<publisher-name>Beijing Institute of Genomics, the Chinese Academy of Sciences and the Genetics Society of China. Production and hosting by Elsevier B.V.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">16144519</article-id>
<article-id pub-id-type="pmc">5172477</article-id>
<article-id pub-id-type="publisher-id">S1672-0229(05)03004-4</article-id>
<article-id pub-id-type="doi">10.1016/S1672-0229(05)03004-4</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>SARS-CoV Genome Polymorphism: A Bioinformatics Study</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="au0005">
<name>
<surname>Pavlović-Lažetić</surname>
<given-names>Gordana M.</given-names>
</name>
<email>gordana@matf.bg.ac.yu</email>
<xref rid="aff0005" ref-type="aff">1</xref>
<xref rid="cor1" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author" id="au0010">
<name>
<surname>Mitić</surname>
<given-names>Nenad S.</given-names>
</name>
<xref rid="aff0005" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author" id="au0015">
<name>
<surname>Tomović</surname>
<given-names>Andrija M.</given-names>
</name>
<xref rid="aff0010" ref-type="aff">2</xref>
</contrib>
<contrib contrib-type="author" id="au0020">
<name>
<surname>Pavlović</surname>
<given-names>Mirjana D.</given-names>
</name>
<xref rid="aff0015" ref-type="aff">3</xref>
</contrib>
<contrib contrib-type="author" id="au0025">
<name>
<surname>Beljanski</surname>
<given-names>Miloš V.</given-names>
</name>
<xref rid="aff0015" ref-type="aff">3</xref>
</contrib>
</contrib-group>
<aff id="aff0005">
<label>1</label>
Faculty of Mathematics, University of Belgrade, 11001 Belgrade, Serbia and Montenegro</aff>
<aff id="aff0010">
<label>2</label>
Friedrich Miescher Institute for Biomedical Research, CH-4058 Basel, Switzerland</aff>
<aff id="aff0015">
<label>3</label>
Institute of General and Physical Chemistry, 11001 Belgrade, Serbia and Montenegro</aff>
<author-notes>
<corresp id="cor1">
<label>*</label>
Corresponding author.
<email>gordana@matf.bg.ac.yu</email>
</corresp>
</author-notes>
<pub-date pub-type="pmc-release">
<day>28</day>
<month>11</month>
<year>2016</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on .</pmc-comment>
<pub-date pub-type="ppub">
<year>2005</year>
</pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>11</month>
<year>2016</year>
</pub-date>
<volume>3</volume>
<issue>1</issue>
<fpage>18</fpage>
<lpage>35</lpage>
<permissions>
<copyright-statement>Copyright © 2005 Beijing Institute of Genomics, the Chinese Academy of Sciences and the Genetics Society of China. Production and hosting by Elsevier B.V.</copyright-statement>
<copyright-year>2005</copyright-year>
<copyright-holder>Beijing Institute of Genomics, the Chinese Academy of Sciences and the Genetics Society of China</copyright-holder>
<license>
<license-p>Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.</license-p>
</license>
</permissions>
<abstract id="ab0005">
<p>A dataset of 103 SARS-CoV isolates (from 101 human patients and 2 palm civets) was investigated with respect to different aspects of
<bold>genome</bold>
polymorphism and isolate classification. The number and distribution of single nucleotide variations (SNVs), insertions, and deletions relative to a “profile” (a sequence containing the most represented letter at each position) were determined and discussed. The distribution of substitution categories per codon position, as well as of synonymous and non-synonymous substitutions in the coding regions of annotated isolates, was determined, along with amino acid (a.a.) property changes. A similar analysis was performed for the spike (S) protein in all isolates (in 55 of them, the S protein was predicted for the first time). The Ka/Ks ratio confirmed that the S gene was subject to Darwinian selection during virus transmission from animals to humans. Isolates in the dataset were classified according to genome polymorphism and genotype. Classification by genome polymorphism yields two groups, one with a small number of SNVs and another with a large number of SNVs, with up to four subgroups with respect to insertions and deletions. We identified three basic nine-locus genotypes, TTTT/TTCGG, CGCC/TTCAT, and TGCC/TTCGT, with four subgenotypes. Both proposed classifications are consistent with new insights into the possible epidemiological spread of the virus in space and time.</p>
</abstract>
<kwd-group id="keys0005">
<title>Key words</title>
<kwd>SARS Coronavirus</kwd>
<kwd>single nucleotide polymorphism</kwd>
<kwd>insertions</kwd>
<kwd>deletions</kwd>
<kwd>spike protein</kwd>
<kwd>phylogenesis</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s0005">
<title>Introduction</title>
<p id="p0005">Severe acute respiratory syndrome (SARS), a potentially fatal atypical pneumonia, first appeared in Guangdong Province, China, in November 2002 and within six months had spread worldwide (to 30 countries, including China, Singapore, Vietnam, Canada, and the USA), killing more than 700 people
<xref rid="bib1" ref-type="bibr">
<italic>(1)</italic>
</xref>
. Less than four weeks after the global outbreak, a novel member of the family Coronaviridae, SARS coronavirus (SARS-CoV), was identified in respiratory specimens, blood, and stools of SARS patients and confirmed as the causative agent of the disease according to Koch's postulates
<xref rid="bib2" ref-type="bibr">
<italic>(2)</italic>
</xref>
. Soon afterwards, the first complete genome sequences of viral isolates were published
<xref rid="bib3" ref-type="bibr">3.</xref>
,
<xref rid="bib4" ref-type="bibr">4.</xref>
. By 2005, the number of fully sequenced viral isolates exceeded one hundred (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/entrez" id="ir0010">http://www.ncbi.nlm.nih.gov/entrez</ext-link>
).</p>
<p id="p0010">SARS-CoV probably originated through genetic exchange (recombination) between viruses with different host specificities and/or through mutation
<xref rid="bib5" ref-type="bibr">5.</xref>
,
<xref rid="bib6" ref-type="bibr">6.</xref>
. Since coronaviruses are known to jump between species relatively easily, it was hypothesized that the new virus might have originated in wild animals. Analysis of SARS-CoV proteins supports a possible past recombination event between mammalian-like and avian-like parent viruses
<xref rid="bib6" ref-type="bibr">
<italic>(6)</italic>
</xref>
. Common sequence variants define three distinct SARS-CoV genotypes: one linked with animal [palm civet
<italic>(Paguma larvata)</italic>
] SARS-like viruses and the early human phase, and the other two with the middle and late human phases, respectively
<xref rid="bib7" ref-type="bibr">7.</xref>
,
<xref rid="bib8" ref-type="bibr">8.</xref>
. SARS-CoV has a 29-nucleotide deletion relative to the palm civet virus, suggesting that, if there was direct transmission, it went from civet to human, since deletions probably occur more easily than insertions (5). However, more recent reports indicate that SARS-CoV is distinct from the civet virus, and it remains unresolved whether SARS-CoV originated in civets or civets were infected by another species
<xref rid="bib9" ref-type="bibr">9.</xref>
,
<xref rid="bib10" ref-type="bibr">10.</xref>
. The genome is relatively stable: its mutation rate has been estimated at between 1.83×10
<sup>−6</sup>
and 8.26×10
<sup>−6</sup>
nucleotide substitutions per site per day
<xref rid="bib11" ref-type="bibr">
<italic>(11)</italic>
</xref>
.</p>
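<p>The sketch below puts these per-site rates on a per-genome scale by multiplying rate, genome length, and time. It assumes a constant rate and a genome of about 30,000 nt, the approximate length given in the next paragraph. It also ignores multiple substitutions at the same site. The function name expected_substitutions is hypothetical. Under these assumptions, the quoted bounds correspond to roughly 20 to 90 substitutions per genome per year.</p>
<preformat preformat-type="computer code">
```python
def expected_substitutions(rate_per_site_per_day, genome_length, days):
    """Expected substitutions per genome: rate x genome length x time.

    A simple linear approximation that ignores multiple hits at one site.
    """
    return rate_per_site_per_day * genome_length * days


# Bounds quoted in the text; 30,000 nt is an approximate genome length.
for rate in (1.83e-6, 8.26e-6):
    per_year = expected_substitutions(rate, 30_000, 365)
    print(f"{rate:.2e} per site per day: {per_year:.1f} per genome per year")
```
</preformat>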
<p id="p0015">The SARS-CoV genome is an approximately 30-kb, positive-sense, single-stranded RNA that functions as a polycistronic mRNA. It consists of 5’ and 3’ untranslated regions (UTRs), 13 to 15 open reading frames (ORFs), and about 10 intergenic regions (IGRs)
<xref rid="bib9" ref-type="bibr">9.</xref>
,
<xref rid="bib12" ref-type="bibr">12.</xref>
,
<xref rid="bib13" ref-type="bibr">13.</xref>
. It includes genes encoding two replicase polyproteins (pp1a and pp1ab, the latter containing the RNA-dependent RNA polymerase), which together span two-thirds of the genome, and a set of ORFs at the 3’ end that code for four structural proteins: the surface spike (S) glycoprotein (1,256 a.a.), envelope (E, 77 a.a.), matrix (M, 222 a.a.), and nucleocapsid (N, 423 a.a.) proteins. It also encodes 8-9 additional predicted ORFs whose protein products are still under investigation (
<xref rid="bib14" ref-type="bibr">
<italic>14</italic>
</xref>
;
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/entrez" id="ir0015">http://www.ncbi.nlm.nih.gov/entrez</ext-link>
).</p>
<p id="p0020">The S protein is the main surface antigen of SARS-CoV and is involved in virus attachment to susceptible cells, using a mechanism similar to that of class I fusion proteins. The receptor for the SARS-CoV S protein has been identified as angiotensin-converting enzyme 2 (ACE-2), a metallopeptidase
<xref rid="bib15" ref-type="bibr">
<italic>(15)</italic>
</xref>
. Recent studies have located the receptor-binding domain (RBD) between a.a. positions 270 and 625
<xref rid="bib16" ref-type="bibr">16.</xref>
,
<xref rid="bib17" ref-type="bibr">17.</xref>
,
<xref rid="bib18" ref-type="bibr">18.</xref>
,
<xref rid="bib19" ref-type="bibr">19.</xref>
,
<xref rid="bib20" ref-type="bibr">20.</xref>
.</p>
<p id="p0025">Several epitope sites, defined by polyclonal or monoclonal antibodies, have been identified on the S protein; depending on experimental conditions, they lie within wider or narrower regions between a.a. 12 and 1,192
<xref rid="bib20" ref-type="bibr">20.</xref>
,
<xref rid="bib21" ref-type="bibr">21.</xref>
,
<xref rid="bib22" ref-type="bibr">22.</xref>
,
<xref rid="bib23" ref-type="bibr">23.</xref>
,
<xref rid="bib24" ref-type="bibr">24.</xref>
,
<xref rid="bib25" ref-type="bibr">25.</xref>
,
<xref rid="bib26" ref-type="bibr">26.</xref>
,
<xref rid="bib27" ref-type="bibr">27.</xref>
,
<xref rid="bib28" ref-type="bibr">28.</xref>
,
<xref rid="bib29" ref-type="bibr">29.</xref>
,
<xref rid="bib30" ref-type="bibr">30.</xref>
,
<xref rid="bib31" ref-type="bibr">31.</xref>
. Defining conserved immunodominant epitope regions of the S protein is of crucial importance for future anti-SARS vaccine development.</p>
<p id="p0030">The goal of this work was twofold: to perform a mutation analysis of SARS-CoV genomes, with special attention to the S protein, and to group the genomes according to different aspects of sequence similarity, ultimately pointing to the phylogeny and epidemiological dynamics of SARS-CoV.</p>
</sec>
<sec id="s0010">
<title>Results and Discussion</title>
<sec id="s0015">
<title>Nucleotide content</title>
<p id="p0035">The nucleotide content of SARS-CoV isolates favors T and A. The percentages of each letter in the non-UTR regions of all 96 isolates were: T (30.7940%), A (28.4246%), G (20.8121%), C (19.9535%), N (any of G, A, T, C; 0.0143%), R (purine, A or G; 0.0005%), K (G or T; 0.0001%), M (A or C; 0.0002%), S (G or C; 0.0001%), W (A or T; 0.0002%), and Y (pyrimidine, C or T; 0.0004%). The overall (A+T)/(G+C) ratio in the dataset was almost 3:2 (1.45), and the purine/pyrimidine ratio was almost 1 (0.97).</p>
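<p>The two ratios follow directly from the reported percentages. Below is a minimal sketch, not the authors' code. It assumes sequences are plain strings with "-" for alignment gaps. The function names composition, at_gc_ratio, and pur_pyr_ratio are hypothetical. Only the unambiguous letters A, C, G, and T enter the ratios, so the very rare ambiguity codes listed above are ignored.</p>
<preformat preformat-type="computer code">
```python
from collections import Counter


def composition(seqs):
    """Percentage of each letter over all sequences, gaps ('-') excluded."""
    counts = Counter()
    for s in seqs:
        counts.update(c for c in s.upper() if c != "-")
    total = sum(counts.values())
    return {letter: 100.0 * n / total for letter, n in counts.items()}


def at_gc_ratio(pct):
    """(A+T)/(G+C) from a letter-to-percentage mapping."""
    return (pct.get("A", 0) + pct.get("T", 0)) / (pct.get("G", 0) + pct.get("C", 0))


def pur_pyr_ratio(pct):
    """Purines (A+G) over pyrimidines (C+T); ambiguity codes are ignored."""
    return (pct.get("A", 0) + pct.get("G", 0)) / (pct.get("C", 0) + pct.get("T", 0))


# Percentages reported above for the non-UTR regions.
REPORTED = {"T": 30.7940, "A": 28.4246, "G": 20.8121, "C": 19.9535}
print(round(at_gc_ratio(REPORTED), 2), round(pur_pyr_ratio(REPORTED), 2))  # 1.45 0.97
```
</preformat>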
<p id="p0040">The distribution of nucleotides (nt) over consecutive 250-nt blocks is given in
<xref rid="ec0005" ref-type="supplementary-material">Figure S1</xref>
(
<xref rid="s0085" ref-type="sec">Supporting Online Material</xref>
). It shows three peak regions of T in the second quarter of the genome (ORF 1a) and fairly stable behavior in the third quarter (ORF 1b), as also observed by Pyrc
<italic>et al</italic>
.
<xref rid="bib32" ref-type="bibr">
<italic>(32)</italic>
</xref>
for a group of coronaviruses (HCoV-NL63, HCoV-229E, SARS-CoV, and HCoV-OC43). The deviation of the nucleotide percentages in each 250-nt block from the corresponding percentages in the whole dataset is given in
<xref rid="ec0005" ref-type="supplementary-material">Figure S2</xref>
. Apart from the 3’ UTR, where T is underrepresented by as much as about −13%, the largest excess over the average is about +10%, again for T, in four peaks: three between positions 7,000 and 11,000 (ORF 1a), mirrored by A at about −10%, and the fourth in the S gene. Elsewhere, the nucleotide offsets oscillate fairly regularly between −5% and +5% around the average.</p>
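<p>The block-wise deviation described above can be sketched as follows. The sketch assumes non-overlapping 250-nt blocks taken from the start of the sequence, with the last block possibly shorter. It compares each block with a reference composition, which defaults to the sequence's own average; dataset-wide percentages can be passed instead. The names block_composition and block_deviation are hypothetical.</p>
<preformat preformat-type="computer code">
```python
def block_composition(seq, block=250):
    """Percentages of A, C, G and T in consecutive non-overlapping blocks.

    The last block may be shorter than `block` nucleotides.
    """
    s = seq.upper()
    result = []
    for start in range(0, len(s), block):
        chunk = s[start:start + block]
        result.append({b: 100.0 * chunk.count(b) / len(chunk) for b in "ACGT"})
    return result


def block_deviation(seq, block=250, overall=None):
    """Per-block percentage minus a reference percentage, per nucleotide.

    `overall` maps each of A, C, G, T to a reference percentage (for example
    dataset-wide values); by default the sequence's own composition is used.
    """
    s = seq.upper()
    if overall is None:
        overall = {b: 100.0 * s.count(b) / len(s) for b in "ACGT"}
    return [{b: blk[b] - overall[b] for b in "ACGT"}
            for blk in block_composition(s, block)]
```
</preformat>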
</sec>
<sec id="s0020">
<title>Genome polymorphism</title>
<p id="p0045">All isolates shared a high degree of nucleotide identity (more than 99% pairwise). Still, they could be differentiated on the basis of their genome polymorphism, i.e., the number and positions of SNVs and of insertions and deletions (INDELs). The analysis of genome polymorphism revealed two notable facts (
<xref rid="t0020" ref-type="table">Tables 1</xref>
, S1, and S2). First, two isolates, HSR 1 and AS, matched the “profile” at all “non-empty” positions (see Materials and Methods) up to the poly-A sequence. Second, three isolates had a large number of undefined nucleotides (N), either as contiguous segments (Sin3408 in ORFs 8a and 8b; Sin3408L in ORF 1b) or as scattered individual nucleotides or short clusters (SinP2) (
<xref rid="ec0005" ref-type="supplementary-material">Table S2</xref>
). Isolate Sin3408 was the only one with a 5’ UTR 34 nt longer than that of the “profile”. These three isolates were therefore not considered reliable for comparison with the others.
<table-wrap position="float" id="t0020">
<label>Table 1</label>
<caption>
<p>SARS-CoV Genome Polymorphism</p>
</caption>
<alt-text id="at1025">Table 1</alt-text>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td>
<graphic xlink:href="gr1"></graphic>
<graphic xlink:href="gr1a"></graphic>
<graphic xlink:href="gr1b"></graphic>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Shaded entries correspond to annotated isolates. Identification (Label and ID) is given in accordance with the labels and identifiers from
<xref rid="ec0005" ref-type="supplementary-material">Table S1</xref>
. The four SNV columns give the total number of SNVs and the numbers of SNVs in genes, in the 5’ and 3’ UTRs, and in IGRs. The seven INDEL columns give the number of deletions at the 5’ end (5’ del), the length of long insertions (longIns) and long deletions (longDel), the number and length of short insertions (shortIns) and short deletions (shortDel) in the form
<italic>a</italic>
×
<italic>b</italic>
where
<italic>b</italic>
denotes the length and
<italic>a</italic>
denotes the number of occurrences, the number of deletions at the 3’ end (3’ del), and the length of the poly-A sequence at the 3’ end (3’ poly-A). Classification comprises two columns. The Type column gives the nine-locus genotype in the form NNNN/NNNNN, i.e., the nucleotides at positions 9,420, 17,604, 22,274, and 27,891 / 3,861, 9,495, 11,514, 21,773, and 26,534 relative to the CLUSTAL X output (absolute HSR 1 positions 9,404, 17,564, 22,222, and 27,827 / 3,852, 9,479, 11,493, 21,721, and 26,477). The last column, Group, gives the group to which each isolate was assigned.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
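<p>The “profile” used as the reference above is a consensus sequence that carries the most represented letter at each alignment position. The minimal sketch below builds such a profile and counts one isolate's differences from it. It assumes a gapped alignment given as equal-length strings with "-" for gaps, and it breaks ties by first occurrence. It also counts inserted and deleted positions rather than INDEL events, whereas Table 1 reports both numbers and lengths. These choices, and the names profile and compare_to_profile, are illustrative assumptions. The authors' exact rules, including the “non-empty” positions defined in Materials and Methods, are not reproduced here.</p>
<preformat preformat-type="computer code">
```python
from collections import Counter

BASES = "ACGT"


def profile(alignment):
    """Consensus 'profile': the most represented letter in each column.

    `alignment` is a list of equal-length aligned strings ('-' = gap).
    Ties go to the letter seen first in the column (Counter.most_common).
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def compare_to_profile(seq, prof):
    """Count SNVs and inserted/deleted positions of one isolate vs the profile.

    SNV: both letters are definite bases (A, C, G, T) and they differ.
    Ambiguity codes such as N, R or Y are not counted as SNVs.
    """
    counts = {"SNV": 0, "insertion": 0, "deletion": 0}
    for s, p in zip(seq.upper(), prof.upper()):
        if s == p:
            continue
        if p == "-":
            counts["insertion"] += 1  # isolate has a letter, profile a gap
        elif s == "-":
            counts["deletion"] += 1   # isolate has a gap, profile a base
        elif s in BASES and p in BASES:
            counts["SNV"] += 1
    return counts
```
</preformat>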
<sec id="s0025">
<title>Nucleotide variations: single nucleotide polymorphism</title>
<p id="p0050">There were 446 SNV sites and 1,006 SNVs in total in the dataset, with a substitution rate of 1.49%; both the number of SNVs and the substitution rate are about three times higher than the corresponding findings
<xref rid="bib33" ref-type="bibr">
<italic>(33)</italic>
</xref>
for 17 isolates. The average number of SNVs per isolate was 10.48, giving an error rate of 3.6×10
<sup>−4</sup>
substitutions per nucleotide copied.</p>
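<p>Once SNVs are defined, the counts above can be recomputed from an alignment and its profile. The sketch below assumes that an SNV is a definite base (A, C, G, T) differing from a definite profile base. It also assumes that an SNV site is a column where at least one isolate has an SNV. The function name snv_summary is hypothetical. As a consistency check, 1,006 SNVs spread over the 96 isolates used in the nucleotide-content analysis give 1,006/96 ≈ 10.48, the reported average. This excerpt does not state which isolate count was used for that figure.</p>
<preformat preformat-type="computer code">
```python
BASES = "ACGT"


def snv_summary(alignment, prof):
    """Return (number of SNV sites, total SNVs, mean SNVs per isolate).

    `alignment`: equal-length aligned strings; `prof`: the profile string.
    """
    sites = set()
    total = 0
    for seq in alignment:
        for i, (s, p) in enumerate(zip(seq.upper(), prof.upper())):
            if s != p and s in BASES and p in BASES:
                sites.add(i)
                total += 1
    return len(sites), total, total / len(alignment)


print(round(1006 / 96, 2))  # 10.48, the reported average
```
</preformat>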
<p id="p0055">Only one site had more than one substitution (the original nucleotide at that position being T): at relative (CLUSTAL X) position 8,441 (ORF 1a), isolate ZMY 1 has C (absolute position 8,403), while isolates ShanghaiQXC1 and ShanghaiQXC2 have A (absolute positions 8,312 and 7,733, respectively).</p>
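<p>Relative (alignment) and absolute positions differ because each isolate's coordinates count only its own nucleotides and skip alignment gaps. Gaps and shorter sequence ends therefore shift the absolute coordinate of the same column from one isolate to another. Below is a minimal sketch of the mapping. It assumes 1-based positions and "-" as the gap character. The name absolute_position is hypothetical.</p>
<preformat preformat-type="computer code">
```python
def absolute_position(aligned_seq, column):
    """Map a 1-based alignment column to a 1-based position in the ungapped
    isolate sequence; return None if the isolate has a gap in that column."""
    if aligned_seq[column - 1] == "-":
        return None
    # Number of non-gap letters up to and including this column.
    return len(aligned_seq[:column]) - aligned_seq[:column].count("-")


print(absolute_position("--AC-GT", 6))  # 3
```
</preformat>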
<p id="p0060">The smallest distance between two neighboring SNV sites in the whole dataset was 1 and the largest 23,988 (for TW3 and TW1), while the average distance between neighboring SNV sites was 1,987 positions (
<xref rid="ec0005" ref-type="supplementary-material">Figure S3</xref>
). The distribution of isolates by number of SNVs (outside the 5’ and 3’ UTRs) was regular, almost Gaussian, for up to 11 SNVs and decreased irregularly for more than 11 SNVs (
<xref rid="ec0005" ref-type="supplementary-material">Figure S4</xref>
). Therefore, at most 11 SNVs per isolate was considered a “small” number of SNVs, and more than 11 SNVs a “large” number. Most SNVs are clustered within two regions in ORF 1a and one region at the 3’ end of the viral genome, which predominantly consists of small ORFs; two small regions within ORF 1a and the region corresponding to ORF 1b are the most conserved (
<xref rid="f0010" ref-type="fig">Figure 1B</xref>
).
<fig id="f0010">
<label>Fig. 1</label>
<caption>
<p>Density distribution of SNVs (B) and INDELs (C), mapped onto the gene map of the HSR 1 isolate (A), whose sequence coincides with the “profile”. The central region of the genome is rather conserved (SNV density is lower in the second third of the genome, ORF 1b), while the rest of the genome has a high SNV density. SNV peaks occur at (absolute HSR 1) positions 3,852, 9,404, 9,479, 11,493, 17,564 (ORF 1ab), 21,721, 22,222 (S protein), 26,477 (M protein), and 27,827 (ORF 8a).</p>
</caption>
<alt-text id="at0010">Fig. 1</alt-text>
<graphic xlink:href="gr2"></graphic>
</fig>
</p>
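<p>The neighbor-distance statistics and the “small”/“large” split can be sketched as follows. The helper names are hypothetical, SNV positions are pooled into one sorted list (the authors’ per-isolate bookkeeping may differ), and the example positions are the SNV peaks listed for Figure 1 plus one made-up site at 9,480.</p>

```python
from statistics import mean


def neighbor_distances(sites: list[int]) -> list[int]:
    """Distances between consecutive SNV sites, after sorting and de-duplicating."""
    ordered = sorted(set(sites))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def snv_class(n_snvs: int, threshold: int = 11) -> str:
    """'small' for at most `threshold` SNVs per isolate, 'large' above it."""
    return "large" if n_snvs > threshold else "small"


# Illustrative positions: the SNV peaks of Figure 1 plus a made-up site at 9,480.
sites = [3852, 9404, 9479, 11493, 17564, 21721, 22222, 26477, 27827, 9480]
d = neighbor_distances(sites)
print(min(d), max(d), round(mean(d), 1))
print([snv_class(n) for n in (0, 11, 12, 78)])
```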
<p id="p0065">The entropy of each nucleotide position of the genome was calculated; the most conserved sites are those with the lowest entropy, and the least conserved those with the highest (
<xref rid="bib34" ref-type="bibr">
<italic>34</italic>
</xref>
;
<xref rid="ec0005" ref-type="supplementary-material">Figure S5</xref>
). The nine loci used for classification are among the sites with the highest entropy.</p>
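<p>A minimal sketch of per-position entropy, assuming Shannon entropy in bits computed over A, C, G, and T only (gaps ignored); the exact definition used in reference 34 may differ, and the function names are hypothetical.</p>

```python
from collections import Counter
from math import log2


def column_entropy(column: str, alphabet: str = "ACGT") -> float:
    """Shannon entropy (bits) of the bases in one alignment column; gaps are ignored."""
    counts = Counter(b for b in column if b in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    return sum(-p * log2(p) for p in probs)


def entropy_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column of an alignment of equal-length sequences."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


alignment = ["ACGT", "ACGA", "ATGC", "ATGG"]
h = entropy_profile(alignment)
print(h)
# Column indices from least to most conserved (highest entropy first).
print(sorted(range(len(h)), key=lambda i: h[i], reverse=True))
```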
<p id="p0070">The percentage of each substitution category is given in
<xref rid="f0015" ref-type="fig">Figure 2</xref>
. There are 141 transversion sites and 306 transition sites,
<italic>i.e.</italic>
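, 31.54%:68.46% (the only site with two different substitutions, T→C and T→A, is counted in both categories, giving 447 rather than 446 sites), with 261 transversions (2.72 per isolate on average) and 745 transitions (7.76 per isolate on average).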
<fig id="f0015">
<label>Fig. 2</label>
<caption>
<p>Distribution of nucleotide substitution categories. C↔T substitutions are the most frequent and C↔G substitutions the least frequent.</p>
</caption>
<alt-text id="at0015">Fig. 2</alt-text>
<graphic xlink:href="gr3"></graphic>
</fig>
</p>
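<p>The transition/transversion split rests on one rule: a substitution within purines (A↔G) or within pyrimidines (C↔T) is a transition, anything else a transversion. A hypothetical sketch with toy substitution pairs, which also reproduces the 31.54%:68.46% site ratio from the counts quoted above:</p>

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("identical bases are not a substitution")
    pair = {ref, alt}
    if pair.issubset(PURINES) or pair.issubset(PYRIMIDINES):
        return "transition"
    return "transversion"


# Toy (profile base, isolate base) pairs, not taken from the dataset.
changes = [("C", "T"), ("T", "C"), ("A", "G"), ("G", "T"), ("C", "A"), ("C", "T")]
counts = Counter(substitution_type(ref, alt) for ref, alt in changes)
print(dict(counts))

# Site-level ratio from the counts in the text: 141 transversion and 306 transition sites.
sites = {"transversion": 141, "transition": 306}
total_sites = sum(sites.values())
print({k: f"{100 * v / total_sites:.2f}%" for k, v in sites.items()})
```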
</sec>
<sec id="s0030">
<title>Length variations, insertions and deletions</title>
<p id="p0075">Analysis of the SARS-CoV genome showed that long INDELs were concentrated close to the 3’ end (except for the 579-nt deletion in the ShanghaiQXC2 isolate at position 5,834, in ORF 1a), while individual insertions were found along the whole genome, mostly in the second quarter, and individual deletions were quite rare. The density distribution of INDELs within the SARS-CoV genome and the length variations of the 5’ UTR and 3’ UTR are shown in
<xref rid="f0010" ref-type="fig">Figures 1C</xref>
, S6A and B, respectively.
<xref rid="ec0005" ref-type="supplementary-material">Figure S6C</xref>
shows the region of the genome between positions 27,700 and 28,300 (ORFs 7b, 8a, 8b, and part of the N protein, in HSR 1 annotation), which is especially rich in INDELs. While individual INDELs are present in both longer and shorter ORFs, long INDELs are all located in short ORFs (except for the previously mentioned deletion in ShanghaiQXC2).</p>
<p id="p0080">
<xref rid="f0040" ref-type="fig">Figure 3</xref>
compares the primary genome structures of the analyzed isolates, summarizing the following facts:
<fig id="f0040">
<label>Fig. 3</label>
<caption>
<p>Comparison of the nucleotide structures of complete SARS-CoV genome isolates, shown in panels A and B according to the similarity of their SNV or INDEL positions.</p>
</caption>
<alt-text id="at1015">Fig. 3</alt-text>
<graphic xlink:href="gr4"></graphic>
<graphic xlink:href="gr4a"></graphic>
</fig>
</p>
<p id="p0085">Firstly, although the established length of the SARS-CoV genome is 29,727 nt
<xref rid="bib12" ref-type="bibr">
<italic>(12)</italic>
</xref>
, most isolates were shorter at the 5’ end (for the first 15 positions, most isolates were “empty”), had “poly-A” strings of various lengths at the 3’ end, or both (
<xref rid="t0020" ref-type="table">Table 1</xref>
). Several isolates had short internal deletions,
<italic>e.g.</italic>
, Sin2677, Sin2748, TWC, PUMC02, PUMC03, TWJ, WHU, Sino1-11, Sino3-11, TW11, and SinP5.</p>
<p id="p0090">Secondly, there was a group of isolates (GD01, SZ3, SZ16, GZ02, HSZ-Bb, HSZ-Bc, HSZ-Cb, and HSZ-Cc) with a 29-nt insertion at relative position 27,995 (absolute position 27,869 in SZ3, SZ16, HSZ-Cc, and HSZ-Bc; protein BGI-PUP GZ29-nt-Ins, ORF 8a). Two of them were palm civet isolates (SZ3 and SZ16) and the other six were isolates from human patients. This insertion can equally be regarded as a deletion in all the other isolates, which evolved from this early group
<xref rid="bib10" ref-type="bibr">
<italic>(10)</italic>
</xref>
.</p>
<p id="p0095">Thirdly, several groups of isolates had long deletions: GZ-B and GZ-C (length 39 at relative position 27,882, absolute position 27,719 in GZ-C, ORF 7b), ZS-A and ZS-C (length 53 at relative position 27,969, absolute 27,843 in ZS-A, ORF 8a), LC2, LC3, and LC5 (length 386 at relative position 27,829, absolute 27,704 in LC2, ORFs 7b, 8a, 8b), ShanghaiQXC2 (length 579 at relative position 5,959, absolute 5,834, ORF 1a), and Sin852, Sin849, and Sin846 (of lengths 57, 49, and 137, respectively, at relative positions between 27,787 and 27,966, ORFs 7b, 8a) (
<xref rid="t0020" ref-type="table">Tables 1</xref>
and
<xref rid="ec0005" ref-type="supplementary-material">S2</xref>
,
<xref rid="f0040" ref-type="fig">Figure 3</xref>
).</p>
<p id="p0100">Fourthly, a large number of individual INDELs were identified in ZJ01, ZMY 1, SinP2, and SinP3 (
<xref rid="t0020" ref-type="table">Tables 1</xref>
and
<xref rid="ec0005" ref-type="supplementary-material">S2</xref>
,
<xref rid="f0040" ref-type="fig">Figure 3</xref>
).</p>
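<p>One way to read INDELs off an alignment is sketched below: a run of gaps in an isolate marks a deletion relative to the “profile”, and a run of gaps in the profile marks an insertion in that isolate. The cut-off separating short from long INDELs (long_min) is a hypothetical parameter, terminal gaps (which the authors tabulate separately as 5’ and 3’ deletions) are not treated specially, and positions are 0-based alignment columns.</p>

```python
def gap_runs(seq: str) -> list[tuple[int, int]]:
    """(start, length) of every run of '-' in an aligned sequence (0-based)."""
    runs = []
    start = None
    for i, ch in enumerate(seq):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(seq) - start))
    return runs


def indels(isolate: str, profile: str, long_min: int = 10) -> list[tuple[str, int, int, str]]:
    """List INDELs of an aligned isolate relative to the aligned profile.

    Gaps in the isolate are deletions, gaps in the profile are insertions.
    `long_min` (hypothetical) is the length from which an INDEL counts as long.
    """
    events = []
    for kind, seq in (("deletion", isolate), ("insertion", profile)):
        for start, length in gap_runs(seq):
            size = "long" if length >= long_min else "short"
            events.append((kind, start, length, size))
    return sorted(events, key=lambda event: event[1])


profile = "ACGTA--CGTACGTACGTAC"
isolate = "ACGTAGGCGTA-G------C"
for event in indels(isolate, profile, long_min=5):
    print(event)
```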
</sec>
<sec id="s0035">
<title>Mutation analysis</title>
<p id="p9875">While the distribution of nucleotides at various distances from SNV sites did not exhibit any regularities, the distribution of nucleotides at distance 1 to the left of SNV sites (−1) differed markedly from their overall percentages in the dataset. The corresponding distribution at distance 1 to the right (+1) is almost identical to the overall nucleotide composition (
<xref rid="t0005" ref-type="table">Table 2</xref>
).
<table-wrap position="float" id="t0005">
<label>Table 2</label>
<caption>
<p>Distribution of Nucleotides on Distance 1 Left and Right to SNV Sites</p>
</caption>
<alt-text id="at0030">Table 2</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Nt</th>
<th align="center">No. at −1</th>
<th align="center">% at −1</th>
<th align="center">Diff. at −1 (%)</th>
<th align="center">No. at +1</th>
<th align="center">% at +1</th>
<th align="center">Diff. at +1 (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">A</td>
<td align="center">358</td>
<td align="center">35.59%</td>
<td align="right">7.17%</td>
<td align="center">283</td>
<td align="center">28.13%</td>
<td align="right">−0.29%</td>
</tr>
<tr>
<td align="center">C</td>
<td align="center">179</td>
<td align="center">17.79%</td>
<td align="right">−2.16%</td>
<td align="center">203</td>
<td align="center">20.18%</td>
<td align="right">0.23%</td>
</tr>
<tr>
<td align="center">G</td>
<td align="center">230</td>
<td align="center">22.86%</td>
<td align="right">2.05%</td>
<td align="center">215</td>
<td align="center">21.37%</td>
<td align="right">0.56%</td>
</tr>
<tr>
<td align="center">T</td>
<td align="center">238</td>
<td align="center">23.66%</td>
<td align="right">−7.13%</td>
<td align="center">302</td>
<td align="center">30.02%</td>
<td align="right">−0.77%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>The distribution of nucleotides at distance 1 to the left (−1) and to the right (+1) of SNV sites, given as the number of nucleotides, their percentage, and the difference from their overall percentage in the dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
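<p>The values in Table 2 compare the base composition next to SNV sites with the genome-wide composition. A minimal sketch of that comparison on a toy sequence, with hypothetical helper names and 0-based SNV positions:</p>

```python
from collections import Counter


def percentages(bases) -> dict[str, float]:
    """Percentage of A, C, G and T among the given bases (other symbols ignored)."""
    counts = Counter(b for b in bases if b in "ACGT")
    total = sum(counts.values())
    return {b: 100 * counts[b] / total for b in "ACGT"}


def flank_table(genome: str, sites: list[int], offset: int) -> dict[str, tuple[float, float]]:
    """Per base: (% at `offset` from SNV sites, difference from genome-wide %).

    Positions are 0-based indices into `genome`; flanks falling off either end are skipped.
    """
    in_range = range(len(genome))
    flank = [genome[s + offset] for s in sites if s + offset in in_range]
    at_offset = percentages(flank)
    overall = percentages(genome)
    return {b: (at_offset[b], at_offset[b] - overall[b]) for b in "ACGT"}


genome = "ATATGCGCAT"  # toy sequence
sites = [1, 3, 9]      # hypothetical SNV positions (0-based)
for offset in (-1, +1):
    table = flank_table(genome, sites, offset)
    print(offset, {b: (round(p, 2), round(d, 2)) for b, (p, d) in table.items()})
```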
<p id="p0110">
<xref rid="ec0005" ref-type="supplementary-material">Figure S7</xref>
shows the differences between the percentage of each nucleotide at a given distance from SNV sites and its percentage in the whole genome, for distances of up to 10 positions to the left and right.
<xref rid="ec0005" ref-type="supplementary-material">Figures S8A and B</xref>
show the distribution of substitutions by the preceding and the following nucleotide base, respectively. Among C↔T substitutions, both C→T and T→C are favored by a preceding A and a following T (almost 40% of all C↔T substitutions;
<xref rid="f0015" ref-type="fig">Figure 2</xref>
), whereas T→C is almost never preceded by a T (only 3%). Clustered substitutions of length 2 are rare (
<italic>e.g.</italic>
, TC→AA, GA→CA).</p>
</sec>
<sec id="s0040">
<title>Codon usage</title>
<p id="p0120">Analysis of the distribution of individual nucleotides over the three codon positions in the annotated ORFs of all the annotated isolates showed that, except for short proteins such as E and M and the presumptive ORFs, all ORFs exhibit the same tendency: T dominates at the third codon position and G at the first, while A and C appear more often at the second codon position than elsewhere.
<xref rid="ec0005" ref-type="supplementary-material">Figure S9</xref>
shows the distribution of nucleotides over the three codon positions in individual ORFs and in total.</p>
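<p>The per-position nucleotide composition can be computed as sketched below, assuming an in-frame coding sequence; the toy sequence and the function name are illustrative only.</p>

```python
from collections import Counter


def codon_position_composition(cds: str) -> list[dict[str, float]]:
    """Percentage of A, C, G and T at codon positions 1, 2 and 3 of an in-frame CDS."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon, if any
    result = []
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])
        total = sum(counts[b] for b in "ACGT")
        result.append({b: 100 * counts[b] / total for b in "ACGT"})
    return result


cds = "ATGGCTGGTTTACCTAAT"  # toy in-frame sequence, not a SARS-CoV ORF
comp = codon_position_composition(cds)
for pos, shares in enumerate(comp, start=1):
    print(pos, {b: round(p, 1) for b, p in shares.items()})
```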
<p id="p0125">Codon usage analysis showed the same pattern as the distribution of nucleotides over the three codon positions. Overall, T (40.10%) was favored at the third codon position over A (24.83%), C (18.90%), and G (16.16%). This was especially true for the amino acids encoded by four-codon families (Thr, Pro, Ala, Gly, and Val), and the same held for the four-codon subsets of six-codon families (Arg, Leu, and Ser), whose codons differ at the third position only. This pattern was observed for ORF 1ab and the S and N proteins, but not for the other two structural proteins (E and M). The codon usage for SARS-CoV genome proteins is given in
<xref rid="ec0005" ref-type="supplementary-material">Table S3</xref>
, and it is consistent with the results obtained for another human CoV genome, HCoV-NL63
<xref rid="bib32" ref-type="bibr">
<italic>(32)</italic>
</xref>
.</p>
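<p>The third-position preference within four-codon families can be isolated by keeping only codons whose first two bases fix the amino acid (the eight families named above). A hypothetical sketch on a toy sequence:</p>

```python
from collections import Counter

# First two bases of the four-codon families named in the text (Thr, Pro, Ala, Gly, Val)
# and of the four-codon subsets of the six-codon families (Arg, Leu, Ser).
FOURFOLD_PREFIXES = {
    "AC": "Thr", "CC": "Pro", "GC": "Ala", "GG": "Gly", "GT": "Val",
    "CG": "Arg", "CT": "Leu", "TC": "Ser",
}


def fourfold_third_position(cds: str) -> dict[str, float]:
    """Percentage of each base at the third position of fourfold-degenerate codons."""
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    thirds = Counter(c[2] for c in codons if c[:2] in FOURFOLD_PREFIXES)
    total = sum(thirds[b] for b in "ACGT")
    return {b: 100 * thirds[b] / total for b in "ACGT"}


# Toy ORF: ACT CCT GCA GGT GTC AAA CTT TCG (AAA, Lys, is not fourfold and is skipped).
cds = "ACTCCTGCAGGTGTCAAACTTTCG"
print({b: round(p, 2) for b, p in fourfold_third_position(cds).items()})
```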
</sec>
<sec id="s0045">
<title>Changes in amino acids</title>
<p id="p0130">Besides the number of SNVs, the isolates also differed in SNV positions.
<xref rid="ec0005" ref-type="supplementary-material">Table S4</xref>
lists the positions where two or more SNVs occurred, for all the annotated isolates, along with nucleotides and ORFs (based on HSR 1 annotation), mutation type (transition/transversion), a.a. position in the ORF, a.a. change, a.a. property change, and nucleotide position in the codon. Only positions with multiple SNVs were chosen, in order to reduce the chance of including erroneously determined SNVs. There were 91 such SNV sites with 288 SNVs. Notably, none of these positions had multiple base substitutions (more than two different bases). There were 227 transitions at 75 sites and 61 transversions at 16 sites, 5 of which were in structural proteins: 2 in S, 2 in E, and 1 in M. The most common mutation was C↔T (45 sites, about 50%), followed by A↔G (30 sites), A↔T and G↔T (7 sites each), and C↔A (2 sites). There were no C↔G mutations.</p>
<p id="p0135">There were 28 SNV sites at the first codon position (20 transitions and 8 transversions), 2 of which were silent mutation sites (C↔T, Leu). There were 33 SNV sites at the second codon position (31 transitions and 2 transversions), all of which cause an a.a. change. There were 30 SNV sites at the third codon position (25 transitions and 5 transversions), 29 of which were silent mutation sites (the only non-silent one is G↔T, Leu↔Phe).</p>
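<p>Whether an SNV is silent depends only on the codon it falls in and its position within that codon. The sketch below assumes the standard genetic code (NCBI translation table 1) and reproduces the two cases named in the text: a first-position C↔T change within Leu and the third-position G↔T change between Leu and Phe. The function name is hypothetical.</p>

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(codon: str, codon_pos: int, alt: str) -> tuple[str, str, str]:
    """Return (ref aa, alt aa, 'synonymous' or 'non-synonymous'); codon_pos is 1, 2 or 3."""
    mutant = codon[: codon_pos - 1] + alt + codon[codon_pos:]
    ref_aa, alt_aa = CODE[codon], CODE[mutant]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


print(classify_snv("CTA", 1, "T"))  # first-position C to T in a Leu codon
print(classify_snv("TTG", 3, "T"))  # third-position G to T, Leu to Phe
print(classify_snv("GCT", 2, "T"))  # second-position change, Ala to Val
```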
<p id="p0140">There were 31 synonymous and 60 non-synonymous multiple substitution sites, with a substitution rate of 0.31% (91/29,228) and a non-synonymous substitution rate of 0.21%, which is consistent with the corresponding findings for 17 SARS-CoV isolates
<xref rid="bib33" ref-type="bibr">
<italic>(33)</italic>
</xref>
. The number of multiple substitutions was about 30% lower than the overall number of substitutions, as were the substitution rate and the non-synonymous substitution rate.</p>
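<p>The rates quoted in this section are the number of substitution sites divided by the length of the analyzed sequence. A quick arithmetic check in Python, assuming that the genome-wide non-synonymous rate uses the same 29,228-nt denominator as the overall rate (the text states the denominator only for the latter); the S protein figures are the ones given later for the 3,768-nt S gene.</p>

```python
def rate_percent(n_sites: int, length: int) -> float:
    """Substitution sites per nucleotide, expressed as a percentage."""
    return 100 * n_sites / length


genome = (rate_percent(91, 29_228), rate_percent(60, 29_228))  # all / non-synonymous
s_gene = (rate_percent(62, 3_768), rate_percent(43, 3_768))    # all / non-synonymous
print([round(r, 2) for r in genome], [round(r, 2) for r in s_gene])
```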
<p id="p0145">
<xref rid="ec0005" ref-type="supplementary-material">Table S5</xref>
summarizes the above findings. It gives the number of transition and transversion sites and the number of SNVs (in the form N
<sub>1</sub>
/N
<sub>2</sub>
, i.e., sites/SNVs) per codon position and mutation type, together with the percentage of SNVs and the numbers of silent mutation sites and silent SNVs.</p>
<p id="p0150">Of the non-synonymous sites (counting only sites with two or more substitutions, in annotated isolates only), 35 are within pp1ab, 5 within ORF 3, 1 within the E protein, 3 within the M protein, 1 within ORF 6, 1 within ORF 8a, 1 within ORF 8b, 1 within the N protein, and 11 within the S protein.</p>
</sec>
<sec id="s0050">
<title>Mutation analysis of the S protein</title>
<p id="p0155">The S protein is of particular interest for mutation analysis, since it is a key determinant of host range. Multiple sequence alignment of the S protein in all 96 SARS-CoV isolates showed that five of them, namely ZMY 1, SinP2, SinP3, SinP4, and Sin3408L, differed substantially from all the others because of individual insertions or deletions. Since such large mismatches in the S protein sequence appeared to result from sequencing errors, we excluded these five isolates and analyzed the S protein in the remaining 91 isolates.</p>
<p id="p0160">There were 34 isolates without SNVs in the S protein: TW2-TW11, Sino3-11, AS, LC1, WHU, TWC3, PUMC01-PUMC03, CUHK-AG01, CUHK-AG3, Taiwan TC1-3, TWC, Sin2748, Sin2500, Sin2677, CUHK-Su10, HKU-39849, TWH, TWJ, TWK, TWY, and HSR 1. There were 62 SNV sites with 208 SNVs in total, and no sites with multiple base substitutions.
<xref rid="ec0005" ref-type="supplementary-material">Table S6</xref>
lists the SNV sites and all the SNVs in the S protein of the 91 isolates, along with nucleotides, mutation type (transition/transversion), a.a. position in the protein, a.a. change, a.a. property change, nucleotide position in the codon, and number of SNVs at each SNV site. These findings overlap with the results reported by Song
<italic>et al</italic>
. (Ref.
<xref rid="bib7" ref-type="bibr">7</xref>
; concerning SNVs with multiple occurrences in 103 S protein genes, some of them nucleotide-identical, 80% of which are shared with our dataset) and agree with them on the shared data.
<xref rid="t0010" ref-type="table">Table 3</xref>
summarizes the results from
<xref rid="ec0005" ref-type="supplementary-material">Table S6</xref>
.
<table-wrap position="float" id="t0010">
<label>Table 3</label>
<caption>
<p>Mutation Analysis of the S Protein: Categories of Nucleotide Substitutions</p>
</caption>
<alt-text id="at0035">Table 3</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Type</th>
<th>Pair</th>
<th>Substitution</th>
<th align="center">Pos. 1</th>
<th align="center">Pos. 2</th>
<th align="center">Pos. 3</th>
<th colspan="2" align="center">Total No.</th>
<th align="center">Pos. 1 (%)</th>
<th align="center">Pos. 2 (%)</th>
<th align="center">Pos. 3 (%)</th>
<th align="center">Total (%)</th>
<th align="center">Silent</th>
</tr>
</thead>
<tbody>
<tr>
<td>Transitions</td>
<td>A-G</td>
<td>A→G</td>
<td align="right">6/15</td>
<td align="right">2/8</td>
<td align="right">3/20</td>
<td align="right">11/43</td>
<td rowspan="2" valign="middle">16/73</td>
<td align="right">7.21%</td>
<td align="right">3.85%</td>
<td align="right">9.62%</td>
<td align="right">20.68%</td>
<td align="right">3/20</td>
</tr>
<tr>
<td></td>
<td></td>
<td>G→A</td>
<td align="right">2/7</td>
<td align="right">2/22</td>
<td align="right">1/1</td>
<td align="right">5/30</td>
<td align="right">3.37%</td>
<td align="right">10.58%</td>
<td align="right">0.48%</td>
<td align="right">14.43%</td>
<td align="right">1/1</td>
</tr>
<tr>
<td></td>
<td>C-T</td>
<td>C→T</td>
<td align="right">3/7</td>
<td align="right">6/25</td>
<td align="right">6/19</td>
<td align="right">15/51</td>
<td rowspan="2" valign="middle">24/94</td>
<td align="right">3.37%</td>
<td align="right">12.02%</td>
<td align="right">9.13%</td>
<td align="right">24.52%</td>
<td align="right">6/19</td>
</tr>
<tr>
<td></td>
<td></td>
<td>T→C</td>
<td align="right">2/3</td>
<td align="right">3/29</td>
<td align="right">4/11</td>
<td align="right">9/43</td>
<td align="right">1.44%</td>
<td align="right">13.94%</td>
<td align="right">5.29%</td>
<td align="right">20.67%</td>
<td align="right">4/11</td>
</tr>
<tr>
<td></td>
<td colspan="2" align="center">Total</td>
<td align="right">13/32</td>
<td align="right">13/84</td>
<td align="right">14/51</td>
<td></td>
<td align="right">40/167</td>
<td align="right">15.38%</td>
<td align="right">40.38%</td>
<td align="right">24.52%</td>
<td align="right">80.28%</td>
<td align="right">14/51</td>
</tr>
<tr>
<td>Transversions</td>
<td>A-C</td>
<td>A→C</td>
<td align="right">2/2</td>
<td align="right">1/1</td>
<td align="right">2/2</td>
<td align="right">5/5</td>
<td rowspan="2" valign="middle">8/10</td>
<td align="right">0.96%</td>
<td align="right">0.48%</td>
<td align="right">0.96%</td>
<td align="right">2.40%</td>
<td align="right">2/2</td>
</tr>
<tr>
<td></td>
<td></td>
<td>C→A</td>
<td align="right">1/1</td>
<td align="right">1/2</td>
<td align="right">1/2</td>
<td align="right">3/5</td>
<td align="right">0.48%</td>
<td align="right">0.96%</td>
<td align="right">0.96%</td>
<td align="right">2.40%</td>
<td align="right">0</td>
</tr>
<tr>
<td></td>
<td>A-T</td>
<td>A→T</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td rowspan="2" valign="middle">6/7</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
</tr>
<tr>
<td></td>
<td></td>
<td>T→A</td>
<td align="right">2/2</td>
<td align="right">1/1</td>
<td align="right">3/4</td>
<td align="right">6/7</td>
<td align="right">0.96%</td>
<td align="right">0.48%</td>
<td align="right">1.92%</td>
<td align="right">3.37%</td>
<td align="right">1/1</td>
</tr>
<tr>
<td></td>
<td>G-C</td>
<td>G→C</td>
<td align="right">1/1</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">1/1</td>
<td rowspan="2" valign="middle">3/5</td>
<td align="right">0.48%</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0.48%</td>
<td align="right">0</td>
</tr>
<tr>
<td></td>
<td></td>
<td>C→G</td>
<td align="right">0</td>
<td align="right">2/4</td>
<td align="right">0</td>
<td align="right">2/4</td>
<td align="right">0</td>
<td align="right">1.92%</td>
<td align="right">0</td>
<td align="right">1.92%</td>
<td align="right">0</td>
</tr>
<tr>
<td></td>
<td>G-T</td>
<td>G→T</td>
<td align="right">0</td>
<td align="right">1/1</td>
<td align="right">1/1</td>
<td align="right">2/2</td>
<td rowspan="2" valign="middle">5/19</td>
<td align="right">0</td>
<td align="right">0.48%</td>
<td align="right">0.48%</td>
<td align="right">0.96%</td>
<td align="right">1/1</td>
</tr>
<tr>
<td></td>
<td></td>
<td>T→G</td>
<td align="right">2/14</td>
<td align="right">0</td>
<td align="right">1/3</td>
<td align="right">3/17</td>
<td align="right">6.73%</td>
<td align="right">0</td>
<td align="right">1.44%</td>
<td align="right">8.17%</td>
<td align="right">1/3</td>
</tr>
<tr>
<td></td>
<td colspan="2" align="center">Total</td>
<td align="right">8/20</td>
<td align="right">6/9</td>
<td align="right">8/12</td>
<td></td>
<td>22/41</td>
<td align="right">9.62%</td>
<td align="right">4.33%</td>
<td align="right">5.77%</td>
<td align="right">19.72%</td>
<td align="right">5/7</td>
</tr>
<tr>
<td>Total</td>
<td></td>
<td></td>
<td align="right">21/52</td>
<td align="right">19/93</td>
<td align="right">22/63</td>
<td></td>
<td>62/208</td>
<td align="right">25.00%</td>
<td align="right">44.71%</td>
<td align="right">30.29%</td>
<td align="right">100%</td>
<td align="right">19/58</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>S proteins in 91 isolates are considered. For each codon position and mutation type, the table gives the number of transition and transversion sites and the number of SNVs (in the form N
<sub>1</sub>
/N
<sub>2</sub>
, i.e., sites/SNVs), the percentage of SNVs, and the number of silent mutation sites and silent SNVs (also in the form N
<sub>1</sub>
/N
<sub>2</sub>
).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p id="p0170">Of the 62 SNV sites, 19 were synonymous, with 58 synonymous SNVs in total, and 43 were non-synonymous, with 150 non-synonymous SNVs in total (
<xref rid="ec0005" ref-type="supplementary-material">Table S6</xref>
). The substitution rate was 1.65% (62/3,768) and the non-synonymous substitution rate was 1.14% (43/3,768); these rates are consistent with the whole-genome findings for the enlarged dataset and about three times higher than the corresponding findings for 17 isolates in Bi
<italic>et al</italic>
. (Ref.
<xref rid="bib33" ref-type="bibr">33</xref>
; 22 substitution sites, 13 of them non-synonymous, with substitution rates of 0.58% and 0.35%, respectively). As shown in
<xref rid="f0020" ref-type="fig">Figure 4</xref>
), most non-synonymous a.a. substitutions are located in the external domain (ED); 14 non-synonymous substitutions are in the receptor-binding domain (RBD), 3 of them within the narrowest range common to all the RBD definitions. With respect to epitopes, 40 non-synonymous a.a. substitutions are located within the combined epitope regions determined by various researchers. Finally, one non-synonymous a.a. substitution is located in the transmembrane domain (TM) and two in the internal domain (ID).
<fig id="f0020">
<label>Fig. 4</label>
<caption>
<p>Positions of synonymous and non-synonymous a.a. substitutions plotted against the S protein primary structure. The y-axis represents the number of SNVs per position. SP, signal peptide; ED, external domain; TM, transmembrane domain; and ID, internal domain (
<ext-link ext-link-type="uri" xlink:href="http://expasy.org/" id="ir0005">http://expasy.org/</ext-link>
). A. RBD determined by: 1. Babcock
<italic>et al</italic>
.
<xref rid="bib16" ref-type="bibr">
<italic>(16)</italic>
</xref>
, 2. Xiao
<italic>et al</italic>
.
<xref rid="bib17" ref-type="bibr">
<italic>(17)</italic>
</xref>
, 3. Wong
<italic>et al</italic>
.
<xref rid="bib18" ref-type="bibr">
<italic>(18)</italic>
</xref>
, 4. Zhao
<italic>et al</italic>
.
<xref rid="bib19" ref-type="bibr">
<italic>(19)</italic>
</xref>
, and 5. Zhou
<italic>et al</italic>
.
<xref rid="bib20" ref-type="bibr">
<italic>(20)</italic>
</xref>
;
B. epitope regions determined by: 1. Wang
<italic>et al</italic>
.
<xref rid="bib31" ref-type="bibr">
<italic>(31)</italic>
</xref>
, 2. Chou
<italic>et al</italic>
.
<xref rid="bib29" ref-type="bibr">
<italic>(29)</italic>
</xref>
, 3. Greenough
<italic>et al</italic>
.
<xref rid="bib30" ref-type="bibr">
<italic>(30)</italic>
</xref>
, 4. Sui
<italic>et al</italic>
.
<xref rid="bib26" ref-type="bibr">
<italic>(26)</italic>
</xref>
, 5. van den Brink
<italic>et al</italic>
.
<xref rid="bib28" ref-type="bibr">
<italic>(28)</italic>
</xref>
, 6. Lu
<italic>et al</italic>
.
<xref rid="bib24" ref-type="bibr">
<italic>(24)</italic>
</xref>
, 7. Hua
<italic>et al</italic>
.
<xref rid="bib23" ref-type="bibr">
<italic>(23)</italic>
</xref>
, 8. Ren
<italic>et al</italic>
.
<xref rid="bib21" ref-type="bibr">
<italic>(21)</italic>
</xref>
, 9. He
<italic>et al</italic>
.
<xref rid="bib22" ref-type="bibr">
<italic>(22)</italic>
</xref>
, 10. Zhou
<italic>et al</italic>
.
<xref rid="bib20" ref-type="bibr">
<italic>(20)</italic>
</xref>
, 11. Zhang
<italic>et al</italic>
.
<xref rid="bib27" ref-type="bibr">
<italic>(27)</italic>
</xref>
, and 12. Keng
<italic>et al</italic>
.
<xref rid="bib25" ref-type="bibr">
<italic>(25)</italic>
</xref>
.</p>
</caption>
<alt-text id="at0020">Fig. 4</alt-text>
<graphic xlink:href="gr5"></graphic>
</fig>
</p>
<p id="p0175">The coefficients Ks (number of synonymous substitutions per synonymous site) and Ka (number of non-synonymous substitutions per non-synonymous site), calculated for all 91 S proteins in the dataset, were Ks = 0.00135 and Ka = 0.00103, giving Ka/Ks = 0.7629. A ratio below 1 may be interpreted as evidence that the S protein is not subject to positive (Darwinian) selection. These findings are consistent with a similar analysis of 20 SARS-CoV isolates by Hu
<italic>et al</italic>
.
<xref rid="bib35" ref-type="bibr">
<italic>(35)</italic>
</xref>
which gave a ratio of 0.65. The coefficients and the Ka/Ks ratio for the 89 human patient isolates alone provide even stronger evidence that the S protein is subject to negative (purifying) selection: Ks = 0.00121, Ka = 0.00080, Ka/Ks = 0.661. The coefficients Ka and Ks and the Ka/Ks ratio for all the human patient isolates, with each of the palm civet isolates as the outgroup, are given in
<xref rid="t0015" ref-type="table">Table 4</xref>
. These ratios, both greater than 1, indicate that the S gene was subject to positive (Darwinian) selection during virus evolution (transmission from animals to humans), which is consistent with the analysis by Yeh
<italic>et al</italic>
.
<xref rid="bib36" ref-type="bibr">
<italic>(36)</italic>
</xref>
of 28 human isolates with the SZ3 palm civet isolate as the outgroup, which gave a ratio of 1.657, and with the analysis by He
<italic>et al</italic>
.
<xref rid="bib8" ref-type="bibr">
<italic>(8)</italic>
</xref>
, which indicated that the S gene was under the strongest positive selection pressure initially, followed by eventual stabilization.
<table-wrap position="float" id="t0015">
<label>Table 4</label>
<caption>
<p>Mutation Analysis of the S Protein: Coefficients Ka, Ks, and the Ratio Ka/Ks with an Outgroup</p>
</caption>
<alt-text id="at0040">Table 4</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Outgroup</th>
<th align="center">Ka</th>
<th align="center">Ks</th>
<th align="center">Ka/Ks</th>
</tr>
</thead>
<tbody>
<tr>
<td>SZ16 (AY304488)</td>
<td align="center">0.006257</td>
<td align="center">0.004930</td>
<td align="center">1.26935>1</td>
</tr>
<tr>
<td>SZ3 (AY304486)</td>
<td align="center">0.005889</td>
<td align="center">0.003803</td>
<td align="center">1.54856>1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Ka and Ks were calculated for all the human patient isolates, with one of the palm civet isolates as the outgroup.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
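<p>The text does not state which estimator produced Ka and Ks. As an assumption, the sketch below shows the final step of the widely used Nei-Gojobori approach: observed differences per synonymous and per non-synonymous site, corrected with the Jukes-Cantor formula. It also recomputes the ratios from the coefficients reported above; the site counts and differences in the last call are hypothetical, and no significance test is performed.</p>

```python
from math import log


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(syn_sites: float, nonsyn_sites: float,
          syn_diffs: float, nonsyn_diffs: float) -> tuple[float, float, float]:
    """Return (Ka, Ks, Ka/Ks) from site counts and observed differences."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return ka, ks, ka / ks


def interpret(ratio: float) -> str:
    """Conventional reading of a Ka/Ks value."""
    if ratio > 1:
        return "positive (Darwinian) selection"
    if ratio == 1:
        return "neutral evolution"
    return "purifying (negative) selection"


# Ka/Ks from the coefficients reported in the text and in Table 4.
reported = {
    "all 91 isolates": (0.00103, 0.00135),
    "89 human isolates": (0.00080, 0.00121),
    "outgroup SZ3": (0.005889, 0.003803),
    "outgroup SZ16": (0.006257, 0.004930),
}
for label, (ka, ks) in reported.items():
    print(label, round(ka / ks, 4), interpret(ka / ks))

# Hypothetical site counts and differences for one pair of S gene sequences.
ka, ks, ratio = ka_ks(syn_sites=900.0, nonsyn_sites=2850.0, syn_diffs=3, nonsyn_diffs=6)
print(f"Ka={ka:.5f} Ks={ks:.5f} Ka/Ks={ratio:.3f}")
```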
</sec>
</sec>
<sec id="s0055">
<title>Phylogenetic analysis</title>
<p id="p0180">The phylogenetic tree, drawn with the PhyloDraw program from the CLUSTAL X alignment of the 96 isolates, is shown in
<xref rid="f0025" ref-type="fig">Figure 5</xref>
. Its close agreement with the classification proposed in this paper suggests that SARS-CoV isolates might be classified by computational analysis of genome polymorphism.
<fig id="f0025">
<label>Fig. 5</label>
<caption>
<p>Three-level classification of 103 SARS-CoV genome isolates. Isolates are grouped by genome polymorphism and classified by nine distinguishing loci, mapped onto the bootstrapped phylogenetic tree obtained with CLUSTAL X and the Neighbor Joining method and drawn with PhyloDraw. Bootstrapping was performed with random seed 111 and 1,000 bootstrap trials. The two basic groups, A and B, are shown in yellow and blue, respectively. Types obtained from the nine genome loci (9,404, 17,564, 22,222, 27,827 / 3,852, 9,479, 11,493, 21,721, 26,477) are labeled along the left edge of the figure in the form NNNN / NNNNN, where N represents any nucleotide. Different subtypes are denoted by the corresponding substituted nucleotides in red. Dotted lines separate the three epidemiological phases.</p>
</caption>
<alt-text id="at0025">Fig. 5</alt-text>
<graphic xlink:href="gr6"></graphic>
</fig>
</p>
<p id="p0185">All the isolates were classified according to their genome polymorphism (SNVs and INDELs), following the procedure proposed in our previous paper (57). Since SNV content turned out to be a more distinguishing property than the presence of INDELs, as the
<italic>first classification criterion</italic>
we took the number and positions of SNVs. With the “profile” as the reference isolate, the number of SNVs per isolate varied from 0 (HSR 1, AS) to 78 (ZMY 1) (
<xref rid="t0020" ref-type="table">Table 1</xref>
). All the isolates were thus divided according to their number of SNVs relative to the “profile”: those with at most 11 SNVs and those with more than 11 SNVs. The first classification criterion therefore resulted in two groups (
<xref rid="t0020" ref-type="table">Table 1</xref>
):</p>
<p id="p0190">
<bold>Group A</bold>
—isolates with less than or equal to 11 SNVs (
<xref rid="t0020" ref-type="table">Tables 1</xref>
and S2);</p>
<p id="p0195">
<bold>Group B</bold>
—isolates with more than 11 SNVs relative to the “profile” isolate.</p>
<p id="p0200">The positions of SNVs moved several isolates between the two groups (SoD from B to A, since most of its SNVs are in the 3’ UTR; CUHK-W1 from A to B, since it has more SNVs relative to the “profile” of group A than relative to that of group B; WHU from B to A; GZ50 from A to B; GD69 from B to A; and GZ-C from B to A).</p>
<p id="p0205">The
<italic>second classification criterion</italic>
was the presence and position of long INDELs within the basic groups A and B. We identified the following subgroups:</p>
<p id="p0210">A2, B2—subgroups of the A, B groups, respectively, with long insertions. A2 remained empty, while B2 contained 8 isolates with 29-nt insertions.</p>
<p id="p0215">A3, B3—subgroups of the A, B groups, respectively, with long deletions. Subgroup A3 consisted of a three-isolate subgroup (LC2, LC3, and LC5) with a 386-nt deletion, Sin852 with a 57-nt deletion, a two-isolate subgroup (GZ-B and GZ-C) with a 39-nt deletion, Sin849 (a 49-nt deletion, embedded), and Sin846 (a 137-nt deletion, overlapping). Subgroup B3 consisted of the isolates ZS-A (ZS-B) and ZS-C with a 53-nt deletion.</p>
<p id="p0220">A4 and B4 were the subgroups with many individual INDELs (
<xref rid="t0020" ref-type="table">Table 1</xref>
). The remaining isolates of groups A and B were denoted A1 and B1, respectively.</p>
<p id="p0225">Note that the proposed grouping of the 96 isolates, based on SNV and INDEL content, preserved the earlier classification T-T-T-T/C-G-C-C
<xref rid="bib38" ref-type="bibr">
<italic>(38)</italic>
</xref>
, and partially coincided with the extension of this classification
(<xref rid="bib39" ref-type="bibr">39</xref>, <xref rid="bib40" ref-type="bibr">40</xref>)
. The four loci underlying this classification (9,404, 17,564, 22,222, and 27,827) fitted well into our grouping (group A1 basically coincided with type T-T-T-T and group B1 with type C-G-C-C), revealing two inter-types: T-G-C-C [isolates GZ50, HZS2-D, HZS2-E, HZS2-C, HGZ8L2, HSZ2-A, NS-1(BJ04), HZS2-Fc, HZS2-Fb, and TJF] and C-G-T-T (isolates ShanghaiQXC1 and ShanghaiQXC2) (
<xref rid="f0025" ref-type="fig">Figure 5</xref>
). Five additional loci, which are among the most frequent SNV sites (positions 3,852, 9,479, 11,493, 21,721, and 26,477;
<xref rid="f0010" ref-type="fig">Figure 1</xref>
), further refined our classification, allowing sub-classification of the basic types.</p>
<p id="p0230">There were two basic nine-locus types, TTTT/TTCGG and CGCC/TTCAT, largely coinciding with groups A1 and B1, and two inter-groups: an inter-(A-B) group with the inter-type TGCC/TTCGT, and a subgroup of group B1 (the two Shanghai isolates) representing another inter-type, CGTT/TTCGT (
<xref rid="f0025" ref-type="fig">Figure 5</xref>
). The proposed extension of the two main sequence variants (TTTT, CGCC) to an enlarged set of isolates is in accordance with new insights into possible epidemiological spread, both in space and time
<xref rid="bib36" ref-type="bibr">36.</xref>
,
<xref rid="bib38" ref-type="bibr">38.</xref>
,
<xref rid="bib41" ref-type="bibr">41.</xref>
. Namely, positions 3,852 and 11,493 distinguished two subgroups within group A1 (all of the TTTT type): the Taiwan epidemic isolates (nucleotides C, T) from the other late-epidemic isolates (nucleotides T, C)
<xref rid="bib41" ref-type="bibr">
<italic>(41)</italic>
</xref>
,
<italic>i.e.</italic>
, isolates closer to a Hong Kong virus unrelated to Hotel M (nucleotides C, T: isolates TW6-TW10, Taiwan TC1-TC3), and the others from the Hotel M lineage [nucleotides T, C: isolates from Canada (Tor2), Singapore (all Sin’s), Frankfurt (FRA Fr 1), Taiwan (TW1-TW5), Hong Kong (HKU 39849), Italy (HSR 1), China (ZJ01),
<italic>etc</italic>
.]
<xref rid="bib36" ref-type="bibr">
<italic>(36)</italic>
</xref>
. Position 9,479 split the B group, separating subgroup B1 (T) from subgroups B2 and B3 (C), while position 21,721 distinguished group A from group B. A precise characterization of all the isolates based on the nine loci is given in
<xref rid="t0020" ref-type="table">Table 1</xref>
and
<xref rid="f0025" ref-type="fig">Figure 5</xref>
.</p>
<p id="p0235">Compared with the genotype clustering of SARS-CoV covering the epidemics from 2002 to 2004
<xref rid="bib7" ref-type="bibr">7.</xref>
,
<xref rid="bib8" ref-type="bibr">8.</xref>
, our proposed grouping is largely in accordance with it. Namely, the following correspondence between the two grouping schemes can be established:</p>
<p id="p0240">First, genotype class CGCC/TCCAT (covering subgroups B2 and B3) corresponded to human isolates from the early phase of the 2002-2003 epidemic (ZS, HSZ, GD01, GZ02: Guangzhou, China) and to the palm civet isolates (SZ3, SZ16: Hong Kong).</p>
<p id="p0245">Second, genotype classes TGCC/TTCGT and TGCC/TTCAT (a small part of group A1), as well as CGCC/TTCAT (group B1), corresponded to human isolates from the middle phase of the 2002-2003 epidemic (positions 3,852, 9,479, 11,493, and 26,477 determined this subclass): Beijing (BJ01-BJ04) and Hong Kong (CUHK W1).</p>
<p id="p0250">Third, genotype classes TTTT/NNNNN and CGTT/NNNNN (almost the entire A group and the Shanghai part of group B1) corresponded to human isolates from the late phase of the 2002-2003 epidemic (
<xref rid="f0025" ref-type="fig">Figure 5</xref>
): Singapore (Sin’s), Taiwan (TW1-11), Shanghai (QX1, QX2), Italy (HSR 1), Canada (Tor2), Hanoi (Urbani), Hong Kong (HKU39849, CUHK-AG0x), China (ZJ01, WHU, PUMC0x), Frankfurt (FRA, Frankfurt 1),
<italic>etc</italic>
.</p>
<p id="p0255">The two basic groups, A and B, mapped largely contiguously onto the phylogenetic tree, showing a high degree of agreement between the proposed grouping and the phylogenetic relationships. The exceptions were the two isolates of group B4 with a large number of SNVs and individual insertions (ZMY 1, ZJ01) and the two isolates of group B1 (Shanghai QXC1 and QXC2), all of which lie at large distances from the root (
<xref rid="ec0005" ref-type="supplementary-material">Figure S10</xref>
).</p>
</sec>
</sec>
<sec id="s0060">
<title>Materials and Methods</title>
<sec id="s0065">
<title>Dataset</title>
<p id="p0260">Complete genome sequences of 103 SARS-CoV isolates were retrieved from the NCBI Entrez database (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/entrez" id="ir0020">http://www.ncbi.nlm.nih.gov/entrez</ext-link>
) in GenBank and FASTA formats. Since there were 7 pairs of nucleotide-identical isolates, we considered the dataset to consist of 96 isolates (
<xref rid="t0020" ref-type="table">Tables 1</xref>
and
<xref rid="ec0005" ref-type="supplementary-material">S1</xref>
). The sequences range in length from 29,013 nt (ShanghaiQXC2) to 29,767 nt (Sin3408). Of these, 19 sequences contain ambiguous nucleotide codes,
<italic>i.e.</italic>
, N, M, R, Y, W, K, or S (Table S1). Of the 96 isolates, 42 are fully or partially annotated. All annotated isolates have the S protein (3,768 nt) and the N protein (1,269 nt) annotated. Four isolates (CUHK-Su10, PUMC01, PUMC02, PUMC03) lack an annotated E protein, whose length is 231 nt (228 nt in Sino1-11), and one isolate (CUHK-Su10) lacks an annotated M protein (666 nt). Of the 42 annotated isolates, 13 have the 5’ UTR and 12 have the 3’ UTR determined. All isolates are of human origin except two isolated from the palm civet
<italic>(Paguma larvata)</italic>
, SZ3 and SZ16.</p>
</sec>
<sec id="s0070">
<title>Genome polymorphism</title>
<p id="p0265">The CLUSTAL X program, version 1.83
<xref rid="bib42" ref-type="bibr">
<italic>(42)</italic>
</xref>
was used to align all the isolates in the dataset. The resulting alignment was 29,903 nt long. The 5’ UTR and 3’ UTR were then identified from their positions in the annotated isolates. The coding region spanned alignment positions 301 to 29,528 inclusive, a length of 29,228 nt.</p>
<p id="p0270">We developed a Perl program for analyzing the CLUSTAL X output. The program first computed an “average” isolate, the so-called “profile”, by counting, at each position of the CLUSTAL X output, the occurrences of each letter (including the dash) and choosing the most represented one. Positions containing a dash in the “profile” are called “empty positions”; all others are “non-empty” positions. The program then counted SNVs and INDELs and calculated their absolute and relative positions for every isolate with respect to the “profile”, and for different genome regions (ORFs, 5’ UTR, 3’ UTR, and IGRs).</p>
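<p>The profile construction and the per-isolate comparison can be sketched in Python as follows (the original program was written in Perl and is not reproduced here). This is a minimal sketch under stated assumptions: ties between equally frequent letters are broken by first occurrence, only the letters A, C, G, and T count towards SNVs (ambiguity codes such as N are ignored), each run of consecutive insertion or gap columns is counted as one INDEL event, columns where both the isolate and the profile have a dash are skipped, and positions are reported as 1-based alignment columns rather than split by genome region.</p>

```python
from collections import Counter


def build_profile(alignment: list[str]) -> str:
    """Most represented letter (dash included) at each column of an alignment."""
    # most_common(1) returns the first-encountered letter among ties (assumed tie rule)
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def compare_to_profile(isolate: str, profile: str) -> dict[str, list[int]]:
    """1-based columns of SNVs and of INDEL starts for one aligned isolate."""
    snvs, insertions, deletions = [], [], []
    in_ins = in_del = False
    for col, (x, p) in enumerate(zip(isolate, profile), start=1):
        if x == "-" and p == "-":
            continue                      # empty in both: keep the current run state
        if p == "-":                      # letter at an "empty" profile position
            if not in_ins:
                insertions.append(col)
            in_ins, in_del = True, False
        elif x == "-":                    # gap at a "non-empty" profile position
            if not in_del:
                deletions.append(col)
            in_ins, in_del = False, True
        else:
            in_ins = in_del = False
            if x in "ACGT" and p in "ACGT" and x != p:
                snvs.append(col)
    return {"snvs": snvs, "insertions": insertions, "deletions": deletions}
```

<p>For instance, with the three aligned rows ACGT-A, ACGTTA, and AGG--A, the profile is ACGT-A; the second row then has one insertion starting at column 5, and the third has an SNV at column 2 and a deletion starting at column 4.</p>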
<p id="p0275">The substitution rate for the SARS-CoV genome and for the S protein, over all sequences in the dataset, was calculated by dividing the total number of SNV sites by the length of the corresponding nucleotide sequence; the non-synonymous substitution rate for the S protein was calculated by dividing the total number of non-synonymous SNV sites by the length of the S protein.</p>
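<p>A minimal Python sketch of the substitution-rate calculation, assuming the profile has already been computed and treating a column as an SNV site when at least one isolate carries a nucleotide (A, C, G, or T) that differs from a non-empty profile letter. The length of the corresponding nucleotide sequence is passed in by the caller; ambiguity codes are ignored, which is an assumption of this sketch.</p>

```python
def snv_sites(alignment: list[str], profile: str) -> list[int]:
    """1-based columns where at least one isolate differs from the profile."""
    sites = []
    for col, (p, column) in enumerate(zip(profile, zip(*alignment)), start=1):
        # Empty profile positions cannot host an SNV, only insertions.
        if p != "-" and any(x in "ACGT" and x != p for x in column):
            sites.append(col)
    return sites


def substitution_rate(alignment: list[str], profile: str, length: int) -> float:
    """Number of SNV sites divided by the length of the sequence considered."""
    return len(snv_sites(alignment, profile)) / length
```

<p>A non-synonymous rate follows the same pattern, counting only those SNV sites whose change alters the encoded amino acid.</p>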
</sec>
<sec id="s0075">
<title>Entropy of sites</title>
<p id="p0280">The entropy of each site was calculated from the SNVs at that site, in order to estimate how conserved the site is. Denoting by
<italic>p(b)</italic>
the probability of occurrence of the nucleotide
<italic>b</italic>
(
<italic>b</italic>
being A, C, G, or T), and assuming that sites are independent, we calculated the entropy of each position by the following formula
<xref rid="bib43" ref-type="bibr">
<italic>(43)</italic>
</xref>
: <inline-formula><tex-math>E = -\sum_{b \in \{A, C, G, T\}} p(b) \log p(b)</tex-math></inline-formula>. In this definition, the term <inline-formula><tex-math>p(b) \log p(b)</tex-math></inline-formula> is taken to be zero if <inline-formula><tex-math>p(b) = 0</tex-math></inline-formula>.</p>
</sec>
<sec id="s0080">
<title>Phylogenetic investigations</title>
<p id="p0285">The first type of classification was performed in the same way as in Pavlovic-Lazetic
<italic>et al</italic>
.
<xref rid="bib37" ref-type="bibr">
<italic>(37)</italic>
</xref>
. It is based on genome polymorphism (SNVs and INDELs). The distribution of isolates by number of SNVs (outside the 5’ and 3’ UTRs) was analyzed, and the isolates were initially classified into two groups: isolates with a “small” number of SNVs and isolates with a “large” number of SNVs. Isolates “close to the border” were further tested by comparing their number of SNVs against the profile of each of the two groups, which moved some isolates to the other group. A sub-classification was then performed within each of the two groups based on the presence of long or short INDELs.</p>
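<p>The two-step grouping (an initial split by SNV count, followed by re-testing of isolates near the border against both group profiles) might look like the Python sketch below. It is not the authors' implementation: the split threshold and the width of the border zone are hypothetical parameters, the overall and group profiles are taken as majority letters with ties broken by first occurrence, masking of the 5’ and 3’ UTRs is left to the caller, and both initial groups are assumed to be non-empty.</p>

```python
from collections import Counter


def majority_profile(seqs: list[str]) -> str:
    """Most represented letter per aligned column (ties: first encountered)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def count_snvs(isolate: str, profile: str) -> int:
    """Positions where both carry a plain nucleotide and they differ."""
    return sum(1 for x, p in zip(isolate, profile)
               if x in "ACGT" and p in "ACGT" and x != p)


def classify(isolates: dict[str, str], threshold: int, border: int) -> dict[str, str]:
    """Two-group SNV classification with re-testing of isolates near the border.

    threshold and border are hypothetical: isolates with at most `threshold`
    SNVs against the overall profile start in group "A", the rest in "B";
    isolates within `border` SNVs of the threshold are then compared with
    both group profiles and placed in the closer group (ties keep the group).
    """
    overall = majority_profile(list(isolates.values()))
    counts = {name: count_snvs(seq, overall) for name, seq in isolates.items()}
    groups = {name: "A" if threshold >= n else "B" for name, n in counts.items()}
    profile_a = majority_profile([isolates[k] for k, g in groups.items() if g == "A"])
    profile_b = majority_profile([isolates[k] for k, g in groups.items() if g == "B"])
    for name, seq in isolates.items():
        if border >= abs(counts[name] - threshold):
            n_a, n_b = count_snvs(seq, profile_a), count_snvs(seq, profile_b)
            if n_a != n_b:
                groups[name] = "A" if n_b > n_a else "B"
    return groups
```

<p>This mirrors the kind of move reported for CUHK-W1, which changed group because it had more SNVs relative to the A-group profile than to the B-group profile.</p>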
<p id="p0290">The second type of classification was based on the contents of the most represented SNV sites. In addition to the previously identified positions (9,404, 17,564, 22,222, 27,827) that classify isolates into TTTT/CGCC genotypes
<xref rid="bib38" ref-type="bibr">38.</xref>
,
<xref rid="bib39" ref-type="bibr">39.</xref>
, other positions were identified as potential bases for sub-classification.</p>
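<p>Reading a nine-locus genotype such as TTTT/TTCGG from an aligned isolate can be sketched as below. The loci are those named in the text; the assumption is that they are 1-based coordinates of some reference genome present as a gapped row in the alignment (which reference defines the coordinates is not stated here), so each coordinate is first mapped to its alignment column.</p>

```python
# Loci named in the text, as 1-based genome coordinates.
FOUR_LOCI = (9404, 17564, 22222, 27827)
FIVE_LOCI = (3852, 9479, 11493, 21721, 26477)


def reference_columns(ref_row: str) -> dict[int, int]:
    """Map 1-based ungapped reference coordinates to 0-based alignment columns."""
    mapping, pos = {}, 0
    for col, ch in enumerate(ref_row):
        if ch != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def genotype(ref_row: str, isolate_row: str,
             loci_groups=(FOUR_LOCI, FIVE_LOCI)) -> str:
    """Read an isolate's bases at the given loci, e.g. "TTTT/TTCGG"."""
    cols = reference_columns(ref_row)
    return "/".join("".join(isolate_row[cols[p]] for p in loci)
                    for loci in loci_groups)
```

<p>Keeping the two locus sets separate reproduces the four-locus/five-locus notation used above, so the earlier TTTT/CGCC types remain visible as the first half of each nine-locus genotype.</p>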
<p id="p0295">In order to compare the two classification schemes with the results of existing phylogenetic analysis software, a bootstrapped phylogenetic tree was produced using the CLUSTAL X program and the Neighbor Joining (NJ) method. The NJ method, like parsimony and probabilistic methods, produces unrooted trees. To produce the consensus tree, bootstrapping was performed with random number generator seed 111 and 1,000 bootstrap trials. The tree was drawn using the PhyloDraw program
<xref rid="bib44" ref-type="bibr">
<italic>(44)</italic>
</xref>
and the proposed classification schemes were mapped onto it.</p>
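<p>The bootstrap step resamples alignment columns with replacement to build pseudo-replicate alignments, each of which yields its own tree. The Python sketch below shows only this resampling and an uncorrected p-distance matrix (a common Neighbor Joining input). It does not reproduce CLUSTAL X, whose random number generator differs, so seed 111 here will not yield the same replicates; the NJ and consensus steps are omitted.</p>

```python
import random


def bootstrap_replicates(alignment: list[str], n_trials: int = 1000, seed: int = 111):
    """Yield pseudo-alignments made by resampling alignment columns with replacement."""
    rng = random.Random(seed)
    length = len(alignment[0])
    for _ in range(n_trials):
        cols = [rng.randrange(length) for _ in range(length)]
        yield ["".join(row[c] for c in cols) for row in alignment]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns where either row has a gap."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def distance_matrix(alignment: list[str]) -> list[list[float]]:
    """Pairwise p-distances between all rows of an alignment."""
    return [[p_distance(a, b) for b in alignment] for a in alignment]
```

<p>Each replicate's distance matrix would be passed to an NJ routine, and the support for a grouping is the fraction of the replicate trees in which it appears.</p>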
</sec>
<sec id="s0485">
<title>Annotation and analysis of the S protein</title>
<p id="p0300">All the S protein sequences (those extracted from annotated isolates and the others, which we annotated using the publicly available NCBI ORF Finder;
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/gorf/gorf.html" id="ir0025">http://www.ncbi.nlm.nih.gov/gorf/gorf.html</ext-link>
) were aligned using the CLUSTAL X program. The S protein was then analyzed using the same methods as the complete isolates.</p>
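<p>To illustrate what ORF prediction involves, the Python sketch below scans the three forward reading frames for stretches running from an ATG start codon to the next in-frame stop codon. This is a simplified stand-in, not the algorithm of NCBI ORF Finder: the reverse strand is not scanned, ORFs lacking a stop codon before the sequence end are dropped, and the minimum length (min_len) is a hypothetical parameter.</p>

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 300) -> list[tuple[int, int]]:
    """Forward-strand ORFs as 0-based half-open (start, end), stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None                   # look for the next ATG
    return sorted(orfs)
```

<p>Applied to a SARS-CoV genome, one would expect the S gene (3,768 nt in the annotated isolates) among the long forward-strand ORFs.</p>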
</sec>
<sec id="s0090">
<title>Mutation analysis of the S protein</title>
<p id="p0305">Non-synonymous nucleotide substitution per non-synonymous site (Ka) and synonymous nucleotide substitution per synonymous site (Ks) were calculated using the DnaSP 4.0 program
<xref rid="bib45" ref-type="bibr">
<italic>(45)</italic>
</xref>
. It is based on the method of Nei and Gojobori
<xref rid="bib46" ref-type="bibr">
<italic>(46)</italic>
</xref>
that estimates the numbers of synonymous and non-synonymous nucleotide substitutions between two DNA sequences by counting such substitutions in the corresponding pairs of codons, taking into account the different evolutionary pathways between pairs of codons. DnaSP can be run with or without an outgroup. The ratio Ka/Ks is used as a selection parameter (Ka/Ks > 1 is usually interpreted as an indicator of positive selection). Ka, Ks, and the ratio Ka/Ks were first calculated for the S protein in all the isolates in the dataset, without an outgroup. Since the 91 isolates included 89 human isolates and 2 palm civet isolates (SZ3, SZ16), we then calculated Ka, Ks, and Ka/Ks for the 89 human isolates only, again without an outgroup. Finally, we ran the program on all the human isolates with each of the palm civet isolates as the outgroup, in order to test the hypothesis that the S gene was subjected to positive selection during virus transmission from animals to humans.</p>
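<p>The Nei-Gojobori counting that DnaSP performs can be sketched in Python as follows. This is an illustrative re-implementation, not DnaSP's code, and its results may differ in detail. It uses the standard genetic code, counts changes to stop codons as non-synonymous when computing sites, averages differences over all mutational pathways that avoid intermediate stop codons, skips codons containing gaps, ambiguity codes, or stops, and applies the Jukes-Cantor correction to the proportions of differences (an assumption; the text does not state which correction was used).</p>

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of a codon; changes to stop codons count as non-synonymous."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if base != codon[pos] and CODE[mutant] == CODE[codon]:
                s += 1 / 3
    return s


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, non-synonymous) differences averaged over mutational pathways.

    Pathways through an intermediate stop codon are discarded unless every
    pathway passes through one, in which case all pathways are used.
    """
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(diff_pos):       # one empty pathway if c1 == c2
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            valid = valid and CODE[nxt] != "*"
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        paths.append((syn, non, valid))
    usable = [p for p in paths if p[2]] or paths
    return (sum(p[0] for p in usable) / len(usable),
            sum(p[1] for p in usable) / len(usable))


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction; raises a math domain error when p reaches 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (Ka, Ks) for two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    length = min(len(seq1), len(seq2))
    for i in range(0, length - length % 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip codons with gaps or ambiguity codes, and stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

<p>The selection parameter is then ka / ks, which is defined only when Ks is non-zero.</p>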
</sec>
</sec>
</body>
<back>
<ref-list id="bibliog0005">
<title>References</title>
<ref id="bib1">
<label>1.</label>
<element-citation publication-type="journal" id="sbref1">
<person-group person-group-type="author">
<name>
<surname>Peiris</surname>
<given-names>J.S.</given-names>
</name>
</person-group>
<article-title>Severe acute respiratory syndrome</article-title>
<source>Nat. Med.</source>
<volume>10</volume>
<year>2004</year>
<fpage>S88</fpage>
<lpage>S97</lpage>
<pub-id pub-id-type="pmid">15577937</pub-id>
</element-citation>
</ref>
<ref id="bib2">
<label>2.</label>
<element-citation publication-type="journal" id="sbref2">
<person-group person-group-type="author">
<name>
<surname>Fouchier</surname>
<given-names>R.A.</given-names>
</name>
</person-group>
<article-title>Aetiology: Koch’s postulates fulfilled for SARS virus</article-title>
<source>Nature</source>
<volume>423</volume>
<year>2003</year>
<fpage>240</fpage>
<pub-id pub-id-type="pmid">12748632</pub-id>
</element-citation>
</ref>
<ref id="bib3">
<label>3.</label>
<element-citation publication-type="journal" id="sbref3">
<person-group person-group-type="author">
<name>
<surname>Rota</surname>
<given-names>P.A.</given-names>
</name>
</person-group>
<article-title>Characterization of a novel coronavirus associated with severe acute respiratory syndrome</article-title>
<source>Science</source>
<volume>300</volume>
<year>2003</year>
<fpage>1394</fpage>
<lpage>1399</lpage>
<pub-id pub-id-type="pmid">12730500</pub-id>
</element-citation>
</ref>
<ref id="bib4">
<label>4.</label>
<element-citation publication-type="journal" id="sbref4">
<person-group person-group-type="author">
<name>
<surname>Marra</surname>
<given-names>M.A.</given-names>
</name>
</person-group>
<article-title>The genome sequence of the SARS-associated coronavirus</article-title>
<source>Science</source>
<volume>300</volume>
<year>2003</year>
<fpage>1399</fpage>
<lpage>1404</lpage>
<pub-id pub-id-type="pmid">12730501</pub-id>
</element-citation>
</ref>
<ref id="bib5">
<label>5.</label>
<element-citation publication-type="journal" id="sbref5">
<person-group person-group-type="author">
<name>
<surname>Guan</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<article-title>Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China</article-title>
<source>Science</source>
<volume>302</volume>
<year>2003</year>
<fpage>276</fpage>
<lpage>278</lpage>
<pub-id pub-id-type="pmid">12958366</pub-id>
</element-citation>
</ref>
<ref id="bib6">
<label>6.</label>
<element-citation publication-type="journal" id="sbref6">
<person-group person-group-type="author">
<name>
<surname>Stavrinides</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Guttman</surname>
<given-names>D.S.</given-names>
</name>
</person-group>
<article-title>Mosaic evolution of the severe acute respiratory syndrome coronavirus</article-title>
<source>J. Virol.</source>
<volume>78</volume>
<year>2004</year>
<fpage>76</fpage>
<lpage>82</lpage>
<pub-id pub-id-type="pmid">14671089</pub-id>
</element-citation>
</ref>
<ref id="bib7">
<label>7.</label>
<element-citation publication-type="journal" id="sbref7">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>H.D.</given-names>
</name>
</person-group>
<article-title>Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human</article-title>
<source>Proc. Natl. Acad. Sci. USA</source>
<volume>102</volume>
<year>2005</year>
<fpage>2430</fpage>
<lpage>2435</lpage>
<pub-id pub-id-type="pmid">15695582</pub-id>
</element-citation>
</ref>
<ref id="bib8">
<label>8.</label>
<element-citation publication-type="journal" id="sbref8">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>J.F.</given-names>
</name>
</person-group>
<article-title>Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China</article-title>
<source>Science</source>
<volume>303</volume>
<year>2004</year>
<fpage>1666</fpage>
<lpage>1669</lpage>
<comment>(Chinese SARS Molecular Epidemiology Consortium)</comment>
<pub-id pub-id-type="pmid">14752165</pub-id>
</element-citation>
</ref>
<ref id="bib9">
<label>9.</label>
<element-citation publication-type="journal" id="sbref9">
<person-group person-group-type="author">
<name>
<surname>Stadler</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>SARS—beginning to understand a new virus</article-title>
<source>Nat. Rev. Microbiol.</source>
<volume>1</volume>
<year>2003</year>
<fpage>209</fpage>
<lpage>218</lpage>
<pub-id pub-id-type="pmid">15035025</pub-id>
</element-citation>
</ref>
<ref id="bib10">
<label>10.</label>
<element-citation publication-type="journal" id="sbref10">
<person-group person-group-type="author">
<name>
<surname>Chiu</surname>
<given-names>R.W.</given-names>
</name>
</person-group>
<article-title>Tracing SARS-coronavirus variant with large genomic deletion</article-title>
<source>Emerg. Infect. Dis.</source>
<volume>11</volume>
<year>2005</year>
<fpage>168</fpage>
<lpage>170</lpage>
<pub-id pub-id-type="pmid">15714661</pub-id>
</element-citation>
</ref>
<ref id="bib11">
<label>11.</label>
<element-citation publication-type="journal" id="sbref11">
<person-group person-group-type="author">
<name>
<surname>Vega</surname>
<given-names>V.B.</given-names>
</name>
</person-group>
<article-title>Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003</article-title>
<source>BMC Infect. Dis.</source>
<volume>4</volume>
<year>2004</year>
<fpage>32</fpage>
<pub-id pub-id-type="pmid">15347429</pub-id>
</element-citation>
</ref>
<ref id="bib12">
<label>12.</label>
<element-citation publication-type="journal" id="sbref12">
<person-group person-group-type="author">
<name>
<surname>Ziebuhr</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Molecular biology of severe acute respiratory syndrome coronavirus</article-title>
<source>Curr. Opin. Microbiol.</source>
<volume>7</volume>
<year>2004</year>
<fpage>412</fpage>
<lpage>419</lpage>
<pub-id pub-id-type="pmid">15358261</pub-id>
</element-citation>
</ref>
<ref id="bib13">
<label>13.</label>
<element-citation publication-type="journal" id="sbref13">
<person-group person-group-type="author">
<name>
<surname>Groneberg</surname>
<given-names>D.A.</given-names>
</name>
</person-group>
<article-title>Molecular mechanisms of severe acute respiratory syndrome (SARS)</article-title>
<source>Respir. Res.</source>
<volume>6</volume>
<year>2005</year>
<fpage>8</fpage>
<pub-id pub-id-type="pmid">15661082</pub-id>
</element-citation>
</ref>
<ref id="bib14">
<label>14.</label>
<element-citation publication-type="journal" id="sbref14">
<person-group person-group-type="author">
<name>
<surname>Tan</surname>
<given-names>Y.J.</given-names>
</name>
</person-group>
<article-title>Characterization of viral proteins encoded by the SARS-coronavirus genome</article-title>
<source>Antiviral Res.</source>
<volume>65</volume>
<year>2005</year>
<fpage>69</fpage>
<lpage>78</lpage>
<pub-id pub-id-type="pmid">15708633</pub-id>
</element-citation>
</ref>
<ref id="bib15">
<label>15.</label>
<element-citation publication-type="journal" id="sbref15">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
</person-group>
<article-title>Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus</article-title>
<source>Nature</source>
<volume>426</volume>
<year>2003</year>
<fpage>450</fpage>
<lpage>454</lpage>
<pub-id pub-id-type="pmid">14647384</pub-id>
</element-citation>
</ref>
<ref id="bib16">
<label>16.</label>
<element-citation publication-type="journal" id="sbref16">
<person-group person-group-type="author">
<name>
<surname>Babcock</surname>
<given-names>G.J.</given-names>
</name>
</person-group>
<article-title>Amino acids 270 to 510 of the severe acute respiratory syndrome coronavirus spike protein are required for interaction with receptor</article-title>
<source>J. Virol.</source>
<volume>78</volume>
<year>2004</year>
<fpage>4552</fpage>
<lpage>4560</lpage>
<pub-id pub-id-type="pmid">15078936</pub-id>
</element-citation>
</ref>
<ref id="bib17">
<label>17.</label>
<element-citation publication-type="journal" id="sbref17">
<person-group person-group-type="author">
<name>
<surname>Xiao</surname>
<given-names>X.</given-names>
</name>
</person-group>
<article-title>The SARS-CoV S glycoprotein: expression and functional characterization</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<volume>312</volume>
<year>2003</year>
<fpage>1159</fpage>
<lpage>1164</lpage>
<pub-id pub-id-type="pmid">14651994</pub-id>
</element-citation>
</ref>
<ref id="bib18">
<label>18.</label>
<element-citation publication-type="journal" id="sbref18">
<person-group person-group-type="author">
<name>
<surname>Wong</surname>
<given-names>S.K.</given-names>
</name>
</person-group>
<article-title>A 193-amino acid fragment of the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2</article-title>
<source>J. Biol. Chem.</source>
<volume>279</volume>
<year>2004</year>
<fpage>3197</fpage>
<lpage>3201</lpage>
<pub-id pub-id-type="pmid">14670965</pub-id>
</element-citation>
</ref>
<ref id="bib19">
<label>19.</label>
<element-citation publication-type="journal" id="sbref19">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>J.C.</given-names>
</name>
</person-group>
<article-title>Prokaryotic expression, refolding, and purification of fragment 450-650 of the spike protein of SARS-coronavirus</article-title>
<source>Protein Expr. Purif.</source>
<volume>39</volume>
<year>2005</year>
<fpage>169</fpage>
<lpage>174</lpage>
<pub-id pub-id-type="pmid">15642467</pub-id>
</element-citation>
</ref>
<ref id="bib20">
<label>20.</label>
<element-citation publication-type="journal" id="sbref20">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>An exposed domain in the severe acute respiratory syndrome coronavirus spike protein induces neutralizing antibodies</article-title>
<source>J. Virol.</source>
<volume>78</volume>
<year>2004</year>
<fpage>7217</fpage>
<lpage>7226</lpage>
<pub-id pub-id-type="pmid">15194798</pub-id>
</element-citation>
</ref>
<ref id="bib21">
<label>21.</label>
<element-citation publication-type="journal" id="sbref21">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<article-title>A strategy for searching antigenic regions in the SARS-CoV spike protein</article-title>
<source>Geno. Prot. Bioinfo.</source>
<volume>1</volume>
<year>2003</year>
<fpage>207</fpage>
<lpage>215</lpage>
</element-citation>
</ref>
<ref id="bib22">
<label>22.</label>
<element-citation publication-type="journal" id="sbref22">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<article-title>Identification of immunodominant sites on the spike protein of severe acute respiratory syndrome (SARS) coronavirus: implication for developing SARS diagnostics and vaccines</article-title>
<source>J. Immunol.</source>
<volume>173</volume>
<year>2004</year>
<fpage>4050</fpage>
<lpage>4057</lpage>
<pub-id pub-id-type="pmid">15356154</pub-id>
</element-citation>
</ref>
<ref id="bib23">
<label>23.</label>
<element-citation publication-type="journal" id="sbref23">
<person-group person-group-type="author">
<name>
<surname>Hua</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Identification of two antigenic epitopes on SARS-CoV spike protein</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<volume>319</volume>
<year>2004</year>
<fpage>929</fpage>
<lpage>935</lpage>
<pub-id pub-id-type="pmid">15184071</pub-id>
</element-citation>
</ref>
<ref id="bib24">
<label>24.</label>
<element-citation publication-type="journal" id="sbref24">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Immunological characterization of the spike protein of the severe acute respiratory syndrome coronavirus</article-title>
<source>J. Clin. Microbiol.</source>
<volume>42</volume>
<year>2004</year>
<fpage>1570</fpage>
<lpage>1576</lpage>
<pub-id pub-id-type="pmid">15071006</pub-id>
</element-citation>
</ref>
<ref id="bib25">
<label>25.</label>
<element-citation publication-type="journal" id="sbref25">
<person-group person-group-type="author">
<name>
<surname>Keng</surname>
<given-names>C.T.</given-names>
</name>
</person-group>
<article-title>Amino acids 1055 to 1192 in the S2 region of severe acute respiratory syndrome coronavirus S Protein induce neutralizing antibodies: implications for the development of vaccines and antiviral agents</article-title>
<source>J. Virol.</source>
<volume>79</volume>
<year>2005</year>
<fpage>3289</fpage>
<lpage>3296</lpage>
<pub-id pub-id-type="pmid">15731223</pub-id>
</element-citation>
</ref>
<ref id="bib26">
<label>26.</label>
<element-citation publication-type="journal" id="sbref26">
<person-group person-group-type="author">
<name>
<surname>Sui</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Potent neutralization of severe acute respiratory syndrome (SARS) coronavirus by a human mAb to S1 protein that blocks receptor association</article-title>
<source>Proc. Natl. Acad. Sci. USA</source>
<volume>101</volume>
<year>2004</year>
<fpage>2536</fpage>
<lpage>2541</lpage>
<pub-id pub-id-type="pmid">14983044</pub-id>
</element-citation>
</ref>
<ref id="bib27">
<label>27.</label>
<element-citation publication-type="journal" id="sbref27">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>Identification of an antigenic determinant on the S2 domain of the severe acute respiratory syndrome coronavirus spike glycoprotein capable of inducing neutralizing antibodies</article-title>
<source>J. Virol.</source>
<volume>78</volume>
<year>2004</year>
<fpage>6938</fpage>
<lpage>6945</lpage>
<pub-id pub-id-type="pmid">15194770</pub-id>
</element-citation>
</ref>
<ref id="bib28">
<label>28.</label>
<element-citation publication-type="journal" id="sbref28">
<person-group person-group-type="author">
<name>
<surname>van den Brink</surname>
<given-names>E.N.</given-names>
</name>
</person-group>
<article-title>Molecular and biological characterization of human monoclonal antibodies binding to the spike and nucleocapsid proteins of severe acute respiratory syndrome coronavirus</article-title>
<source>J. Virol.</source>
<volume>79</volume>
<year>2005</year>
<fpage>1635</fpage>
<lpage>1644</lpage>
<pub-id pub-id-type="pmid">15650189</pub-id>
</element-citation>
</ref>
<ref id="bib29">
<label>29.</label>
<element-citation publication-type="journal" id="sbref29">
<person-group person-group-type="author">
<name>
<surname>Chou</surname>
<given-names>C.F.</given-names>
</name>
</person-group>
<article-title>A novel cell-based binding assay system reconstituting interaction between SARS-CoV S protein and its cellular receptor</article-title>
<source>J. Virol. Methods</source>
<volume>123</volume>
<year>2005</year>
<fpage>41</fpage>
<lpage>48</lpage>
<pub-id pub-id-type="pmid">15582697</pub-id>
</element-citation>
</ref>
<ref id="bib30">
<label>30.</label>
<element-citation publication-type="journal" id="sbref30">
<person-group person-group-type="author">
<name>
<surname>Greenough</surname>
<given-names>T.C.</given-names>
</name>
</person-group>
<article-title>Development and characterization of a severe acute respiratory syndrome-associated coronavirus-neutralizing human monoclonal antibody that provides effective immunoprophylaxis in mice</article-title>
<source>J. Infect. Dis.</source>
<volume>191</volume>
<year>2005</year>
<fpage>507</fpage>
<lpage>514</lpage>
<pub-id pub-id-type="pmid">15655773</pub-id>
</element-citation>
</ref>
<ref id="bib31">
<label>31.</label>
<element-citation publication-type="journal" id="sbref31">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Identification of two neutralizing regions on the severe acute respiratory syndrome coronavirus spike glycoprotein produced from the mammalian expression system</article-title>
<source>J. Virol.</source>
<volume>79</volume>
<year>2005</year>
<fpage>1906</fpage>
<lpage>1910</lpage>
<pub-id pub-id-type="pmid">15650214</pub-id>
</element-citation>
</ref>
<ref id="bib32">
<label>32.</label>
<element-citation publication-type="journal" id="sbref32">
<person-group person-group-type="author">
<name>
<surname>Pyrc</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>Genome structure and transcriptional regulation of human coronavirus NL63</article-title>
<source>Virol. J.</source>
<volume>1</volume>
<year>2004</year>
<fpage>7</fpage>
<pub-id pub-id-type="pmid">15548333</pub-id>
</element-citation>
</ref>
<ref id="bib33">
<label>33.</label>
<element-citation publication-type="journal" id="sbref33">
<person-group person-group-type="author">
<name>
<surname>Bi</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Complete genome sequences of the SARS-CoV: the BJ group (Isolates BJ01-BJ04)</article-title>
<source>Geno. Prot. Bioinfo.</source>
<volume>1</volume>
<year>2003</year>
<fpage>180</fpage>
<lpage>192</lpage>
</element-citation>
</ref>
<ref id="bib34">
<label>34.</label>
<element-citation publication-type="journal" id="sbref34">
<person-group person-group-type="author">
<name>
<surname>Mooney</surname>
<given-names>S.D.</given-names>
</name>
<name>
<surname>Klein</surname>
<given-names>T.E.</given-names>
</name>
</person-group>
<article-title>The functional importance of disease-associated mutation</article-title>
<source>BMC Bioinformatics</source>
<volume>3</volume>
<year>2002</year>
<fpage>24</fpage>
<pub-id pub-id-type="pmid">12220483</pub-id>
</element-citation>
</ref>
<ref id="bib35">
<label>35.</label>
<element-citation publication-type="journal" id="sbref35">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>L.D.</given-names>
</name>
</person-group>
<article-title>Mutation analysis of 20 SARS virus genome sequences: evidence for negative selection in replicase ORF1b and spike gene</article-title>
<source>Acta Pharmacol. Sin.</source>
<volume>24</volume>
<year>2003</year>
<fpage>741</fpage>
<lpage>745</lpage>
<pub-id pub-id-type="pmid">12904271</pub-id>
</element-citation>
</ref>
<ref id="bib36">
<label>36.</label>
<element-citation publication-type="journal" id="sbref36">
<person-group person-group-type="author">
<name>
<surname>Yeh</surname>
<given-names>S.H.</given-names>
</name>
</person-group>
<article-title>Characterization of severe acute respiratory syndrome coronavirus genomes in Taiwan: molecular epidemiology and genome evolution</article-title>
<source>Proc. Natl. Acad. Sci. USA</source>
<volume>101</volume>
<year>2004</year>
<fpage>2542</fpage>
<lpage>2547</lpage>
<pub-id pub-id-type="pmid">14983045</pub-id>
</element-citation>
</ref>
<ref id="bib37">
<label>37.</label>
<element-citation publication-type="journal" id="sbref37">
<person-group person-group-type="author">
<name>
<surname>Pavlovic-Lazetic</surname>
<given-names>G.M.</given-names>
</name>
</person-group>
<article-title>Bioinformatics analysis of SARS coronavirus genome polymorphism</article-title>
<source>BMC Bioinformatics</source>
<volume>5</volume>
<year>2004</year>
<fpage>65</fpage>
<pub-id pub-id-type="pmid">15161495</pub-id>
</element-citation>
</ref>
<ref id="bib38">
<label>38.</label>
<element-citation publication-type="journal" id="sbref38">
<person-group person-group-type="author">
<name>
<surname>Ruan</surname>
<given-names>Y.J.</given-names>
</name>
</person-group>
<article-title>Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection</article-title>
<source>Lancet</source>
<volume>361</volume>
<year>2003</year>
<fpage>1779</fpage>
<lpage>1785</lpage>
<pub-id pub-id-type="pmid">12781537</pub-id>
</element-citation>
</ref>
<ref id="bib39">
<label>39.</label>
<element-citation publication-type="journal" id="sbref39">
<person-group person-group-type="author">
<name>
<surname>Chim</surname>
<given-names>S.S.</given-names>
</name>
</person-group>
<article-title>Genomic sequencing of a SARS coronavirus isolate that predated the Metropole Hotel case cluster in Hong Kong</article-title>
<source>Clin. Chem.</source>
<volume>50</volume>
<year>2004</year>
<fpage>231</fpage>
<lpage>233</lpage>
<pub-id pub-id-type="pmid">14709660</pub-id>
</element-citation>
</ref>
<ref id="bib40">
<label>40.</label>
<element-citation publication-type="journal" id="sbref40">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z.G.</given-names>
</name>
</person-group>
<article-title>Molecular biological analysis of genotyping and phylogeny of severe acute respiratory syndrome associated coronavirus</article-title>
<source>Chin. Med. J. (Engl.)</source>
<volume>117</volume>
<year>2004</year>
<fpage>42</fpage>
<lpage>48</lpage>
<pub-id pub-id-type="pmid">14733771</pub-id>
</element-citation>
</ref>
<ref id="bib41">
<label>41.</label>
<element-citation publication-type="journal" id="sbref41">
<person-group person-group-type="author">
<name>
<surname>Lan</surname>
<given-names>Y.C.</given-names>
</name>
</person-group>
<article-title>Phylogenetic analysis and sequence comparison of structural and non-structural SARS coronavirus proteins in Taiwan</article-title>
<source>Infect. Genet. Evol.</source>
<volume>5</volume>
<year>2005</year>
<fpage>261</fpage>
<lpage>269</lpage>
<pub-id pub-id-type="pmid">15737918</pub-id>
</element-citation>
</ref>
<ref id="bib42">
<label>42.</label>
<element-citation publication-type="journal" id="sbref42">
<person-group person-group-type="author">
<name>
<surname>Thompson</surname>
<given-names>J.D.</given-names>
</name>
</person-group>
<article-title>The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools</article-title>
<source>Nucleic Acids Res.</source>
<volume>25</volume>
<year>1997</year>
<fpage>4876</fpage>
<lpage>4882</lpage>
</element-citation>
</ref>
<ref id="bib43">
<label>43.</label>
<element-citation publication-type="book" id="sbref43">
<person-group person-group-type="author">
<name>
<surname>Cover</surname>
<given-names>T.M.</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>J.A.</given-names>
</name>
</person-group>
<source>Elements of Information Theory</source>
<year>1991</year>
<publisher-name>John Wiley &amp; Sons, Inc.</publisher-name>
<publisher-loc>New York, USA</publisher-loc>
</element-citation>
</ref>
<ref id="bib44">
<label>44.</label>
<element-citation publication-type="journal" id="sbref44">
<person-group person-group-type="author">
<name>
<surname>Choi</surname>
<given-names>J.H.</given-names>
</name>
</person-group>
<article-title>PhyloDraw: a phylogenetic tree drawing system</article-title>
<source>Bioinformatics</source>
<volume>16</volume>
<year>2000</year>
<fpage>1056</fpage>
<lpage>1058</lpage>
<pub-id pub-id-type="pmid">11159323</pub-id>
</element-citation>
</ref>
<ref id="bib45">
<label>45.</label>
<element-citation publication-type="journal" id="sbref45">
<person-group person-group-type="author">
<name>
<surname>Rozas</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>DnaSP, DNA polymorphism analyses by the coalescent and other methods</article-title>
<source>Bioinformatics</source>
<volume>19</volume>
<year>2003</year>
<fpage>2496</fpage>
<lpage>2497</lpage>
<pub-id pub-id-type="pmid">14668244</pub-id>
</element-citation>
</ref>
<ref id="bib46">
<label>46.</label>
<element-citation publication-type="journal" id="sbref46">
<person-group person-group-type="author">
<name>
<surname>Nei</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gojobori</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions</article-title>
<source>Mol. Biol. Evol.</source>
<volume>3</volume>
<year>1986</year>
<fpage>418</fpage>
<lpage>426</lpage>
<pub-id pub-id-type="pmid">3444411</pub-id>
</element-citation>
</ref>
</ref-list>
<sec id="s0085" sec-type="supplementary-material">
<title>Supporting Online Material</title>
<p id="p0105">
<ext-link ext-link-type="uri" xlink:href="http://www.gpbjournal.org/journal/pdf/GPB3(2)-05.pdf" id="ir0040">http://www.gpbjournal.org/journal/pdf/GPB3(2)-05.pdf</ext-link>
</p>
<p id="p9150">
<supplementary-material content-type="local-data" id="ec0005">
<caption>
<p>Figures S1-S10, Tables S1-S6</p>
</caption>
<media xlink:href="mmc1.pdf"></media>
</supplementary-material>
</p>
</sec>
<ack id="ack0005">
<title>Acknowledgements</title>
<p>This work was supported by the Ministry of Science and Technology, Republic of Serbia, Project No. 1858.</p>
</ack>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001027 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001027 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:5172477
   |texte=   SARS-CoV Genome Polymorphism: A Bioinformatics Study
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:16144519" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021