MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition

Internal identifier: 000427 (Pmc/Corpus); previous: 000426; next: 000428

Authors: Qin Tang; Yulong Song; Mijuan Shi; Yingyin Cheng; Wanting Zhang; Xiao-Qin Xia

RBID : PMC:4660426

Abstract

Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). To assess their threat to humans, we set out to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from coronavirus spike genes. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracy in leave-one-out cross-validation on training data consisting of 730 representative coronaviruses (99.86% and 98.08%, respectively). Predictions for 47 additional coronaviruses agreed precisely with the conclusions or speculations of other researchers. Our approach is implemented as a web server that can be accessed at http://bioinfo.ihb.ac.cn/seq2hosts.
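The leave-one-out accuracies above come from holding out each training virus in turn, fitting the model on the remaining sequences and checking the prediction for the held-out one. A minimal Python sketch of that loop follows. It assumes a hypothetical `fit(train_xs, train_ys)` interface that returns a predictor; the nearest-centroid classifier and the toy host labels are illustrative stand-ins, not the paper's SVM or MD models or data.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], str]
# Hypothetical interface: fit(train_xs, train_ys) returns a predict(x) function.
Fitter = Callable[[List[Vector], List[str]], Predictor]


def leave_one_out_accuracy(xs: List[Vector], ys: List[str], fit: Fitter) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        if predict(xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def fit_nearest_centroid(xs: List[Vector], ys: List[str]) -> Predictor:
    """Toy stand-in classifier: pick the class whose mean vector is nearest."""
    centroids: Dict[str, List[float]] = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x: Vector) -> str:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


if __name__ == "__main__":
    # Toy feature vectors with hypothetical host labels (not real data).
    xs = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
    ys = ["host_A"] * 3 + ["host_B"] * 3
    print(leave_one_out_accuracy(xs, ys, fit_nearest_centroid))
```

Any classifier that follows the same fit/predict shape can be dropped into `leave_one_out_accuracy`; the cost is one full refit per sample, which is affordable for a training set of a few hundred sequences.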

Electronic supplementary material

The online version of this article (doi:10.1038/srep17155) contains supplementary material, which is available to authorized users.


DOI: 10.1038/srep17155
PubMed: 26607834
PubMed Central: 4660426

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition</title>
<author>
<name sortKey="Tang, Qin" sort="Tang, Qin" uniqKey="Tang Q" first="Qin" last="Tang">Qin Tang</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.410726.6</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1797 8419</institution-id>
<institution>University of Chinese Academy of Sciences,</institution>
</institution-wrap>
Beijing, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Song, Yulong" sort="Song, Yulong" uniqKey="Song Y" first="Yulong" last="Song">Yulong Song</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.410726.6</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1797 8419</institution-id>
<institution>University of Chinese Academy of Sciences,</institution>
</institution-wrap>
Beijing, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Shi, Mijuan" sort="Shi, Mijuan" uniqKey="Shi M" first="Mijuan" last="Shi">Mijuan Shi</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cheng, Yingyin" sort="Cheng, Yingyin" uniqKey="Cheng Y" first="Yingyin" last="Cheng">Yingyin Cheng</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Wanting" sort="Zhang, Wanting" uniqKey="Zhang W" first="Wanting" last="Zhang">Wanting Zhang</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Xia, Xiao Qin" sort="Xia, Xiao Qin" uniqKey="Xia X" first="Xiao-Qin" last="Xia">Xiao-Qin Xia</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26607834</idno>
<idno type="pmc">4660426</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4660426</idno>
<idno type="RBID">PMC:4660426</idno>
<idno type="doi">10.1038/srep17155</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000427</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000427</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition</title>
<author>
<name sortKey="Tang, Qin" sort="Tang, Qin" uniqKey="Tang Q" first="Qin" last="Tang">Qin Tang</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.410726.6</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1797 8419</institution-id>
<institution>University of Chinese Academy of Sciences,</institution>
</institution-wrap>
Beijing, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Song, Yulong" sort="Song, Yulong" uniqKey="Song Y" first="Yulong" last="Song">Yulong Song</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.410726.6</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1797 8419</institution-id>
<institution>University of Chinese Academy of Sciences,</institution>
</institution-wrap>
Beijing, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Shi, Mijuan" sort="Shi, Mijuan" uniqKey="Shi M" first="Mijuan" last="Shi">Mijuan Shi</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cheng, Yingyin" sort="Cheng, Yingyin" uniqKey="Cheng Y" first="Yingyin" last="Cheng">Yingyin Cheng</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Wanting" sort="Zhang, Wanting" uniqKey="Zhang W" first="Wanting" last="Zhang">Wanting Zhang</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Xia, Xiao Qin" sort="Xia, Xiao Qin" uniqKey="Xia X" first="Xiao-Qin" last="Xia">Xiao-Qin Xia</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). To assess their threat to humans, we set out to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from coronavirus spike genes. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracy in leave-one-out cross-validation on training data consisting of 730 representative coronaviruses (99.86% and 98.08%, respectively). Predictions for 47 additional coronaviruses agreed precisely with the conclusions or speculations of other researchers. Our approach is implemented as a web server that can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.ihb.ac.cn/seq2hosts">http://bioinfo.ihb.ac.cn/seq2hosts</ext-link>
.</p>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1038/srep17155) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Jfw" uniqKey="Chan J">JFW Chan</name>
</author>
<author>
<name sortKey="To, Kkw" uniqKey="To K">KKW To</name>
</author>
<author>
<name sortKey="Tse, H" uniqKey="Tse H">H Tse</name>
</author>
<author>
<name sortKey="Jin, Dy" uniqKey="Jin D">DY Jin</name>
</author>
<author>
<name sortKey="Yuen, Ky" uniqKey="Yuen K">KY Yuen</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lau, Skp" uniqKey="Lau S">SKP Lau</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Graham, Rl" uniqKey="Graham R">RL Graham</name>
</author>
<author>
<name sortKey="Donaldson, Ef" uniqKey="Donaldson E">EF Donaldson</name>
</author>
<author>
<name sortKey="Baric, Rs" uniqKey="Baric R">RS Baric</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woo, Pcy" uniqKey="Woo P">PCY Woo</name>
</author>
<author>
<name sortKey="Huang, Y" uniqKey="Huang Y">Y Huang</name>
</author>
<author>
<name sortKey="Lau, Skp" uniqKey="Lau S">SKP Lau</name>
</author>
<author>
<name sortKey="Yuen, Ky" uniqKey="Yuen K">KY Yuen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, F" uniqKey="Li F">F Li</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Farzan, M" uniqKey="Farzan M">M Farzan</name>
</author>
<author>
<name sortKey="Harrison, Sc" uniqKey="Harrison S">SC Harrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, F" uniqKey="Li F">F Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perlman, S" uniqKey="Perlman S">S Perlman</name>
</author>
<author>
<name sortKey="Netland, J" uniqKey="Netland J">J Netland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lobo, Fp" uniqKey="Lobo F">FP Lobo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dunham, Ej" uniqKey="Dunham E">EJ Dunham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenbaum, Bd" uniqKey="Greenbaum B">BD Greenbaum</name>
</author>
<author>
<name sortKey="Levine, Aj" uniqKey="Levine A">AJ Levine</name>
</author>
<author>
<name sortKey="Bhanot, G" uniqKey="Bhanot G">G Bhanot</name>
</author>
<author>
<name sortKey="Rabadan, R" uniqKey="Rabadan R">R Rabadan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chantawannakul, P" uniqKey="Chantawannakul P">P Chantawannakul</name>
</author>
<author>
<name sortKey="Cutler, Rw" uniqKey="Cutler R">RW Cutler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shackelton, La" uniqKey="Shackelton L">LA Shackelton</name>
</author>
<author>
<name sortKey="Parrish, Cr" uniqKey="Parrish C">CR Parrish</name>
</author>
<author>
<name sortKey="Holmes, Ec" uniqKey="Holmes E">EC Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gu, Wj" uniqKey="Gu W">WJ Gu</name>
</author>
<author>
<name sortKey="Zhou, T" uniqKey="Zhou T">T Zhou</name>
</author>
<author>
<name sortKey="Ma, Jm" uniqKey="Ma J">JM Ma</name>
</author>
<author>
<name sortKey="Sun, X" uniqKey="Sun X">X Sun</name>
</author>
<author>
<name sortKey="Lu, Zh" uniqKey="Lu Z">ZH Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berkhout, B" uniqKey="Berkhout B">B Berkhout</name>
</author>
<author>
<name sortKey="Grigoriev, A" uniqKey="Grigoriev A">A Grigoriev</name>
</author>
<author>
<name sortKey="Bakker, M" uniqKey="Bakker M">M Bakker</name>
</author>
<author>
<name sortKey="Lukashov, Vv" uniqKey="Lukashov V">VV Lukashov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jenkins, Gm" uniqKey="Jenkins G">GM Jenkins</name>
</author>
<author>
<name sortKey="Pagel, M" uniqKey="Pagel M">M Pagel</name>
</author>
<author>
<name sortKey="Gould, Ea" uniqKey="Gould E">EA Gould</name>
</author>
<author>
<name sortKey="Zanotto, Pmd" uniqKey="Zanotto P">PMD Zanotto</name>
</author>
<author>
<name sortKey="Holmes, Ec" uniqKey="Holmes E">EC Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rima, Bk" uniqKey="Rima B">BK Rima</name>
</author>
<author>
<name sortKey="Mcferran, Nv" uniqKey="Mcferran N">NV McFerran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vapnik, Vn" uniqKey="Vapnik V">VN Vapnik</name>
</author>
<author>
<name sortKey="Chervone, Ay" uniqKey="Chervone A">AY Chervone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cortes, C" uniqKey="Cortes C">C Cortes</name>
</author>
<author>
<name sortKey="Vapnik, V" uniqKey="Vapnik V">V Vapnik</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Furey, Ts" uniqKey="Furey T">TS Furey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Maesschalck, R" uniqKey="De Maesschalck R">R De Maesschalck</name>
</author>
<author>
<name sortKey="Jouan Rimbaud, D" uniqKey="Jouan Rimbaud D">D Jouan-Rimbaud</name>
</author>
<author>
<name sortKey="Massart, Dl" uniqKey="Massart D">DL Massart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kapoor, A" uniqKey="Kapoor A">A Kapoor</name>
</author>
<author>
<name sortKey="Simmonds, P" uniqKey="Simmonds P">P Simmonds</name>
</author>
<author>
<name sortKey="Lipkin, Wi" uniqKey="Lipkin W">WI Lipkin</name>
</author>
<author>
<name sortKey="Zaidi, S" uniqKey="Zaidi S">S Zaidi</name>
</author>
<author>
<name sortKey="Delwart, E" uniqKey="Delwart E">E Delwart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karlin, S" uniqKey="Karlin S">S Karlin</name>
</author>
<author>
<name sortKey="Mrazek, J" uniqKey="Mrazek J">J Mrazek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Song, Hd" uniqKey="Song H">HD Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, L" uniqKey="Liu L">L Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Boheemen, S" uniqKey="Van Boheemen S">S van Boheemen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Ns" uniqKey="Wang N">NS Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, Wj" uniqKey="Chen W">WJ Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ge, Xy" uniqKey="Ge X">XY Ge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jin, L" uniqKey="Jin L">L Jin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drake, Jw" uniqKey="Drake J">JW Drake</name>
</author>
<author>
<name sortKey="Holland, Jj" uniqKey="Holland J">JJ Holland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, Gw" uniqKey="Chen G">GW Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Manz, B" uniqKey="Manz B">B Manz</name>
</author>
<author>
<name sortKey="Brunotte, L" uniqKey="Brunotte L">L Brunotte</name>
</author>
<author>
<name sortKey="Reuther, P" uniqKey="Reuther P">P Reuther</name>
</author>
<author>
<name sortKey="Schwemmle, M" uniqKey="Schwemmle M">M Schwemmle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Romero Tejeda, A" uniqKey="Romero Tejeda A">A Romero-Tejeda</name>
</author>
<author>
<name sortKey="Capua, I" uniqKey="Capua I">I Capua</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woo, Pcy" uniqKey="Woo P">PCY Woo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vijgen, L" uniqKey="Vijgen L">L Vijgen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burge, C" uniqKey="Burge C">C Burge</name>
</author>
<author>
<name sortKey="Campbell, Am" uniqKey="Campbell A">AM Campbell</name>
</author>
<author>
<name sortKey="Karlin, S" uniqKey="Karlin S">S Karlin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mclachlan, Gj" uniqKey="Mclachlan G">GJ McLachlan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sci Rep</journal-id>
<journal-id journal-id-type="iso-abbrev">Sci Rep</journal-id>
<journal-title-group>
<journal-title>Scientific Reports</journal-title>
</journal-title-group>
<issn pub-type="epub">2045-2322</issn>
<publisher>
<publisher-name>Nature Publishing Group UK</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26607834</article-id>
<article-id pub-id-type="pmc">4660426</article-id>
<article-id pub-id-type="publisher-id">BFsrep17155</article-id>
<article-id pub-id-type="doi">10.1038/srep17155</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Tang</surname>
<given-names>Qin</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Song</surname>
<given-names>Yulong</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shi</surname>
<given-names>Mijuan</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cheng</surname>
<given-names>Yingyin</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Wanting</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xia</surname>
<given-names>Xiao-Qin</given-names>
</name>
<address>
<email>xqxia@ihb.ac.cn</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="GRID">grid.429211.d</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1792 6029</institution-id>
<institution>Center for Molecular and Cellular Biology of Aquatic Organisms, Institute of Hydrobiology, the Chinese Academy of Sciences,</institution>
</institution-wrap>
Wuhan, 430072 China</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="GRID">grid.410726.6</institution-id>
<institution-id institution-id-type="ISNI">0000 0004 1797 8419</institution-id>
<institution>University of Chinese Academy of Sciences,</institution>
</institution-wrap>
Beijing, China</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>26</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>26</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>5</volume>
<elocation-id>17155</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>1</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>10</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2015</copyright-statement>
<license license-type="OpenAccess">
<license-p>This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
</license-p>
</license>
</permissions>
<abstract id="Abs1">
<p>Many coronaviruses are capable of interspecies transmission. Some of them have caused worldwide panic as emerging human pathogens in recent years, e.g., severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). To assess their threat to humans, we set out to infer the potential hosts of coronaviruses using a dual-model approach based on nineteen parameters computed from coronavirus spike genes. Both the support vector machine (SVM) model and the Mahalanobis distance (MD) discriminant model achieved high accuracy in leave-one-out cross-validation on training data consisting of 730 representative coronaviruses (99.86% and 98.08%, respectively). Predictions for 47 additional coronaviruses agreed precisely with the conclusions or speculations of other researchers. Our approach is implemented as a web server that can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.ihb.ac.cn/seq2hosts">http://bioinfo.ihb.ac.cn/seq2hosts</ext-link>
.</p>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1038/srep17155) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group kwd-group-type="npg-subject">
<title>Subject terms</title>
<kwd>Viral evolution</kwd>
<kwd>Viral transmission</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2015</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Introduction</title>
<p>Emerging infectious diseases (EIDs) and their determinants have recently attracted substantial scientific and popular attention. Over 75% of EIDs are zoonoses
<sup>
<xref ref-type="bibr" rid="CR1">1</xref>
</sup>
. Among these pathogens is a group of viruses belonging to the family
<italic>Coronaviridae</italic>
.
<italic>Coronaviridae</italic>
is a family of enveloped, positive-sense, single-stranded RNA viruses whose virions are usually spherical, 120–160 nm in diameter, with a crown-like appearance
<sup>
<xref ref-type="bibr" rid="CR2">2</xref>
</sup>
. Coronaviruses can cause respiratory tract infections, pneumonia, gastroenteritis, epidemic diarrhoea, enteric infections, hepatitis, encephalomyelitis and kidney failure. Their hosts include humans, pigs, cattle, rodents, birds and other animals. In the past 12 years, two emerging infectious diseases, severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), have affected humans and animals worldwide, causing approximately 774 and 315 human deaths, respectively (
<ext-link ext-link-type="uri" xlink:href="http://www.who.int/csr/sars/country/table2004_04_21/en/">http://www.who.int/csr/sars/country/table2004_04_21/en/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.who.int/csr/don/2014_07_23_mers/en/">http://www.who.int/csr/don/2014_07_23_mers/en/</ext-link>
). MERS in particular continues to cause human infections and deaths, as in the recent outbreak in Korea (http://www.who.int/csr/don/19-june-2015-mers-korea/en/). These diseases, which spread by the respiratory route, caused significant panic around the world.</p>
<p>Coronaviruses are currently classified into four major genera (or groups): alpha-coronavirus, beta-coronavirus, gamma-coronavirus and delta-coronavirus
<sup>
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR2">2</xref>
,
<xref ref-type="bibr" rid="CR3">3</xref>
,
<xref ref-type="bibr" rid="CR4">4</xref>
</sup>
. Alpha-coronaviruses and beta-coronaviruses usually infect mammals, whereas gamma-coronaviruses and delta-coronaviruses usually infect birds
<sup>
<xref ref-type="bibr" rid="CR5">5</xref>
</sup>
. Among all proteins encoded by coronaviruses, the spike protein on the virion surface is the most critical, as it mediates both cell attachment and membrane fusion; a few nucleotide changes in the spike gene can enable interspecies transmission
<sup>
<xref ref-type="bibr" rid="CR6">6</xref>
</sup>
. The spike protein consists primarily of three segments: an ectodomain, a transmembrane anchor and a short intracellular tail. The ectodomain contains two subunits involved in host invasion: S1 binds receptors and S2 mediates membrane fusion
<sup>
<xref ref-type="bibr" rid="CR7">7</xref>
</sup>
. A receptor-binding domain (RBD) near the C-terminus of S1 is primarily responsible for receptor recognition. Coronaviruses recognize a variety of molecules as receptors, including proteins, sugars and heparan sulfates on the surfaces of host cells
<sup>
<xref ref-type="bibr" rid="CR8">8</xref>
</sup>
. Because the spike gene mediates host recognition and invasion, its sequence should carry information related to specific hosts, which makes it especially useful for identifying the hosts of given coronaviruses.</p>
<p>As a result of natural selection and evolution, different genomes show different nucleotide preferences. From basic probability, a shorter nucleotide fragment is less likely to be altered during evolution, so the copies of such a fragment in a genome tend not to change significantly; this stability is helpful for evolutionary analysis. Dinucleotides, the shortest oligonucleotides, are the most stable of these fragments: their bias values usually differ among species yet remain highly invariant within a given genome
<sup>
<xref ref-type="bibr" rid="CR9">9</xref>
</sup>
. Dinucleotide abundance has been shown to be reliable for identifying and classifying sequences from viral genomes
<sup>
<xref ref-type="bibr" rid="CR10">10</xref>
,
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR13">13</xref>
,
<xref ref-type="bibr" rid="CR14">14</xref>
,
<xref ref-type="bibr" rid="CR15">15</xref>
,
<xref ref-type="bibr" rid="CR16">16</xref>
,
<xref ref-type="bibr" rid="CR17">17</xref>
</sup>
.</p>
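Dinucleotide bias is often quantified with Karlin's relative abundance (odds ratio), rho_XY = f_XY / (f_X f_Y), which compares the observed frequency of the dinucleotide XY with the frequency expected from base composition alone. The Python sketch below computes the four mononucleotide frequencies and the sixteen rho values for one sequence. It assumes this particular statistic; the nineteen parameters used in the paper are not defined in this section, so the twenty-value feature vector here is illustrative only.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "ACGT"


def nucleotide_composition(seq: str) -> Dict[str, float]:
    """Return 4 mononucleotide frequencies f_X and 16 dinucleotide relative
    abundances rho_XY = f_XY / (f_X * f_Y) for one sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA sequences alike
    # Count only unambiguous bases; dinucleotides spanning an N or a gap are skipped.
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence needs at least one unambiguous dinucleotide")

    features: Dict[str, float] = {x: mono[x] / n_mono for x in BASES}
    for x, y in product(BASES, repeat=2):
        expected = features[x] * features[y]  # frequency expected from base composition alone
        observed = di[x + y] / n_di
        features[x + y] = observed / expected if expected else 0.0
    return features


if __name__ == "__main__":
    feats = nucleotide_composition("ACGTACGT")
    print(feats["A"], round(feats["CG"], 3), feats["AA"])
```

A rho value near 1 means the dinucleotide occurs about as often as its base composition predicts, while values well above or below 1 mark over- or under-representation; this normalization is what lets the values be compared across genomes with different GC content.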
<p>Support vector machines (SVMs) are a group of supervised machine learning methods that were originally introduced by Vapnik as a linear classifier
<sup>
<xref ref-type="bibr" rid="CR18">18</xref>
</sup>
. The current standard formulation (soft margin) provides learning algorithms for both classification and regression analysis
<sup>
<xref ref-type="bibr" rid="CR19">19</xref>
</sup>
. An SVM separates classes by mapping vectors into a high-dimensional feature space and finding the optimal separating hyperplane between two classes, i.e., the hyperplane that maximizes the margin between the classes’ closest points. These closest points, which lie on the margin boundaries, are referred to as support vectors, and the optimal separating hyperplane runs through the middle of the margin, forming the largest gap between the two sets of data
<sup>
<xref ref-type="bibr" rid="CR20">20</xref>
</sup>
. New points are then assigned to a class according to the side of this hyperplane on which they fall. Several algorithms extend SVMs to multiclass and high-dimensional classification problems. SVMs perform well in many areas of biological analysis, including the evaluation of microarray expression data, the detection of remote protein homologies and the recognition of translation initiation sites
<sup>
<xref ref-type="bibr" rid="CR21">21</xref>
</sup>
. If an SVM is used to predict the classes of its own training samples, instances whose established classification is questionable or wrong can be identified.</p>
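<p>The margin-maximization principle can be illustrated with a minimal, self-contained Python sketch. It trains a linear soft-margin SVM by Pegasos-style stochastic sub-gradient descent on the hinge loss. This technique is swapped in for illustration only. It is not the radial-kernel C-classification (R package e1071) used in this study. The function names, the regularization constant lam and the toy data are hypothetical.</p>

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Soft-margin linear SVM fitted by Pegasos-style stochastic sub-gradient
    descent on  lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)).
    A constant 1 is appended to each x so that the last weight acts as the bias.
    Labels must be +1 or -1."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks as training proceeds
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer: shrink w
            if 1.0 > margin:  # point lies inside the margin: hinge term is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Sign of the decision function w.[x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Two well-separated toy classes in two dimensions (hypothetical data).
X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

<p>The bias is handled by appending a constant feature, so it is regularized together with the weights. This keeps the sketch short, at the cost of an objective slightly different from the textbook formulation. The points whose hinge term stays active near the optimum play the role of the support vectors described above.</p>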
<p>Mahalanobis distance (MD) discrimination is a classical method that is widely applied in cluster analysis and classification
<sup>
<xref ref-type="bibr" rid="CR22">22</xref>
</sup>
. The MD measures the distance between a point and a population while accounting for the variance of the population distribution; each point is assigned to the population closest to it. Another method, Fisher’s linear discriminant analysis, has been applied to infer the hosts of three novel picorna-like viruses
<sup>
<xref ref-type="bibr" rid="CR23">23</xref>
</sup>
. Because Fisher’s method requires normally distributed data, which our data are not, the MD was adopted in this study.</p>
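<p>As a minimal illustration of MD discrimination, the Python sketch below estimates each group’s mean vector and sample covariance matrix. It inverts each covariance matrix by Gauss-Jordan elimination and assigns a point to the group with the smallest squared Mahalanobis distance. This is a sketch under stated assumptions, not the R program used in this study. The helper names and the two toy groups are hypothetical, and every covariance matrix is assumed to be invertible.</p>

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows):
    """Sample covariance matrix (denominator n - 1)."""
    n, mu = len(rows), mean_vector(rows)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def invert(m):
    """Inverse of a square, non-singular matrix by Gauss-Jordan elimination."""
    d = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
         for i, row in enumerate(m)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(d):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [row[d:] for row in a]

def squared_mahalanobis(x, mu, inv_cov):
    """(x - mu)^T * inv_cov * (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    d = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))

def md_classify(x, groups):
    """Assign x to the group with the smallest squared MD; groups maps label -> rows."""
    scores = {label: squared_mahalanobis(x, mean_vector(rows), invert(covariance(rows)))
              for label, rows in groups.items()}
    return min(scores, key=scores.get), scores

groups = {
    "group_a": [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)],      # covariance = I
    "group_b": [(3, 3), (4, 3), (3, 4), (4, 4), (3.5, 3.5)],  # covariance = 0.25 I
}
label, scores = md_classify((2.4, 2.4), groups)
print(label, scores)
```

<p>The test point is closer to the centre of group_b in Euclidean terms. It is nevertheless assigned to group_a, because group_b’s small variance inflates its Mahalanobis distance.</p>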
<p>Previous studies of coronaviruses have primarily focused on the evolution of genomes or specific genes, serum-neutralization assays for receptor identification, and crystal-structure analyses of the spike protein and its receptor-binding domains. In this study, we analysed the mononucleotide and dinucleotide compositions of coronavirus spike genes. Based on the resulting nucleotide-composition matrix, MD and SVM models were applied to predict the hosts of coronaviruses. The results may provide hints about the natural or potential hosts of a virus. They may also guide the selection of cells for virus isolation and help assess the likelihood of interspecies transmission of coronaviruses.</p>
</sec>
<sec id="Sec2">
<title>Results</title>
<sec id="Sec3">
<title>Nucleotide composition analysis</title>
<p>Nineteen parameters, including three mononucleotide frequencies (G, C and T) and 16 dinucleotide biases, were computed from 777 spike gene sequences (see
<xref rid="MOESM1" ref-type="media">Supplementary Table S1</xref>
). All parameters show significant differences across the host groups (Kruskal-Wallis tests,
<italic>p</italic>
< 2.2e–16); therefore, they were subsequently employed as factors in statistical models for discriminant analyses. Empirically, a dinucleotide relative abundance or dinucleotide bias (e.g.,
<inline-formula id="IEq1">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq1_HTML.gif"></inline-graphic>
</inline-formula>
) is significantly high if
<inline-formula id="IEq2">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq2_HTML.gif"></inline-graphic>
</inline-formula>
or extremely low if
<inline-formula id="IEq3">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq3_HTML.gif"></inline-graphic>
</inline-formula>
<sup>
<xref ref-type="bibr" rid="CR24">24</xref>
</sup>
. Among the 16 dinucleotides, CpA and TpG have average biases significantly higher than expected (
<inline-formula id="IEq4">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq4_HTML.gif"></inline-graphic>
</inline-formula>
 = 1.29,
<inline-formula id="IEq5">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq5_HTML.gif"></inline-graphic>
</inline-formula>
 = 1.28), whereas the average bias of CpG is extremely low (
<inline-formula id="IEq6">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq6_HTML.gif"></inline-graphic>
</inline-formula>
 = 0.44). In other words, CpA and TpG occur more often, and CpG less often, than expected from the mononucleotide composition. The G+C content is low (31–47%), which suggests that the spike sequences have a low nucleotide density and may be sensitive to heat or alkali. The low G+C content also implies a preference for codons ending in A or T and a higher mutability.</p>
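<p>The over- and under-representation calls above amount to a simple threshold rule. The Python sketch below applies that rule to the average biases reported in this section. The cut-offs 1.23 and 0.78 are an assumption: they follow the conventional Karlin-style thresholds. The exact values used in this study appear only as formula images in the source.</p>

```python
# Assumed cut-offs for dinucleotide relative abundance (rho); not taken from the source.
HIGH_CUTOFF = 1.23
LOW_CUTOFF = 0.78

def classify_bias(rho):
    """Label a dinucleotide bias as over-represented, under-represented or normal."""
    if rho >= HIGH_CUTOFF:
        return "over-represented"
    if LOW_CUTOFF >= rho:
        return "under-represented"
    return "within normal range"

# Average biases reported for the spike genes.
average_bias = {"CpA": 1.29, "TpG": 1.28, "CpG": 0.44}
for dinucleotide, rho in average_bias.items():
    print(dinucleotide, rho, classify_bias(rho))
```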
</sec>
<sec id="Sec4">
<title>Training and validation of statistical models</title>
<p>The data matrix, with the 19 factors as columns and the 730 samples as rows, was fitted with SVM and MD models. All predictions from leave-one-out cross-validation are listed in
<xref rid="MOESM2" ref-type="media">Supplementary Table S2</xref>
and summarized in
<xref rid="Tab1" ref-type="table">Table 1</xref>
according to host species. The validations indicate that both models achieved high accuracies on the training data set: 99.86% for the SVM and 98.08% for the MD. All incorrect cross-validation predictions are listed in
<xref rid="Tab2" ref-type="table">Table 2</xref>
. The only incorrect prediction by the SVM is sample NC_016996.1, which was isolated from an avian species but was predicted to infect humans. In all 14 incorrect predictions by the MD, the predicted host is bat. No sample was incorrectly predicted by both models.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Summary of the hosts predicted for the 730 samples by MD in leave-one-out cross-validation.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Host species</th>
<th align="center">Viruses isolated (A)</th>
<th align="center">Other viruses (B = 730 − A)</th>
<th align="center">Total predictions (C)</th>
<th align="center">Infectivity probability (P = C/730)</th>
<th align="center">Predictions in others (D = C – A)</th>
<th align="center">Percentage in others (E = D/B × 100%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>avian</td>
<td>173</td>
<td>557</td>
<td>377</td>
<td>0.5164</td>
<td>204</td>
<td>36.62%</td>
</tr>
<tr>
<td>bat</td>
<td>74</td>
<td>656</td>
<td>494</td>
<td>0.6767</td>
<td>420</td>
<td>64.02%</td>
</tr>
<tr>
<td>bovine</td>
<td>77</td>
<td>653</td>
<td>77</td>
<td>0.1055</td>
<td>0</td>
<td>0.00%</td>
</tr>
<tr>
<td>human</td>
<td>196</td>
<td>534</td>
<td>202</td>
<td>0.2767</td>
<td>6</td>
<td>1.12%</td>
</tr>
<tr>
<td>murine</td>
<td>28</td>
<td>702</td>
<td>28</td>
<td>0.0384</td>
<td>0</td>
<td>0.00%</td>
</tr>
<tr>
<td>porcine</td>
<td>182</td>
<td>548</td>
<td>185</td>
<td>0.2534</td>
<td>3</td>
<td>0.55%</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>The incorrect predictions of MD and SVM in leave-one-out cross-validation.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>NCBI accession no.</th>
<th align="center">Virus source</th>
<th align="center">Wrong predictions by MD</th>
<th align="center">Wrong predictions by SVM</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">AB008940.1; AB551247.1; AF190406.1; AF201929.1; AF208066.1; FJ647223.1; FJ647224.1; FJ938068.1; JF792616.1</td>
<td>murine</td>
<td>9 (bat)</td>
<td></td>
</tr>
<tr>
<td align="center">NC_011549.1; NC_011550.1; NC_016993.1; NC_016994.1; NC_016995.1</td>
<td>avian</td>
<td>5 (bat)</td>
<td></td>
</tr>
<tr>
<td align="center">NC_016996.1</td>
<td>avian</td>
<td></td>
<td>1 (human)</td>
</tr>
<tr>
<td align="center">In total</td>
<td></td>
<td>14</td>
<td>1</td>
</tr>
<tr>
<td align="center">Accuracy rate</td>
<td></td>
<td>(730–14)/730 = 98.08%</td>
<td>(730–1)/730 = 99.86%</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
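<p>The derived columns of Table 1 and the accuracy rates in Table 2 follow from simple arithmetic on the reported counts. The short Python check below recomputes them from columns A and C of Table 1 and from the error counts in Table 2. The function names are illustrative only.</p>

```python
N = 730  # training samples

# (host, A = viruses isolated from the host, C = total MD predictions of the host), from Table 1
TABLE1 = [("avian", 173, 377), ("bat", 74, 494), ("bovine", 77, 77),
          ("human", 196, 202), ("murine", 28, 28), ("porcine", 182, 185)]

def derived_columns(a, c, n=N):
    """Return B, P, D and E as defined in the Table 1 header."""
    b = n - a                    # B: viruses isolated from other hosts
    p = round(c / n, 4)          # P: infectivity probability
    d = c - a                    # D: predictions among viruses from other hosts
    e = round(100.0 * d / b, 2)  # E: D as a percentage of B
    return b, p, d, e

def accuracy_percent(n_wrong, n=N):
    """Leave-one-out accuracy as reported in Table 2."""
    return round(100.0 * (n - n_wrong) / n, 2)

for host, a, c in TABLE1:
    print(host, *derived_columns(a, c))
print("MD:", accuracy_percent(14), "SVM:", accuracy_percent(1))
```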
</sec>
<sec id="Sec5">
<title>Predictions for viruses capable of interspecies transmission</title>
<p>The trained models were applied to 47 additional samples, and the predictions revealed clues about potential interspecies transmission (see
<xref rid="Tab3" ref-type="table">Table 3</xref>
). Sequences 1–31 are spike genes of coronaviruses that were primarily isolated from palm civets in restaurants, animal markets or farms in southern China during the 2003 SARS epidemic. These coronaviruses (civet-CoVs) are similar not only to each other but also to SARS-CoV. A study of the cross-host evolution of SARS-CoV in palm civets and humans indicated that variations in the spike gene appear to be essential for the shift from animal-to-human to human-to-human transmission
<sup>
<xref ref-type="bibr" rid="CR25">25</xref>
</sup>
. In addition to showing cross-neutralization with SARS-CoV, these SARS-like civet-CoVs can use human ACE2 as an entry receptor
<sup>
<xref ref-type="bibr" rid="CR26">26</xref>
</sup>
. Bats are the reservoir hosts of many coronaviruses. A recent study suggests that bats are also the natural reservoirs of these SARS-like coronaviruses, with palm civets and humans acting as intermediate hosts
<sup>
<xref ref-type="bibr" rid="CR1">1</xref>
</sup>
. The SVM predicted humans as the host of all these samples, which supports the research cited above. The MD identified both bats and humans as hosts of these samples, ranking bats first for samples 1–26 and second for samples 27–31. This result is also expected, as bats are considered the natural hosts of these viruses.
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>The isolate sources and predicted hosts of 47 coronaviruses.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Serial number</th>
<th align="center">Test sample AccNum</th>
<th align="center">SVM prediction</th>
<th align="center">MD prediction</th>
<th align="center">Isolate source</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>AY572034.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>2</td>
<td>AY572036.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>3</td>
<td>AY572037.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>4</td>
<td>AY687355.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>5</td>
<td>AY687356.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>6</td>
<td>AY687358.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>raccoon dog</td>
</tr>
<tr>
<td>7</td>
<td>AY687359.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>8</td>
<td>AY687360.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>9</td>
<td>AY687361.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>10</td>
<td>AY687362.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>11</td>
<td>AY687363.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>12</td>
<td>AY687365.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>13</td>
<td>AY687367.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>14</td>
<td>AY687368.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>15</td>
<td>AY687369.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>16</td>
<td>AY687370.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>17</td>
<td>AY687371.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>18</td>
<td>AY687372.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>19</td>
<td>AY627044.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>20</td>
<td>AY627045.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>21</td>
<td>AY627046.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>22</td>
<td>AY627047.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>23</td>
<td>AY627048.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>24</td>
<td>AY613952.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>25</td>
<td>AY613951.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>26</td>
<td>AY525636.1</td>
<td>human
<sup>*</sup>
</td>
<td>bat
<sup>**</sup>
, human
<sup>*</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>27</td>
<td>DQ514528.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>28</td>
<td>DQ514529.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>29</td>
<td>DQ514530.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>30</td>
<td>DQ514531.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>31</td>
<td>DQ514532.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>palm civet</td>
</tr>
<tr>
<td>32</td>
<td>KJ477102.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>*</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>33</td>
<td>KJ650098.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>34</td>
<td>KJ650295.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>35</td>
<td>KJ713295.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>36</td>
<td>KJ713296.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>37</td>
<td>KJ713297.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>38</td>
<td>KJ713298.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>39</td>
<td>KJ713299.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>40</td>
<td>KF917527.1</td>
<td>human
<sup>*</sup>
</td>
<td>human
<sup>**</sup>
, bat
<sup>**</sup>
</td>
<td>dromedary</td>
</tr>
<tr>
<td>41</td>
<td>AY654624.1</td>
<td>human
<sup>*</sup>
, bat, avian, porcine</td>
<td>human
<sup>**</sup>
, bat
<sup>*</sup>
, avian, porcine</td>
<td>porcine</td>
</tr>
<tr>
<td>42</td>
<td>KC881005.1</td>
<td>bat</td>
<td>bat
<sup>**</sup>
, avian
<sup>*</sup>
</td>
<td>bat</td>
</tr>
<tr>
<td>43</td>
<td>KC881006.1</td>
<td>human, bat</td>
<td>bat
<sup>**</sup>
</td>
<td>bat</td>
</tr>
<tr>
<td>44</td>
<td>KC881007.1</td>
<td>human, bat</td>
<td>bat
<sup>**</sup>
</td>
<td>bat</td>
</tr>
<tr>
<td>45</td>
<td>DQ915164.2</td>
<td>bovine
<sup>*</sup>
</td>
<td>bovine
<sup>**</sup>
, avian
<sup>*</sup>
, bat
<sup>*</sup>
</td>
<td>alpaca</td>
</tr>
<tr>
<td>46</td>
<td>FJ415324.1</td>
<td>bovine
<sup>*</sup>
, human</td>
<td>bovine
<sup>**</sup>
, avian
<sup>*</sup>
, bat, human</td>
<td>human</td>
</tr>
<tr>
<td>47</td>
<td>FJ938067.1</td>
<td>bovine
<sup>*</sup>
, human</td>
<td>bovine
<sup>**</sup>
, avian
<sup>*</sup>
, bat, human</td>
<td>human, bovine</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Predictions consist of hosts with minimal MD or
<italic>p</italic>
values, those with MD <= 200 or
<italic>p</italic>
 <= 0.05 for SVM, and those with MD or
<italic>p</italic>
values no greater than the corresponding value for the isolate source, if the isolate source is one of the six host categories. All predictions are listed in ascending order of MD or
<italic>p</italic>
values.
<sup>*</sup>
<italic>p</italic>
 <= 0.05 or MD <= 200.
<sup>**</sup>
<italic>p</italic>
 <= 0.01 or MD <= 100.</p>
</table-wrap-foot>
</table-wrap>
</p>
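<p>For the MD side, the selection rule in the footnote of Table 3 can be sketched as follows. This Python example is a hypothetical illustration: the MD values are invented, and the analogous rule for SVM <italic>p</italic> values is omitted. A host is reported if it meets any of three conditions:</p>
<list list-type="bullet">
<list-item><p>it has the smallest MD;</p></list-item>
<list-item><p>its MD is at most 200;</p></list-item>
<list-item><p>its MD is no greater than that of the isolate source, when the source is one of the six host categories.</p></list-item>
</list>
<p>Reported hosts are sorted by ascending MD and marked ** (MD at most 100) or * (MD at most 200).</p>

```python
def md_host_predictions(md, isolate_source=None, cutoff=200.0, strong=100.0):
    """Select and annotate predicted hosts from a dict {host: MD value}."""
    chosen = {min(md, key=md.get)}                       # host with the smallest MD
    chosen |= {h for h, v in md.items() if cutoff >= v}  # all hosts with MD at most cutoff
    if isolate_source in md:                             # source is one of the six categories
        ref = md[isolate_source]
        chosen |= {h for h, v in md.items() if ref >= v}
    labelled = []
    for host in sorted(chosen, key=md.get):              # ascending MD
        v = md[host]
        mark = "**" if strong >= v else ("*" if cutoff >= v else "")
        labelled.append(host + mark)
    return labelled

# Hypothetical MD values for one test sequence (not from this study).
example = {"avian": 150.0, "bat": 60.0, "bovine": 900.0,
           "human": 95.0, "murine": 1200.0, "porcine": 400.0}
print(md_host_predictions(example, isolate_source="porcine"))
print(md_host_predictions(example, isolate_source="palm civet"))
```

<p>When the isolate source (here "palm civet") is not one of the six categories, only the first two conditions apply.</p>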
<p>Sequences 32–40 are spike genes of MERS-CoVs isolated from dromedaries after the 2012 outbreak in the Middle East. MERS-CoVs are similar to the bat coronaviruses HKU5 and HKU4 in their amino acid sequences
<sup>
<xref ref-type="bibr" rid="CR27">27</xref>
</sup>
, and they can use human DPP4 as an entry receptor
<sup>
<xref ref-type="bibr" rid="CR28">28</xref>
</sup>
. MERS-CoV has been hypothesized to originate from HKU5 of the Japanese pipistrelle, a bat species
<sup>
<xref ref-type="bibr" rid="CR3">3</xref>
</sup>
. In our study, these MERS-CoVs isolated from camels were predicted to be capable of infecting humans, and the MD ranked bats as the next most likely hosts after humans. This result is consistent with the hypotheses above and also supports the WHO advice to avoid close contact with camels (
<ext-link ext-link-type="uri" xlink:href="http://www.who.int/csr/don/2014_07_23_mers/en/">http://www.who.int/csr/don/2014_07_23_mers/en/</ext-link>
).</p>
<p>The 41st sample was a SARS-associated coronavirus that was transmitted from human to pig
<sup>
<xref ref-type="bibr" rid="CR29">29</xref>
</sup>
and both the SVM and the MD detected its threat to humans. Bats and avians may also be potential hosts, since both models ranked them ahead of porcine. Samples 42–44 (RsSHC014, Rs3367 and SL-CoV-WIV1) are three SARS-like coronaviruses from bats
<sup>
<xref ref-type="bibr" rid="CR30">30</xref>
</sup>
. Analyses based on sequence similarity and cell-line culture suggest that Rs3367 and SL-CoV-WIV1 can use a SARS-CoV receptor for cell entry and pose a threat to humans, whereas RsSHC014 cannot
<sup>
<xref ref-type="bibr" rid="CR30">30</xref>
</sup>
. Our results are consistent with these conclusions. The MD correctly predicts bats as the natural hosts of all three viruses, and the SVM indicates that Rs3367 and SL-CoV-WIV1 may be harmful to humans.</p>
<p>The 45th sample was isolated from an alpaca by Jin
<italic>et al.</italic>
in 2007 and has a bovine coronavirus serotype; phylogenetic analysis suggests that it shares a common ancestor with bovine coronaviruses
<sup>
<xref ref-type="bibr" rid="CR31">31</xref>
</sup>
. Our analysis supports the finding that this coronavirus is capable of infecting bovines, implying that this strain can be transmitted between bovines and alpacas. Samples 46 and 47 are enteric coronaviruses from bovines and humans. Because their spike protein sequences are 99.9% identical, the NCBI database identifies them as the same strain, named “Human enteric coronavirus 4408”. Although they are similar to both human coronavirus OC43 and bovine coronavirus, evidence from morphological, immunological and genomic studies indicates that they are closer to bovine coronavirus than to human coronavirus (unpublished research, personal communication). This finding is consistent with our analysis. In addition, avians and bats deserve attention as potential hosts because of their small MD values.</p>
</sec>
<sec id="Sec6">
<title>Tendencies of MD and SVM in predictions</title>
<p>Two groups of two-dimensional data are plotted in
<xref rid="Fig1" ref-type="fig">Fig. 1</xref>
. The blue points represent a “loose” population with a larger standard deviation (SD), drawn from
<italic>N</italic>
(1, 1), and the red points represent a “tight” population with a smaller SD, drawn from
<italic>N</italic>
(3.5, 0.5). The red line separates the two groups as classified by the MD, and the blue line separates the groups as predicted by the SVM. In this figure, the MD classified two individuals from the “tight” population (the red triangles between the two lines) into the “loose” group. The SVM, in contrast, excluded four points from the “loose” population (the blue inverted triangles between the two lines). This example shows that the MD and the SVM can have opposite tendencies. When a “loose” population is close to a “tight” population, the MD tends to assign outliers of the “tight” population to the “loose” one, whereas the SVM tends to exclude outliers from the “loose” population.
<fig id="Fig1">
<label>Figure 1</label>
<caption>
<p>Tendencies of MD and SVM models.</p>
</caption>
<graphic xlink:href="41598_2015_Article_BFsrep17155_Fig1_HTML" id="d29e1892"></graphic>
</fig>
</p>
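<p>The opposite behaviour of the MD and the Euclidean distance (ED) in Fig. 1 can be reproduced with the population parameters given above. The Python sketch assumes that each coordinate of the “loose” and “tight” populations is drawn independently from N(1, 1) and N(3.5, 0.5) respectively. The second argument is taken as the SD, so each covariance matrix is SD² times the identity. The sketch is an illustration, not the code used to draw the figure.</p>
<p>Along the diagonal x = y, the ED boundary lies at the midpoint 2.25, whereas the MD boundary lies at 8/3 ≈ 2.67. Points between these two boundaries are closer to the “tight” centre, yet the MD assigns them to the “loose” population.</p>

```python
# Fig. 1 populations, assuming each coordinate ~ N(mean, SD) independently.
POPULATIONS = {"loose": ((1.0, 1.0), 1.0), "tight": ((3.5, 3.5), 0.5)}

def squared_ed(x, mu):
    """Squared Euclidean distance between x and mu."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))

def squared_md_isotropic(x, mu, sd):
    # With covariance SD^2 * I, the squared MD is the squared ED divided by SD^2.
    return squared_ed(x, mu) / sd ** 2

def assign(x, metric):
    """Return the population nearest to x under 'ED' or 'MD'."""
    if metric == "ED":
        score = {k: squared_ed(x, mu) for k, (mu, sd) in POPULATIONS.items()}
    else:
        score = {k: squared_md_isotropic(x, mu, sd) for k, (mu, sd) in POPULATIONS.items()}
    return min(score, key=score.get)

for t in (2.0, 2.4, 2.8):
    point = (t, t)
    print(point, "ED:", assign(point, "ED"), "MD:", assign(point, "MD"))
```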
</sec>
</sec>
<sec id="Sec7">
<title>Discussion</title>
<p>Nucleotide composition analysis revealed the overrepresentation of CpA and TpG dinucleotides and the suppression of CpG dinucleotides (see
<xref rid="MOESM1" ref-type="media">Supplementary Table S1</xref>
), indicating that coronaviruses generally favour CpA and TpG and avoid CpG in their sequences. These dinucleotide biases are common characteristics of vertebrate RNA viruses
<sup>
<xref ref-type="bibr" rid="CR11">11</xref>
,
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR15">15</xref>
,
<xref ref-type="bibr" rid="CR16">16</xref>
</sup>
. Because most vertebrate genomes show very low CpG representation, RNA viruses may gradually accumulate mutations that mimic the dinucleotide patterns of host genes, which favours their survival
<sup>
<xref ref-type="bibr" rid="CR11">11</xref>
</sup>
. For DNA viruses, the most widely accepted mechanism for CpG suppression is the methylation of CpG cytosines followed by the deamination of 5-methylcytosine, which makes CpG a mutational hotspot
<sup>
<xref ref-type="bibr" rid="CR24">24</xref>
</sup>
. For RNA viruses, an alternative hypothesis holds that they encounter different selection pressures when switching to a new host, and that viral RNAs mimic host mRNAs to avoid immune detection
<sup>
<xref ref-type="bibr" rid="CR11">11</xref>
</sup>
. Like other human ssRNA viruses, coronaviruses show a significant positive correlation between CpG pressure and C+G content (Pearson’s correlation coefficient, r = 0.5443,
<italic>p</italic>
 < 2.2e-16, our data). A lower C+G content usually indicates that the nucleotide sequence of the virus is unstable or is highly variable under evolutionary selection pressure. Considering that the mutation rates for RNA viruses are significantly higher than the mutation rates for DNA viruses
<sup>
<xref ref-type="bibr" rid="CR32">32</xref>
</sup>
, mutational pressure may be the most important determinant of codon usage bias in human RNA viruses such as coronaviruses
<sup>
<xref ref-type="bibr" rid="CR14">14</xref>
</sup>
.</p>
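<p>For reference, the minimal Python sketch below computes Pearson’s correlation coefficient between two per-sequence measures, such as C+G content and a CpG statistic. The input values are hypothetical and serve only to show the calculation. The significance test reported above is not reproduced.</p>

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-sequence values (not data from this study).
gc_content = [0.34, 0.37, 0.39, 0.42, 0.45]
cpg_bias = [0.38, 0.41, 0.40, 0.47, 0.50]
print(round(pearson_r(gc_content, cpg_bias), 4))
```

<p>On Python 3.10 or later, statistics.correlation computes the same quantity.</p>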
<p>To infect a host, any virus must be able to bind host receptors and to replicate in host cells. Different genes contribute to these processes, and variations in these genes may enable a virus to cross species barriers. A well-known example is the polymerase basic protein 2 (PB2) of influenza A virus: an E-to-K substitution at position 627 enables the virus to replicate efficiently in mammalian cells
<sup>
<xref ref-type="bibr" rid="CR33">33</xref>
,
<xref ref-type="bibr" rid="CR34">34</xref>
,
<xref ref-type="bibr" rid="CR35">35</xref>
</sup>
. In coronaviruses, the spike protein is functionally associated with host recognition, and the RNA-dependent RNA polymerase (RdRp) with viral proliferation. However, two obstacles limit the use of the RdRp gene: (1) its nucleotide sequences are too similar to train an MD model, i.e., the RdRp sequence varies too slowly to provide enough resolution to discriminate among coronaviruses; and (2) only about 23 full-length RdRp CDSs are available in public databases. By contrast, the spike gene satisfies the requirements for both variation rate and availability and was therefore adopted as the marker in this study.</p>
<p>The MD and the SVM show opposite tendencies in judging outliers (see
<xref rid="Fig1" ref-type="fig">Fig. 1</xref>
), which reflects the different principles of the two classification approaches. The Euclidean distance (ED) measures the absolute distance between points or centres of mass in space. The Mahalanobis distance, by contrast, accounts for the variances within a population and the covariances between variables. Consider a population whose individuals are widely scattered and that lies close to a “tight” population with small internal variation. In this situation, the MD may assign marginal individuals of the “tight” population to the “loose” population, even if they are closer to the “tight” population by ED. The MD thus tends to let “loose” populations claim more points. The SVM follows a different philosophy: it separates populations by finding a hyperplane that maximizes the margin between them. When a “loose” population is close to the boundary of a “tight” population, the SVM is more likely to place this hyperplane within the former. This explains its tendency to exclude outliers from a “loose” population.</p>
<p>Bats are the reservoir hosts of many coronaviruses, which can persist in bats and accumulate variation over a long evolutionary process
<sup>
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR36">36</xref>
,
<xref ref-type="bibr" rid="CR37">37</xref>
</sup>
. Thus, coronaviruses in bats constitute a “loose” population with large internal gaps. We hypothesize that some bat virus strains acquire enough variation to infect other organisms; these viruses form a new “tight” population at the edge of the original group. In this case, the MD emphasizes the connection of a virus with its original source, whereas the SVM may be more sensitive to its potential to infect new hosts. Therefore, combining MD and SVM analyses can be especially helpful for profiling interspecies transmission.</p>
<p>In the MD predictions, bats are the predicted host in all 14 incorrect cases from the training data set (see
<xref rid="Tab2" ref-type="table">Table 2</xref>
), and they also appear in the host list of every test coronavirus (see
<xref rid="Tab3" ref-type="table">Table 3</xref>
). Furthermore, bats were predicted as hosts for 64.02% of the training samples isolated from other hosts (see
<xref rid="Tab1" ref-type="table">Table 1</xref>
). These facts support the notion that these viruses originated in bats and shifted to other hosts.</p>
<p>After bats, avians were predicted as hosts for 36.62% of the samples from other hosts. Suppose that bats were the only reservoir hosts and that coronaviruses spread from bats to avians and other animals. Under a stochastic event model, the probability of co-infectivity to both bats and avians would then be the product of the infectivity probabilities for each of them, i.e., 0.3494 (0.5164 × 0.6767; see
<xref rid="Tab1" ref-type="table">Table 1</xref>
). About 255 (0.3494 × 730) samples would therefore be expected to be co-infective. However, only 173 samples were predicted to be co-infective to bats and avians, so avians might be a second independent source of coronaviruses in parallel to bats. If this speculation is true, vigilance will be needed toward avian coronaviruses as well as avian influenza viruses. In particular, given the high accuracy of the SVM in cross-validation, its only “wrong” prediction deserves serious consideration: it may be worth investigating whether the avian virus NC_016996.1 is capable of infecting humans.</p>
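<p>The expected co-infectivity count under the independence assumption can be checked with a few lines of Python. The check uses the rounded infectivity probabilities from Table 1 and the observed count reported above. It only reproduces the arithmetic; it does not test the independence hypothesis statistically.</p>

```python
N = 730
P_AVIAN = 0.5164     # infectivity probability for avians (Table 1)
P_BAT = 0.6767       # infectivity probability for bats (Table 1)
OBSERVED_BOTH = 173  # samples predicted to be co-infective (reported in the text)

# Under independence, the joint probability is the product of the marginals.
p_both = P_AVIAN * P_BAT
expected_both = p_both * N
print(f"P(bat and avian) = {p_both:.4f}")
print(f"expected co-infective samples = {expected_both:.0f}, observed = {OBSERVED_BOTH}")
print(f"observed / expected = {OBSERVED_BOTH / expected_both:.2f}")
```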
<p>For the viruses that are capable of spreading across a host species barrier, the combination of the MD and the SVM is valuable for assessing their potential threat. The origin and interspecies transmission of coronaviruses have been extensively discussed in the past ten years and the coronaviruses of most mammals are believed to originate from their ancestors in bats
<sup>
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR36">36</xref>
,
<xref ref-type="bibr" rid="CR37">37</xref>
</sup>
. Our dual-model analysis supports the finding that SARS-CoVs and MERS-CoVs spread from bats to humans and other animals. In most cases, our approach provided predictions consistent with previous findings, and it may become a useful tool in future studies. When a novel coronavirus is isolated, the combination of the MD and the SVM may provide meaningful hints about its origin and potential threat to humans or other animals. As more virus genomes are sequenced, the approach could also be applied to investigate the interspecies transmission routes of other threatening viruses, such as the virus behind the recent Ebola outbreak in West Africa.</p>
</sec>
<sec id="Sec8">
<title>Methods</title>
<sec id="Sec9">
<title>Data preparation</title>
<p>All genome sequences and complete coding sequences (CDSs) of spike genes were downloaded from the National Center for Biotechnology Information (NCBI) database (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/">http://www.ncbi.nlm.nih.gov/</ext-link>
) on July 17, 2014. Spike gene sequences were extracted from 1044 coronavirus genomes and pooled with 1380 downloaded CDSs. We then removed duplicate sequences and sequences that contained non-standard bases or could not encode complete products. Every retained sequence is longer than 3,000 bases. Of the 777 valid nucleotide sequences listed in
<xref rid="MOESM3" ref-type="media">Supplementary Data S1</xref>
, 730 sequences fall into six host categories: 196 from humans, 182 from porcines, 77 from bovines, 74 from bats, 28 from murines and 173 from avians. Most of the remaining 47 viruses were isolated during the two coronavirus epidemics of the past 12 years. Only the hosts from which these viruses were isolated are listed, but the viruses are known or suspected to be able to infect other hosts; thus, all 47 sequences were used to explore interspecies transmission of coronaviruses. Viruses from other mammals, including canines, felines, rabbits, equines, alpacas and whales, were excluded from the data set because too few spike sequences were available for each of these hosts to form a separate group.
</sec>
<sec id="Sec10">
<title>Nucleotide composition analysis</title>
<p>The mononucleotide frequencies and dinucleotide biases of the spike sequences were computed using in-house Python scripts. The bias of each of the 16 dinucleotides is the ratio of its observed frequency to its expected frequency:
<inline-formula id="IEq7">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq7_HTML.gif"></inline-graphic>
</inline-formula>
, where
<inline-formula id="IEq8">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq8_HTML.gif"></inline-graphic>
</inline-formula>
is the dinucleotide bias,
<italic>f</italic>
<sub>
<italic>XY</italic>
</sub>
is the frequency of dinucleotide XY,
<italic>f</italic>
<sub>
<italic>X</italic>
</sub>
and
<italic>f</italic>
<sub>
<italic>Y</italic>
</sub>
are the frequencies of nucleotide X and nucleotide Y
<sup>
<xref ref-type="bibr" rid="CR38">38</xref>
</sup>
, respectively.</p>
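<p>The composition features can be computed roughly as below. The sketch returns the G, C and T frequencies and the 16 dinucleotide biases f_XY / (f_X f_Y). It is not the authors' script, and the function name composition_features is hypothetical. Two counting choices are assumptions: dinucleotides are counted as overlapping adjacent pairs on the coding strand, and mononucleotide frequencies are taken over all bases.</p>

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def composition_features(seq):
    """Return f_G, f_C, f_T and rho_XY for all 16 dinucleotides (19 values)."""
    seq = seq.upper()  # assumes only A, C, G, T remain after filtering
    n = len(seq)
    mono = {b: seq.count(b) / n for b in BASES}
    # Overlapping dinucleotide counts over the n - 1 adjacent pairs.
    pair_counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(n - 1):
        pair_counts[seq[i:i + 2]] += 1
    features = {"G": mono["G"], "C": mono["C"], "T": mono["T"]}
    for xy in DINUCLEOTIDES:
        observed = pair_counts[xy] / (n - 1)
        expected = mono[xy[0]] * mono[xy[1]]
        # rho_XY = f_XY / (f_X * f_Y); None if base X or Y is absent.
        features["rho_" + xy] = observed / expected if expected else None
    return features
```

<p>A bias above 1 means the dinucleotide is over-represented relative to its base composition, and a bias below 1 means it is under-represented.</p>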
<p>In this study, we considered 19 factors: three mononucleotide frequencies (G, C and T) and 16 dinucleotide biases. The frequency of A is redundant because the four mononucleotide frequencies sum to one. None of these factors was normally distributed, so the nonparametric Kruskal–Wallis test was used to test each factor for differences among the six host categories. Every factor differed significantly across categories, and all 19 factors were therefore used for modelling.</p>
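<p>The Kruskal–Wallis test applied to each factor can be illustrated with the dependency-free sketch below. The function names are hypothetical, and in practice a library routine such as scipy.stats.kruskal would normally be used instead. The p value uses the usual chi-square approximation with k − 1 degrees of freedom, which gives five degrees of freedom for six host categories. That tail probability is computed from a series expansion of the regularized incomplete gamma function.</p>

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while len(order) > i:
        j = i
        while len(order) - 1 > j and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square upper tail, 1 - P(df/2, x/2), via the gamma series.

    Adequate for a sketch; precision degrades for extremely small p values.
    """
    if 0 >= x:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Return (H, p) for k independent samples, with the tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Divide by the tie correction 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

<p>Calling kruskal_wallis with the six per-host lists of one factor yields that factor's H statistic and p value.</p>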
</sec>
<sec id="Sec11">
<title>Modelling, validation and prediction</title>
<p>The SVM is a classifier grounded in structural risk minimization. Through the kernel technique, it can perform nonlinear classification efficiently. In this study, the R package e1071 (version 1.6-3)
<sup>
<xref ref-type="bibr" rid="CR20">20</xref>
</sup>
was used for the SVM analysis, with “C-classification” as the model type and the “radial” kernel. The MD measures the distance from a point to the centre of a distribution. The MD discriminant assigns each individual to the group whose centre is closest by this distance. The MD is defined as
<inline-formula id="IEq9">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq9_HTML.gif"></inline-graphic>
</inline-formula>
, where
<italic>X</italic>
denotes the population,
<italic>x</italic>
denotes the individual, μ is the mean value of the population,
<italic>T</italic>
denotes the matrix transpose and
<inline-formula id="IEq10">
<inline-graphic xlink:href="41598_2015_Article_BFsrep17155_IEq10_HTML.gif"></inline-graphic>
</inline-formula>
denotes the covariance matrix of the population
<sup>
<xref ref-type="bibr" rid="CR39">39</xref>
</sup>
. The R program “distinguish.distance.R”
<sup>
<xref ref-type="bibr" rid="CR40">40</xref>
</sup>
was employed in the MD analysis. Leave-one-out cross-validation was employed for both SVM and MD analyses.</p>
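<p>A minimal sketch of the MD discriminant with leave-one-out cross-validation follows, in plain Python. It is a hypothetical re-implementation for illustration, not the R program distinguish.distance.R. It rests on three assumptions:</p>
<p>(1) each host group has its own mean vector and sample covariance matrix;</p>
<p>(2) the squared distance (x − μ)^T Σ^{-1} (x − μ) is compared; whether the original program uses squared or unsquared distances is not stated here, but the nearest group is the same either way;</p>
<p>(3) groups maps host labels to lists of feature vectors.</p>

```python
def mean_vector(rows):
    """Column means of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix with denominator n - 1."""
    n, d = len(rows), len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if 1e-12 > abs(aug[pivot][col]):
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis_sq(x, mu, inv_cov):
    """Squared Mahalanobis distance (x - mu)^T inv_cov (x - mu)."""
    diff = [a - b for a, b in zip(x, mu)]
    d = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))


def fit(groups):
    """Map each label to (mean vector, inverse covariance) of its vectors."""
    model = {}
    for label, rows in groups.items():
        mu = mean_vector(rows)
        model[label] = (mu, invert(covariance(rows, mu)))
    return model


def classify(model, x):
    """Assign x to the group whose centre is closest in Mahalanobis distance."""
    dists = {label: mahalanobis_sq(x, mu, inv) for label, (mu, inv) in model.items()}
    return min(dists, key=dists.get), dists


def loocv_accuracy(groups):
    """Leave-one-out: refit without each vector in turn, then classify it."""
    correct = total = 0
    for label, rows in groups.items():
        for i, x in enumerate(rows):
            reduced = dict(groups)
            reduced[label] = rows[:i] + rows[i + 1:]
            predicted, _ = classify(fit(reduced), x)
            correct += int(predicted == label)
            total += 1
    return correct / total
```

<p>Refitting the model for every held-out vector is slow, but it keeps the held-out vector out of both the mean and the covariance of its own group.</p>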
<p>When the trained models are applied to a test sequence, each of the six host categories receives a
<italic>p</italic>
value from the SVM and an MD value. Based on these
<italic>p</italic>
values and MD values, candidate hosts are determined in three steps. First, the host with the minimal
<italic>p</italic>
value or MD value is regarded as the preferable host. Second, two adjustable empirical thresholds for each model are used to pick out other potential hosts. In this study, we adopted 0.05 and 0.01 for the
<italic>p</italic>
value and 200 and 100 for the MD value. Likely hosts were those with
<italic>p</italic>
 ≤ 0.05 or MD ≤ 200, and very likely hosts were those with
<italic>p</italic>
 ≤ 0.01 or MD ≤ 100. These first two steps constitute unsupervised prediction. If the host from which the virus was isolated is one of the six host groups used for modelling, supervised prediction can be applied as a third step. In this step, all hosts with
<italic>p</italic>
values or MD values no greater than those of the observed host are listed as potential hosts. These lists can serve as practical references for researchers evaluating the threat a virus poses to humans or other animals.</p>
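<p>The three-step rule can be expressed as a small function. This is a sketch under stated assumptions: the inputs are dictionaries mapping each host category to its SVM p value and MD value, and the function name candidate_hosts is hypothetical. The thresholds default to the values adopted in this study. As in the text, a smaller p value or MD value marks a more preferable host. In step 3, "p values or MD values no greater than those of the observed host" is read as satisfying either condition.</p>

```python
def candidate_hosts(p_values, md_values, observed_host=None,
                    p_thresholds=(0.05, 0.01), md_thresholds=(200, 100)):
    """Apply the three-step rule to per-host SVM p values and MD values."""
    p_likely, p_very = p_thresholds
    md_likely, md_very = md_thresholds
    result = {
        # Step 1: the host with the minimal value is the preferable host.
        "preferable_svm": min(p_values, key=p_values.get),
        "preferable_md": min(md_values, key=md_values.get),
        # Step 2 (unsupervised): empirical thresholds pick out other hosts.
        "likely": sorted(h for h in p_values
                         if p_likely >= p_values[h] or md_likely >= md_values[h]),
        "very_likely": sorted(h for h in p_values
                              if p_very >= p_values[h] or md_very >= md_values[h]),
    }
    # Step 3 (supervised): only when the isolation host is a modelled group.
    if observed_host in p_values:
        p_obs, md_obs = p_values[observed_host], md_values[observed_host]
        result["potential"] = sorted(h for h in p_values
                                     if p_obs >= p_values[h] or md_obs >= md_values[h])
    return result
```

<p>Passing observed_host=None skips the supervised step, which matches the case where the isolation host lies outside the six modelled groups.</p>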
</sec>
<sec id="Sec12">
<title>Comparison of the prediction tendencies of MD and SVM</title>
<p>Two groups of two-dimensional vectors were generated
<italic>in silico</italic>
as two populations. The vectors of the first population were randomly drawn from the normal distribution
<italic>N</italic>
(1, 1), and those of the second population from
<italic>N</italic>
(3.5, 0.5). Because the first population has the larger standard deviation (SD), we refer to it as the “loose” population and to the second as the “tight” population. Both data sets were used for leave-one-out cross-validation of MD and SVM.</p>
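<p>The MD half of this simulation can be sketched in plain Python. The SVM half would need a library such as e1071 and is not reproduced here. Several details are assumptions:</p>
<p>(1) N(mean, SD) is read with the second parameter as the standard deviation;</p>
<p>(2) the two coordinates of each vector are drawn independently;</p>
<p>(3) the group size n_per_group=100 and the random seed are hypothetical choices, not values from the study.</p>

```python
import random


def simulate(n_per_group=100, seed=1):
    """Loose vectors from N(1, 1) and tight vectors from N(3.5, 0.5) on each axis."""
    rng = random.Random(seed)
    loose = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(n_per_group)]
    tight = [(rng.gauss(3.5, 0.5), rng.gauss(3.5, 0.5)) for _ in range(n_per_group)]
    return {"loose": loose, "tight": tight}


def md_sq_2d(x, rows):
    """Squared Mahalanobis distance of x to the group formed by rows (2-D only)."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form with the closed-form inverse of [[sxx, sxy], [sxy, syy]].
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def loocv_md(groups):
    """Leave-one-out MD classification; count misassignments per true group."""
    errors = {label: 0 for label in groups}
    for label, rows in groups.items():
        for i, x in enumerate(rows):
            reduced = dict(groups)
            reduced[label] = rows[:i] + rows[i + 1:]
            dists = {g: md_sq_2d(x, r) for g, r in reduced.items()}
            if min(dists, key=dists.get) != label:
                errors[label] += 1
    return errors
```

<p>Comparing the error counts returned by loocv_md(simulate()) for the loose and tight populations indicates which population the MD discriminant tends to favour for borderline vectors.</p>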
<p>All Python and R scripts used in this study are available from the authors upon request. Predictions can be made from coronavirus spike gene sequences on our web server, which is freely available to the public at
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.ihb.ac.cn/seq2hosts">http://bioinfo.ihb.ac.cn/seq2hosts</ext-link>
.</p>
</sec>
</sec>
<sec id="Sec13">
<title>Additional Information</title>
<p>
<bold>How to cite this article</bold>
: Tang, Q.
<italic>et al.</italic>
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition.
<italic>Sci. Rep.</italic>
<bold>5</bold>
, 17155; doi: 10.1038/srep17155 (2015).</p>
</sec>
<sec sec-type="supplementary-material">
<title>Electronic supplementary material</title>
<sec id="Sec14">
<p>
<supplementary-material content-type="local-data" id="MOESM1">
<media xlink:href="41598_2015_BFsrep17155_MOESM1_ESM.xls">
<caption>
<p>Supplementary Table S1</p>
</caption>
</media>
</supplementary-material>
</p>
<p>
<supplementary-material content-type="local-data" id="MOESM2">
<media xlink:href="41598_2015_BFsrep17155_MOESM2_ESM.xls">
<caption>
<p>Supplementary Table S2</p>
</caption>
</media>
</supplementary-material>
</p>
<p>
<supplementary-material content-type="local-data" id="MOESM3">
<media xlink:href="41598_2015_BFsrep17155_MOESM3_ESM.pdf">
<caption>
<p>Supplementary Data S1</p>
</caption>
</media>
</supplementary-material>
</p>
</sec>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>This work was supported by the 100-talent program grant from the Chinese Academy of Sciences.</p>
</ack>
<notes notes-type="author-contribution">
<title>Author Contributions</title>
<p>Q.T. designed the research, collected and analyzed data and wrote the paper. Y.S., M.S., Y.C. and W.Z. analyzed data, X.-Q.X. designed the research, re-analyzed data, wrote the paper and created the web server. All authors reviewed the manuscript.</p>
</notes>
<notes notes-type="COI-statement">
<title>Competing interests</title>
<p>The authors declare no competing financial interests.</p>
</notes>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>JFW</given-names>
</name>
<name>
<surname>To</surname>
<given-names>KKW</given-names>
</name>
<name>
<surname>Tse</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>DY</given-names>
</name>
<name>
<surname>Yuen</surname>
<given-names>KY</given-names>
</name>
</person-group>
<article-title>Interspecies transmission and emergence of novel viruses: lessons from bats and birds</article-title>
<source>Trends Microbiol.</source>
<year>2013</year>
<volume>21</volume>
<fpage>544</fpage>
<lpage>555</lpage>
<pub-id pub-id-type="doi">10.1016/j.tim.2013.05.005</pub-id>
<pub-id pub-id-type="pmid">23770275</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<mixed-citation publication-type="other">King, A. M. Q., Adams, M. J., Carstens, E. B. & Lefkowitz, E. J. Virus taxonomy, the Ninth Report of the International Committee on Taxonomy of Viruses 810–814 (Academic Press, San Diego, CA., 2012).</mixed-citation>
</ref>
<ref id="CR3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lau</surname>
<given-names>SKP</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genetic characterization of betacoronavirus lineage C viruses in bats reveals marked sequence divergence in the spike protein of Pipistrellus bat coronavirus HKU5 in Japanese Pipistrelle: Implications for the origin of the novel Middle East Respiratory Syndrome Coronavirus</article-title>
<source>J. Virol.</source>
<year>2013</year>
<volume>87</volume>
<fpage>8638</fpage>
<lpage>8650</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.01055-13</pub-id>
<pub-id pub-id-type="pmid">23720729</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Graham</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Donaldson</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Baric</surname>
<given-names>RS</given-names>
</name>
</person-group>
<article-title>A decade after SARS: strategies for controlling emerging coronaviruses</article-title>
<source>Nat. Rev. Microbiol.</source>
<year>2013</year>
<volume>11</volume>
<fpage>836</fpage>
<lpage>848</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro3143</pub-id>
<pub-id pub-id-type="pmid">24217413</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woo</surname>
<given-names>PCY</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lau</surname>
<given-names>SKP</given-names>
</name>
<name>
<surname>Yuen</surname>
<given-names>KY</given-names>
</name>
</person-group>
<article-title>Coronavirus genomics and bioinformatics analysis</article-title>
<source>Viruses</source>
<year>2010</year>
<volume>2</volume>
<fpage>1804</fpage>
<lpage>1820</lpage>
<pub-id pub-id-type="doi">10.3390/v2081803</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Farzan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Harrison</surname>
<given-names>SC</given-names>
</name>
</person-group>
<article-title>Structure of SARS coronavirus spike receptor-binding domain complexed with receptor</article-title>
<source>Science</source>
<year>2005</year>
<volume>309</volume>
<fpage>1864</fpage>
<lpage>1868</lpage>
<pub-id pub-id-type="doi">10.1126/science.1116480</pub-id>
<pub-id pub-id-type="pmid">16166518</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Receptor recognition and cross-species infections of SARS coronavirus</article-title>
<source>Antivir. Res.</source>
<year>2013</year>
<volume>100</volume>
<fpage>246</fpage>
<lpage>254</lpage>
<pub-id pub-id-type="doi">10.1016/j.antiviral.2013.08.014</pub-id>
<pub-id pub-id-type="pmid">23994189</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perlman</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Netland</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Coronaviruses post-SARS: update on replication and pathogenesis</article-title>
<source>Nat. Rev. Microbiol.</source>
<year>2009</year>
<volume>7</volume>
<fpage>439</fpage>
<lpage>450</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro2147</pub-id>
<pub-id pub-id-type="pmid">19430490</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lobo</surname>
<given-names>FP</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Virus-host coevolution: common patterns of nucleotide motif usage in Flaviviridae and their hosts</article-title>
<source>PLoS ONE</source>
<year>2009</year>
<volume>4</volume>
<fpage>1</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0006282</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dunham</surname>
<given-names>EJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Different evolutionary trajectories of European Avian-Like and classical Swine H1N1 influenza A viruses</article-title>
<source>J. Virol.</source>
<year>2009</year>
<volume>83</volume>
<fpage>5485</fpage>
<lpage>5494</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.02565-08</pub-id>
<pub-id pub-id-type="pmid">19297491</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Greenbaum</surname>
<given-names>BD</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Bhanot</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rabadan</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Patterns of evolution and host gene mimicry in influenza and other RNA viruses</article-title>
<source>PLoS Pathog.</source>
<year>2008</year>
<volume>4</volume>
<fpage>1</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1371/journal.ppat.1000079</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chantawannakul</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cutler</surname>
<given-names>RW</given-names>
</name>
</person-group>
<article-title>Convergent host-parasite codon usage between honeybee and bee associated viral genomes</article-title>
<source>J. Invertebr. Pathol.</source>
<year>2008</year>
<volume>98</volume>
<fpage>206</fpage>
<lpage>210</lpage>
<pub-id pub-id-type="doi">10.1016/j.jip.2008.02.016</pub-id>
<pub-id pub-id-type="pmid">18397791</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shackelton</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Parrish</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>EC</given-names>
</name>
</person-group>
<article-title>Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses</article-title>
<source>J. Mol. Evol.</source>
<year>2006</year>
<volume>62</volume>
<fpage>551</fpage>
<lpage>563</lpage>
<pub-id pub-id-type="doi">10.1007/s00239-005-0221-1</pub-id>
<pub-id pub-id-type="pmid">16557338</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>ZH</given-names>
</name>
</person-group>
<article-title>Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales</article-title>
<source>Virus Res.</source>
<year>2004</year>
<volume>101</volume>
<fpage>155</fpage>
<lpage>161</lpage>
<pub-id pub-id-type="doi">10.1016/j.virusres.2004.01.006</pub-id>
<pub-id pub-id-type="pmid">15041183</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berkhout</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Grigoriev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bakker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lukashov</surname>
<given-names>VV</given-names>
</name>
</person-group>
<article-title>Codon and amino acid usage in retroviral genomes is consistent with virus-specific nucleotide pressure</article-title>
<source>Aids Res. Hum. Retrov.</source>
<year>2002</year>
<volume>18</volume>
<fpage>133</fpage>
<lpage>141</lpage>
<pub-id pub-id-type="doi">10.1089/08892220252779674</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jenkins</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Pagel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gould</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Zanotto</surname>
<given-names>PMD</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>EC</given-names>
</name>
</person-group>
<article-title>Evolution of base composition and codon usage bias in the genus Flavivirus</article-title>
<source>J. Mol. Evol.</source>
<year>2001</year>
<volume>52</volume>
<fpage>383</fpage>
<lpage>390</lpage>
<pub-id pub-id-type="doi">10.1007/s002390010168</pub-id>
<pub-id pub-id-type="pmid">11343134</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rima</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>McFerran</surname>
<given-names>NV</given-names>
</name>
</person-group>
<article-title>Dinucleotide and stop codon frequencies in single-stranded RNA viruses</article-title>
<source>J. Gen. Virol.</source>
<year>1997</year>
<volume>78</volume>
<fpage>2859</fpage>
<lpage>2870</lpage>
<pub-id pub-id-type="doi">10.1099/0022-1317-78-11-2859</pub-id>
<pub-id pub-id-type="pmid">9367373</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vapnik</surname>
<given-names>VN</given-names>
</name>
<name>
<surname>Chervonenkis</surname>
<given-names>AY</given-names>
</name>
</person-group>
<article-title>On a class of pattern-recognition learning algorithms</article-title>
<source>Automat. Rem. Contr.</source>
<year>1965</year>
<volume>25</volume>
<fpage>838</fpage>
</element-citation>
</ref>
<ref id="CR19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cortes</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Vapnik</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Support-vector networks</article-title>
<source>Mach. Learn.</source>
<year>1995</year>
<volume>20</volume>
<fpage>273</fpage>
<lpage>297</lpage>
</element-citation>
</ref>
<ref id="CR20">
<mixed-citation publication-type="other">Meyer, D. Support Vector Machines—the Interface to libsvm in package e1071. (2014). Available at:
<ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/e1071/">http://cran.r-project.org/web/packages/e1071/</ext-link>
.</mixed-citation>
</ref>
<ref id="CR21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Support vector machine classification and validation of cancer tissue samples using microarray expression data</article-title>
<source>Bioinformatics</source>
<year>2000</year>
<volume>16</volume>
<fpage>906</fpage>
<lpage>914</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/16.10.906</pub-id>
<pub-id pub-id-type="pmid">11120680</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Maesschalck</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Jouan-Rimbaud</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Massart</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>The Mahalanobis distance</article-title>
<source>Chemometr. Intell. Lab.</source>
<year>2000</year>
<volume>50</volume>
<fpage>1</fpage>
<lpage>18</lpage>
<pub-id pub-id-type="doi">10.1016/S0169-7439(99)00047-7</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kapoor</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Simmonds</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Lipkin</surname>
<given-names>WI</given-names>
</name>
<name>
<surname>Zaidi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Delwart</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Use of nucleotide composition analysis to infer hosts for three novel Picorna-like viruses</article-title>
<source>J. Virol.</source>
<year>2010</year>
<volume>84</volume>
<fpage>10322</fpage>
<lpage>10328</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.00601-10</pub-id>
<pub-id pub-id-type="pmid">20668077</pub-id>
</element-citation>
</ref>
<ref id="CR24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karlin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mrazek</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Compositional differences within and between eukaryotic genomes</article-title>
<source>P. Natl. Acad. Sci. USA</source>
<year>1997</year>
<volume>94</volume>
<fpage>10227</fpage>
<lpage>10232</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.94.19.10227</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Song</surname>
<given-names>HD</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human</article-title>
<source>P. Natl. Acad. Sci. USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>2430</fpage>
<lpage>2435</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0409608102</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Natural mutations in the receptor binding domain of spike glycoprotein determine the reactivity of cross-neutralization between palm civet coronavirus and severe acute respiratory syndrome coronavirus</article-title>
<source>J. Virol.</source>
<year>2007</year>
<volume>81</volume>
<fpage>4694</fpage>
<lpage>4700</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.02389-06</pub-id>
<pub-id pub-id-type="pmid">17314167</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Boheemen</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans</article-title>
<source>mBio</source>
<year>2012</year>
<volume>3</volume>
<fpage>1</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1128/mBio.00473-12</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>NS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Structure of MERS-CoV spike receptor-binding domain complexed with human receptor DPP4</article-title>
<source>Cell Res.</source>
<year>2013</year>
<volume>23</volume>
<fpage>986</fpage>
<lpage>993</lpage>
<pub-id pub-id-type="doi">10.1038/cr.2013.92</pub-id>
<pub-id pub-id-type="pmid">23835475</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>WJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>SARS-associated coronavirus transmitted from human to pig</article-title>
<source>Emerg. Infect. Dis.</source>
<year>2005</year>
<volume>11</volume>
<fpage>446</fpage>
<lpage>448</lpage>
<pub-id pub-id-type="doi">10.3201/eid1103.040824</pub-id>
<pub-id pub-id-type="pmid">15757562</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ge</surname>
<given-names>XY</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor</article-title>
<source>Nature</source>
<year>2013</year>
<volume>503</volume>
<fpage>535</fpage>
<pub-id pub-id-type="doi">10.1038/nature12711</pub-id>
<pub-id pub-id-type="pmid">24172901</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jin</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Analysis of the genome sequence of an alpaca coronavirus</article-title>
<source>Virology</source>
<year>2007</year>
<volume>365</volume>
<fpage>198</fpage>
<lpage>203</lpage>
<pub-id pub-id-type="doi">10.1016/j.virol.2007.03.035</pub-id>
<pub-id pub-id-type="pmid">17459444</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drake</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Holland</surname>
<given-names>JJ</given-names>
</name>
</person-group>
<article-title>Mutation rates among RNA viruses</article-title>
<source>P. Natl. Acad. Sci. USA</source>
<year>1999</year>
<volume>96</volume>
<fpage>13910</fpage>
<lpage>13913</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.96.24.13910</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>GW</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genomic signatures of human versus avian influenza A viruses</article-title>
<source>Emerg. Infect. Dis.</source>
<year>2006</year>
<volume>12</volume>
<fpage>1353</fpage>
<lpage>1360</lpage>
<pub-id pub-id-type="doi">10.3201/eid1209.060276</pub-id>
<pub-id pub-id-type="pmid">17073083</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Manz</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Brunotte</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Reuther</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schwemmle</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Adaptive mutations in NEP compensate for defective H5N1 RNA replication in cultured human cells</article-title>
<source>Nat. Commun.</source>
<year>2012</year>
<volume>3</volume>
<fpage>802</fpage>
<pub-id pub-id-type="doi">10.1038/ncomms1804</pub-id>
<pub-id pub-id-type="pmid">22549831</pub-id>
</element-citation>
</ref>
<ref id="CR35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Romero-Tejeda</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Capua</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>Virus-specific factors associated with zoonotic and pandemic potential</article-title>
<source>Influenza Other Respi. Viruses</source>
<year>2013</year>
<volume>7</volume>
<fpage>4</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1111/irv.12075</pub-id>
</element-citation>
</ref>
<ref id="CR36">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woo</surname>
<given-names>PCY</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus</article-title>
<source>J. Virol.</source>
<year>2012</year>
<volume>86</volume>
<fpage>3995</fpage>
<lpage>4008</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.06540-11</pub-id>
<pub-id pub-id-type="pmid">22278237</pub-id>
</element-citation>
</ref>
<ref id="CR37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vijgen</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Complete genomic sequence of human coronavirus OC43: Molecular clock analysis suggests a relatively recent zoonotic coronavirus transmission event</article-title>
<source>J. Virol.</source>
<year>2005</year>
<volume>79</volume>
<fpage>1595</fpage>
<lpage>1604</lpage>
<pub-id pub-id-type="doi">10.1128/JVI.79.3.1595-1604.2005</pub-id>
<pub-id pub-id-type="pmid">15650185</pub-id>
</element-citation>
</ref>
<ref id="CR38">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Burge</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Karlin</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Over-representation and under-representation of short oligonucleotides in DNA-sequences</article-title>
<source>P. Natl. Acad. Sci. USA</source>
<year>1992</year>
<volume>89</volume>
<fpage>1358</fpage>
<lpage>1362</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.89.4.1358</pub-id>
</element-citation>
</ref>
<ref id="CR39">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McLachlan</surname>
<given-names>GJ</given-names>
</name>
</person-group>
<article-title>Mahalanobis distance</article-title>
<source>Resonance</source>
<year>1999</year>
<volume>4</volume>
<fpage>20</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="doi">10.1007/BF02834632</pub-id>
</element-citation>
</ref>
<ref id="CR40">
<mixed-citation publication-type="other">Xue, Y. & Chen, L. P. Statistical modeling and R software, 383–384 (Tsinghua University Press, Beijing, China, 2007).</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000427 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000427 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4660426
   |texte=   Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26607834" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021