MERS Exploration Server

Warning: this site is under development.
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000F89 (Pmc/Corpus); previous: 000F889; next: 000F900

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K in human populations</title>
<author>
<name sortKey="Li, Weiling" sort="Li, Weiling" uniqKey="Li W" first="Weiling" last="Li">Weiling Li</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lin, Lin" sort="Lin, Lin" uniqKey="Lin L" first="Lin" last="Lin">Lin Lin</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Statistics, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Malhotra, Raunaq" sort="Malhotra, Raunaq" uniqKey="Malhotra R" first="Raunaq" last="Malhotra">Raunaq Malhotra</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yang, Lei" sort="Yang, Lei" uniqKey="Yang L" first="Lei" last="Yang">Lei Yang</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Department of Biology, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Acharya, Raj" sort="Acharya, Raj" uniqKey="Acharya R" first="Raj" last="Acharya">Raj Acharya</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Poss, Mary" sort="Poss, Mary" uniqKey="Poss M" first="Mary" last="Poss">Mary Poss</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Department of Biology, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff005">
<addr-line>Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30921327</idno>
<idno type="pmc">6456218</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456218</idno>
<idno type="RBID">PMC:6456218</idno>
<idno type="doi">10.1371/journal.pcbi.1006564</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000F89</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F89</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K in human populations</title>
<author>
<name sortKey="Li, Weiling" sort="Li, Weiling" uniqKey="Li W" first="Weiling" last="Li">Weiling Li</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lin, Lin" sort="Lin, Lin" uniqKey="Lin L" first="Lin" last="Lin">Lin Lin</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Department of Statistics, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Malhotra, Raunaq" sort="Malhotra, Raunaq" uniqKey="Malhotra R" first="Raunaq" last="Malhotra">Raunaq Malhotra</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yang, Lei" sort="Yang, Lei" uniqKey="Yang L" first="Lei" last="Yang">Lei Yang</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Department of Biology, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Acharya, Raj" sort="Acharya, Raj" uniqKey="Acharya R" first="Raj" last="Acharya">Raj Acharya</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Poss, Mary" sort="Poss, Mary" uniqKey="Poss M" first="Mary" last="Poss">Mary Poss</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Department of Biology, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff005">
<addr-line>Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS Computational Biology</title>
<idno type="ISSN">1553-734X</idno>
<idno type="eISSN">1553-7358</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Human endogenous retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; that is, not all individuals carry a retrovirus at a given genomic location. Because people differ in both the number and the genomic location of these retroviruses, HERV-Ks may contribute to human disease. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers and in autoimmune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because the population prevalence of the HERV-K provirus at each polymorphic site is unknown and because closely related elements such as HERV-K are difficult to identify from short-read sequence data. We present an integrated and computationally robust approach that uses whole-genome short-read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed-length genomic sequences (
<italic>k-mers</italic>
) from whole-genome sequence data that match a reference set of
<italic>k-mers</italic>
unique to each HERV-K locus, and applies mixture model-based clustering to these values to account for low-depth sequence data. Our analysis of 1000 Genomes Project (KGP) data reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool that depicts the proportion of the KGP populations carrying any combination of polymorphic HERV-K proviruses. Further, because HERV-K is insertionally polymorphic, the genomic burden of known polymorphic HERV-K varies among humans; this burden is lowest in East Asian (EAS) individuals. Our study also identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources to advance research on HERV-K contributions to human diseases.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Hayward, A" uniqKey="Hayward A">A Hayward</name>
</author>
<author>
<name sortKey="Grabherr, M" uniqKey="Grabherr M">M Grabherr</name>
</author>
<author>
<name sortKey="Jern, P" uniqKey="Jern P">P Jern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feschotte, C" uniqKey="Feschotte C">C Feschotte</name>
</author>
<author>
<name sortKey="Gilbert, C" uniqKey="Gilbert C">C Gilbert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stoye, Jp" uniqKey="Stoye J">JP Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gifford, R" uniqKey="Gifford R">R Gifford</name>
</author>
<author>
<name sortKey="Tristem, M" uniqKey="Tristem M">M Tristem</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weiss, Ra" uniqKey="Weiss R">RA Weiss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jern, P" uniqKey="Jern P">P Jern</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lower, R" uniqKey="Lower R">R Löwer</name>
</author>
<author>
<name sortKey="Lower, J" uniqKey="Lower J">J Löwer</name>
</author>
<author>
<name sortKey="Kurth, R" uniqKey="Kurth R">R Kurth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bannert, N" uniqKey="Bannert N">N Bannert</name>
</author>
<author>
<name sortKey="Kurth, R" uniqKey="Kurth R">R Kurth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moyes, D" uniqKey="Moyes D">D Moyes</name>
</author>
<author>
<name sortKey="Griffiths, Dj" uniqKey="Griffiths D">DJ Griffiths</name>
</author>
<author>
<name sortKey="Venables, Pj" uniqKey="Venables P">PJ Venables</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Subramanian, Rp" uniqKey="Subramanian R">RP Subramanian</name>
</author>
<author>
<name sortKey="Wildschutte, Jh" uniqKey="Wildschutte J">JH Wildschutte</name>
</author>
<author>
<name sortKey="Russo, C" uniqKey="Russo C">C Russo</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wildschutte, Jh" uniqKey="Wildschutte J">JH Wildschutte</name>
</author>
<author>
<name sortKey="Williams, Zh" uniqKey="Williams Z">ZH Williams</name>
</author>
<author>
<name sortKey="Montesion, M" uniqKey="Montesion M">M Montesion</name>
</author>
<author>
<name sortKey="Subramanian, Rp" uniqKey="Subramanian R">RP Subramanian</name>
</author>
<author>
<name sortKey="Kidd, Jm" uniqKey="Kidd J">JM Kidd</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kurth, R" uniqKey="Kurth R">R Kurth</name>
</author>
<author>
<name sortKey="Bannert, N" uniqKey="Bannert N">N Bannert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Belshaw, R" uniqKey="Belshaw R">R Belshaw</name>
</author>
<author>
<name sortKey="Watson, J" uniqKey="Watson J">J Watson</name>
</author>
<author>
<name sortKey="Katzourakis, A" uniqKey="Katzourakis A">A Katzourakis</name>
</author>
<author>
<name sortKey="Howe, A" uniqKey="Howe A">A Howe</name>
</author>
<author>
<name sortKey="Woolven Allen, J" uniqKey="Woolven Allen J">J Woolven-Allen</name>
</author>
<author>
<name sortKey="Burt, A" uniqKey="Burt A">A Burt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Medstrand, P" uniqKey="Medstrand P">P Medstrand</name>
</author>
<author>
<name sortKey="Mager, Dl" uniqKey="Mager D">DL Mager</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hughes, Jf" uniqKey="Hughes J">JF Hughes</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Belshaw, R" uniqKey="Belshaw R">R Belshaw</name>
</author>
<author>
<name sortKey="Dawson, Ala" uniqKey="Dawson A">ALA Dawson</name>
</author>
<author>
<name sortKey="Woolven Allen, J" uniqKey="Woolven Allen J">J Woolven-Allen</name>
</author>
<author>
<name sortKey="Redding, J" uniqKey="Redding J">J Redding</name>
</author>
<author>
<name sortKey="Burt, A" uniqKey="Burt A">A Burt</name>
</author>
<author>
<name sortKey="Tristem, M" uniqKey="Tristem M">M Tristem</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marchi, E" uniqKey="Marchi E">E Marchi</name>
</author>
<author>
<name sortKey="Kanapin, A" uniqKey="Kanapin A">A Kanapin</name>
</author>
<author>
<name sortKey="Magiorkinis, G" uniqKey="Magiorkinis G">G Magiorkinis</name>
</author>
<author>
<name sortKey="Belshaw, R" uniqKey="Belshaw R">R Belshaw</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shin, W" uniqKey="Shin W">W Shin</name>
</author>
<author>
<name sortKey="Lee, J" uniqKey="Lee J">J Lee</name>
</author>
<author>
<name sortKey="Son, S Y" uniqKey="Son S">S-Y Son</name>
</author>
<author>
<name sortKey="Ahn, K" uniqKey="Ahn K">K Ahn</name>
</author>
<author>
<name sortKey="Kim, H S" uniqKey="Kim H">H-S Kim</name>
</author>
<author>
<name sortKey="Han, K" uniqKey="Han K">K Han</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Groger, V" uniqKey="Groger V">V Gröger</name>
</author>
<author>
<name sortKey="Cynis, H" uniqKey="Cynis H">H Cynis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Young, Gr" uniqKey="Young G">GR Young</name>
</author>
<author>
<name sortKey="Stoye, Jp" uniqKey="Stoye J">JP Stoye</name>
</author>
<author>
<name sortKey="Kassiotis, G" uniqKey="Kassiotis G">G Kassiotis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ryan, Fp" uniqKey="Ryan F">FP Ryan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Volkman, He" uniqKey="Volkman H">HE Volkman</name>
</author>
<author>
<name sortKey="Stetson, Db" uniqKey="Stetson D">DB Stetson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magiorkinis, G" uniqKey="Magiorkinis G">G Magiorkinis</name>
</author>
<author>
<name sortKey="Belshaw, R" uniqKey="Belshaw R">R Belshaw</name>
</author>
<author>
<name sortKey="Katzourakis, A" uniqKey="Katzourakis A">A Katzourakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lower, R" uniqKey="Lower R">R Löwer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hohn, O" uniqKey="Hohn O">O Hohn</name>
</author>
<author>
<name sortKey="Hanke, K" uniqKey="Hanke K">K Hanke</name>
</author>
<author>
<name sortKey="Bannert, N" uniqKey="Bannert N">N Bannert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hughes, Jf" uniqKey="Hughes J">JF Hughes</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hughes, Jf" uniqKey="Hughes J">JF Hughes</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Romanish, Mt" uniqKey="Romanish M">MT Romanish</name>
</author>
<author>
<name sortKey="Cohen, Cj" uniqKey="Cohen C">CJ Cohen</name>
</author>
<author>
<name sortKey="Mager, Dl" uniqKey="Mager D">DL Mager</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kamp, C" uniqKey="Kamp C">C Kamp</name>
</author>
<author>
<name sortKey="Hirschmann, P" uniqKey="Hirschmann P">P Hirschmann</name>
</author>
<author>
<name sortKey="Voss, H" uniqKey="Voss H">H Voss</name>
</author>
<author>
<name sortKey="Huellen, K" uniqKey="Huellen K">K Huellen</name>
</author>
<author>
<name sortKey="Vogt, Ph" uniqKey="Vogt P">PH Vogt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kidd, Jm" uniqKey="Kidd J">JM Kidd</name>
</author>
<author>
<name sortKey="Graves, T" uniqKey="Graves T">T Graves</name>
</author>
<author>
<name sortKey="Newman, Tl" uniqKey="Newman T">TL Newman</name>
</author>
<author>
<name sortKey="Fulton, R" uniqKey="Fulton R">R Fulton</name>
</author>
<author>
<name sortKey="Hayden, Hs" uniqKey="Hayden H">HS Hayden</name>
</author>
<author>
<name sortKey="Malig, M" uniqKey="Malig M">M Malig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cohen, Cj" uniqKey="Cohen C">CJ Cohen</name>
</author>
<author>
<name sortKey="Lock, Wm" uniqKey="Lock W">WM Lock</name>
</author>
<author>
<name sortKey="Mager, Dl" uniqKey="Mager D">DL Mager</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simmons, W" uniqKey="Simmons W">W Simmons</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wildschutte, Jh" uniqKey="Wildschutte J">JH Wildschutte</name>
</author>
<author>
<name sortKey="Ram, D" uniqKey="Ram D">D Ram</name>
</author>
<author>
<name sortKey="Subramanian, R" uniqKey="Subramanian R">R Subramanian</name>
</author>
<author>
<name sortKey="Stevens, Vl" uniqKey="Stevens V">VL Stevens</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kassiotis, G" uniqKey="Kassiotis G">G Kassiotis</name>
</author>
<author>
<name sortKey="Stoye, Jp" uniqKey="Stoye J">JP Stoye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johanning, Gl" uniqKey="Johanning G">GL Johanning</name>
</author>
<author>
<name sortKey="Malouf, Gg" uniqKey="Malouf G">GG Malouf</name>
</author>
<author>
<name sortKey="Zheng, X" uniqKey="Zheng X">X Zheng</name>
</author>
<author>
<name sortKey="Esteva, Fj" uniqKey="Esteva F">FJ Esteva</name>
</author>
<author>
<name sortKey="Weinstein, Jn" uniqKey="Weinstein J">JN Weinstein</name>
</author>
<author>
<name sortKey="Wang Johanning, F" uniqKey="Wang Johanning F">F Wang-Johanning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bhardwaj, N" uniqKey="Bhardwaj N">N Bhardwaj</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hanke, K" uniqKey="Hanke K">K Hanke</name>
</author>
<author>
<name sortKey="Hohn, O" uniqKey="Hohn O">O Hohn</name>
</author>
<author>
<name sortKey="Bannert, N" uniqKey="Bannert N">N Bannert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trela, M" uniqKey="Trela M">M Trela</name>
</author>
<author>
<name sortKey="Nelson, Pn" uniqKey="Nelson P">PN Nelson</name>
</author>
<author>
<name sortKey="Rylance, Pb" uniqKey="Rylance P">PB Rylance</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Antony, Jm" uniqKey="Antony J">JM Antony</name>
</author>
<author>
<name sortKey="Deslauriers, Am" uniqKey="Deslauriers A">AM Deslauriers</name>
</author>
<author>
<name sortKey="Bhat, Rk" uniqKey="Bhat R">RK Bhat</name>
</author>
<author>
<name sortKey="Ellestad, Kk" uniqKey="Ellestad K">KK Ellestad</name>
</author>
<author>
<name sortKey="Power, C" uniqKey="Power C">C Power</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tugnet, N" uniqKey="Tugnet N">N Tugnet</name>
</author>
<author>
<name sortKey="Rylance, P" uniqKey="Rylance P">P Rylance</name>
</author>
<author>
<name sortKey="Roden, D" uniqKey="Roden D">D Roden</name>
</author>
<author>
<name sortKey="Trela, M" uniqKey="Trela M">M Trela</name>
</author>
<author>
<name sortKey="Nelson, P" uniqKey="Nelson P">P Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Lee, M H" uniqKey="Lee M">M-H Lee</name>
</author>
<author>
<name sortKey="Henderson, L" uniqKey="Henderson L">L Henderson</name>
</author>
<author>
<name sortKey="Tyagi, R" uniqKey="Tyagi R">R Tyagi</name>
</author>
<author>
<name sortKey="Bachani, M" uniqKey="Bachani M">M Bachani</name>
</author>
<author>
<name sortKey="Steiner, J" uniqKey="Steiner J">J Steiner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Douville, Rn" uniqKey="Douville R">RN Douville</name>
</author>
<author>
<name sortKey="Nath, A" uniqKey="Nath A">A Nath</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trombetta, B" uniqKey="Trombetta B">B Trombetta</name>
</author>
<author>
<name sortKey="Fantini, G" uniqKey="Fantini G">G Fantini</name>
</author>
<author>
<name sortKey="D Tanasio, E" uniqKey="D Tanasio E">E D’Atanasio</name>
</author>
<author>
<name sortKey="Sellitto, D" uniqKey="Sellitto D">D Sellitto</name>
</author>
<author>
<name sortKey="Cruciani, F" uniqKey="Cruciani F">F Cruciani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nex, Ba" uniqKey="Nex B">BA Nexø</name>
</author>
<author>
<name sortKey="Villesen, P" uniqKey="Villesen P">P Villesen</name>
</author>
<author>
<name sortKey="Nissen, Kk" uniqKey="Nissen K">KK Nissen</name>
</author>
<author>
<name sortKey="Lindegaard, Hm" uniqKey="Lindegaard H">HM Lindegaard</name>
</author>
<author>
<name sortKey="Rossing, P" uniqKey="Rossing P">P Rossing</name>
</author>
<author>
<name sortKey="Petersen, T" uniqKey="Petersen T">T Petersen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bhardwaj, N" uniqKey="Bhardwaj N">N Bhardwaj</name>
</author>
<author>
<name sortKey="Montesion, M" uniqKey="Montesion M">M Montesion</name>
</author>
<author>
<name sortKey="Roy, F" uniqKey="Roy F">F Roy</name>
</author>
<author>
<name sortKey="Coffin, Jm" uniqKey="Coffin J">JM Coffin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fukunaga, K" uniqKey="Fukunaga K">K Fukunaga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ciuffi, A" uniqKey="Ciuffi A">A Ciuffi</name>
</author>
<author>
<name sortKey="Ronen, K" uniqKey="Ronen K">K Ronen</name>
</author>
<author>
<name sortKey="Brady, T" uniqKey="Brady T">T Brady</name>
</author>
<author>
<name sortKey="Malani, N" uniqKey="Malani N">N Malani</name>
</author>
<author>
<name sortKey="Wang, G" uniqKey="Wang G">G Wang</name>
</author>
<author>
<name sortKey="Berry, Cc" uniqKey="Berry C">CC Berry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Witherspoon, Dj" uniqKey="Witherspoon D">DJ Witherspoon</name>
</author>
<author>
<name sortKey="Xing, J" uniqKey="Xing J">J Xing</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Watkins, Ws" uniqKey="Watkins W">WS Watkins</name>
</author>
<author>
<name sortKey="Batzer, Ma" uniqKey="Batzer M">MA Batzer</name>
</author>
<author>
<name sortKey="Jorde, Lb" uniqKey="Jorde L">LB Jorde</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sudmant, Ph" uniqKey="Sudmant P">PH Sudmant</name>
</author>
<author>
<name sortKey="Rausch, T" uniqKey="Rausch T">T Rausch</name>
</author>
<author>
<name sortKey="Gardner, Ej" uniqKey="Gardner E">EJ Gardner</name>
</author>
<author>
<name sortKey="Handsaker, Re" uniqKey="Handsaker R">RE Handsaker</name>
</author>
<author>
<name sortKey="Abyzov, A" uniqKey="Abyzov A">A Abyzov</name>
</author>
<author>
<name sortKey="Huddleston, J" uniqKey="Huddleston J">J Huddleston</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lin, L" uniqKey="Lin L">L Lin</name>
</author>
<author>
<name sortKey="Chan, C" uniqKey="Chan C">C Chan</name>
</author>
<author>
<name sortKey="West, M" uniqKey="West M">M West</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Escobar, Md" uniqKey="Escobar M">MD Escobar</name>
</author>
<author>
<name sortKey="West, M" uniqKey="West M">M West</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ishwaran, H" uniqKey="Ishwaran H">H Ishwaran</name>
</author>
<author>
<name sortKey="James, Lf" uniqKey="James L">LF James</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, L" uniqKey="Huang L">L Huang</name>
</author>
<author>
<name sortKey="Chen, H" uniqKey="Chen H">H Chen</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Chen, G" uniqKey="Chen G">G Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benjamini, Y" uniqKey="Benjamini Y">Y Benjamini</name>
</author>
<author>
<name sortKey="Hochberg, Y" uniqKey="Hochberg Y">Y Hochberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bostock, M" uniqKey="Bostock M">M Bostock</name>
</author>
<author>
<name sortKey="Ogievetsky, V" uniqKey="Ogievetsky V">V Ogievetsky</name>
</author>
<author>
<name sortKey="Heer, J" uniqKey="Heer J">J Heer</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS Comput. Biol</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id>
<journal-title-group>
<journal-title>PLoS Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30921327</article-id>
<article-id pub-id-type="pmc">6456218</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1006564</article-id>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-18-01737</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Genome Analysis</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Genome Analysis</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Mathematical and Statistical Techniques</subject>
<subj-group>
<subject>Statistical Methods</subject>
<subj-group>
<subject>Linear Discriminant Analysis</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Statistics</subject>
<subj-group>
<subject>Statistical Methods</subject>
<subj-group>
<subject>Linear Discriminant Analysis</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Database and Informatics Methods</subject>
<subj-group>
<subject>Biological Databases</subject>
<subj-group>
<subject>Sequence Databases</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Database and Informatics Methods</subject>
<subj-group>
<subject>Bioinformatics</subject>
<subj-group>
<subject>Sequence Analysis</subject>
<subj-group>
<subject>Sequence Databases</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Computer and Information Sciences</subject>
<subj-group>
<subject>Information Technology</subject>
<subj-group>
<subject>Data Mining</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Human Genomics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Animal Genomics</subject>
<subj-group>
<subject>Mammalian Genomics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Database and Informatics Methods</subject>
<subj-group>
<subject>Biological Databases</subject>
<subj-group>
<subject>Genomic Databases</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Genome Analysis</subject>
<subj-group>
<subject>Genomic Databases</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Genome Analysis</subject>
<subj-group>
<subject>Genomic Databases</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Computer and Information Sciences</subject>
<subj-group>
<subject>Data Visualization</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K in human populations</article-title>
<alt-title alt-title-type="running-head">Genomic distribution of polymorphic HERV-K</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Weiling</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Validation</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-7464-1172</contrib-id>
<name>
<surname>Lin</surname>
<given-names>Lin</given-names>
</name>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Supervision</role>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-7253-850X</contrib-id>
<name>
<surname>Malhotra</surname>
<given-names>Raunaq</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Validation</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="currentaff001">
<sup>¤a</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yang</surname>
<given-names>Lei</given-names>
</name>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Acharya</surname>
<given-names>Raj</given-names>
</name>
<role content-type="http://credit.casrai.org/">Funding acquisition</role>
<role content-type="http://credit.casrai.org/">Resources</role>
<role content-type="http://credit.casrai.org/">Supervision</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0003-4147-2410</contrib-id>
<name>
<surname>Poss</surname>
<given-names>Mary</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Funding acquisition</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Project administration</role>
<role content-type="http://credit.casrai.org/">Resources</role>
<role content-type="http://credit.casrai.org/">Supervision</role>
<role content-type="http://credit.casrai.org/">Validation</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff005">
<sup>5</sup>
</xref>
<xref ref-type="author-notes" rid="currentaff002">
<sup>¤b</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Department of Statistics, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>Department of Biology, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</aff>
<aff id="aff004">
<label>4</label>
<addr-line>School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, United States of America</addr-line>
</aff>
<aff id="aff005">
<label>5</label>
<addr-line>Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Wilke</surname>
<given-names>Claus O.</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>University of Texas at Austin, UNITED STATES</addr-line>
</aff>
<author-notes>
<fn fn-type="COI-statement" id="coi001">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="current-aff" id="currentaff001">
<label>¤a</label>
<p>Current address: GNS Healthcare, Cambridge, MA, United States of America</p>
</fn>
<fn fn-type="current-aff" id="currentaff002">
<label>¤b</label>
<p>Current address: Division of Hematology and Oncology, University of Virginia School of Medicine, Charlottesville, VA, United States of America</p>
</fn>
<corresp id="cor001">* E-mail:
<email>maryposs@gmail.com</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>28</day>
<month>3</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<month>3</month>
<year>2019</year>
</pub-date>
<volume>15</volume>
<issue>3</issue>
<elocation-id>e1006564</elocation-id>
<history>
<date date-type="received">
<day>10</day>
<month>10</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>5</day>
<month>3</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>© 2019 Li et al</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Li et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="pcbi.1006564.pdf"></self-uri>
<abstract>
<p>Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; not all individuals have a retrovirus at a specific genomic location. It is possible that HERV-Ks contribute to human disease because people differ in both the number and genomic location of these retroviruses. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers and in autoimmune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because the population prevalence of the HERV-K provirus at each polymorphic site is unknown and because it is challenging to identify closely related elements such as HERV-K from short read sequence data. We present an integrated and computationally robust approach that uses whole genome short read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed-length genomic sequences (
<italic>k-mers</italic>
) from whole genome sequence data matching a reference set of
<italic>k-mers</italic>
unique to each HERV-K locus and applies mixture model-based clustering of these values to account for low depth sequence data. Our analysis of 1000 Genomes Project Data (KGP) reveals numerous differences among the five KGP super-populations in the prevalence of individual and co-occurring HERV-K proviruses; we provide a visualization tool to easily depict the proportion of the KGP populations with any combination of polymorphic HERV-K provirus. Further, because HERV-K is insertionally polymorphic, the genome burden of known polymorphic HERV-K is variable in humans; this burden is lowest in East Asian (EAS) individuals. Our study identifies population-specific sequence variation for HERV-K proviruses at several loci. We expect these resources will advance research on HERV-K contributions to human diseases.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author summary</title>
<p>Human Endogenous Retrovirus type K (HERV-K) is the youngest of the retrovirus families in the human genome and is the only group of endogenous retroviruses that has polymorphic members; a locus containing a HERV-K can be occupied in one individual but empty in others. HERV-Ks could contribute to disease risk or pathogenesis, but linking one of the known polymorphic HERV-K to a specific disease has been difficult. We develop an easy-to-use method that reveals the considerable variation existing among global populations in the prevalence of individual and co-occurring polymorphic HERV-K, and in the number of HERV-K that any individual has in their genome. Our study provides a reference of diversity for the currently known polymorphic HERV-K in global populations and the tools needed to determine the profile of all known polymorphic HERV-K in the genome of any patient population.</p>
</abstract>
<funding-group>
<award-group id="award001">
<funding-source>
<institution>National Science Foundation (US)</institution>
</funding-source>
<award-id>1724008</award-id>
<principal-award-recipient>
<name>
<surname>Acharya</surname>
<given-names>Raj</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award002">
<funding-source>
<institution>National Science Foundation</institution>
</funding-source>
<award-id>1720635</award-id>
<principal-award-recipient>
<name>
<surname>Acharya</surname>
<given-names>Raj</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award003">
<funding-source>
<institution>National Cancer Institute (US)</institution>
</funding-source>
<award-id>RO1CA170334</award-id>
</award-group>
<funding-statement>This research was supported in part by the National Science Foundation award numbers 1724008 and 1720635 to RA. WL and LY were funded in part by the National Cancer Institute of the National Institutes of Health under Award Number 7RO1CA170334 (MP subaward PI). WL was a recipient of the Louis S. and Sara S. Michael Endowed Graduate Fellowship in Engineering and the Fred A. and Susan Breidenbach Graduate Fellowship in Engineering. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="5"></fig-count>
<table-count count="1"></table-count>
<page-count count="21"></page-count>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>PLOS Publication Stage</meta-name>
<meta-value>vor-update-to-uncorrected-proof</meta-value>
</custom-meta>
<custom-meta>
<meta-name>Publication Update</meta-name>
<meta-value>2019-04-09</meta-value>
</custom-meta>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All relevant data are within the manuscript and its Supporting Information files. The code is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/lwl1112/polymorphicHERV">https://github.com/lwl1112/polymorphicHERV</ext-link>
.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>Endogenous retroviruses (ERVs) are derived from infectious retroviruses that integrated into a host germ cell at some time in the evolutionary history of a species [
<xref rid="pcbi.1006564.ref001" ref-type="bibr">1</xref>
–
<xref rid="pcbi.1006564.ref005" ref-type="bibr">5</xref>
]. ERVs in humans (HERVs) comprise up to 8% of the genome and have contributed important functions to their host [
<xref rid="pcbi.1006564.ref006" ref-type="bibr">6</xref>
–
<xref rid="pcbi.1006564.ref008" ref-type="bibr">8</xref>
]. The infection events that resulted in the contemporary profile of HERVs occurred prior to the emergence of modern humans, so most HERVs are fixed in human populations and those of closely related primates. However, some HERVs are still transcriptionally active and capable of causing new germline insertions, so that individuals differ in the number and genomic location occupied by an ERV, a situation termed insertional polymorphism [
<xref rid="pcbi.1006564.ref009" ref-type="bibr">9</xref>
–
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
]. Among all families of HERVs, HERV-K is the only one known to be insertionally polymorphic in humans. However, HERV-K genomes are closely related and as with many repetitive elements, they are difficult to accurately assign to a genomic location using standard mapping approaches [
<xref rid="pcbi.1006564.ref012" ref-type="bibr">12</xref>
,
<xref rid="pcbi.1006564.ref013" ref-type="bibr">13</xref>
].</p>
<p>The DNA form of a retrovirus is called a provirus and minimally encodes the structural
<italic>gag</italic>
and
<italic>env</italic>
gene, and genes for a protease and polymerase, termed
<italic>pol</italic>
. Viral genes are flanked by long terminal repeats (5’ or 3’ LTR). While there are several HERV-K that are full length, none are infectious and most contain mutations or deletions that affect the open reading frames or truncate the virus. Further, the LTRs are substrates for homologous recombination, which deletes virus genes while retaining a single, or solo, LTR at the integration site [
<xref rid="pcbi.1006564.ref014" ref-type="bibr">14</xref>
–
<xref rid="pcbi.1006564.ref016" ref-type="bibr">16</xref>
]. Insertional polymorphism typically refers to the presence or absence of a retrovirus at a specific locus [
<xref rid="pcbi.1006564.ref017" ref-type="bibr">17</xref>
,
<xref rid="pcbi.1006564.ref018" ref-type="bibr">18</xref>
]. However, an occupied site can contain a provirus in some individuals and a solo LTR in others and hence still display polymorphism. Thus, HERV-K and other HERVs have contributed to genomic diversity in the global human population in several ways [
<xref rid="pcbi.1006564.ref019" ref-type="bibr">19</xref>
].</p>
<p>The presence of antibodies to HERV proteins or HERV transcripts has spurred a quest to determine if HERVs from multiple families have a role in either proliferative or degenerative diseases in humans [
<xref rid="pcbi.1006564.ref020" ref-type="bibr">20</xref>
–
<xref rid="pcbi.1006564.ref026" ref-type="bibr">26</xref>
]. Although there are known mechanisms by which a HERV can cause disease, for example by inducing genome structural variation through recombination [
<xref rid="pcbi.1006564.ref027" ref-type="bibr">27</xref>
–
<xref rid="pcbi.1006564.ref031" ref-type="bibr">31</xref>
], affecting host gene expression [
<xref rid="pcbi.1006564.ref032" ref-type="bibr">32</xref>
], and inappropriate activation of an immune response by viral RNA or proteins [
<xref rid="pcbi.1006564.ref023" ref-type="bibr">23</xref>
], it has been difficult to establish an etiological role of a HERV in any disease. HERV-K specifically has been associated with breast and other cancers [
<xref rid="pcbi.1006564.ref003" ref-type="bibr">3</xref>
,
<xref rid="pcbi.1006564.ref033" ref-type="bibr">33</xref>
–
<xref rid="pcbi.1006564.ref037" ref-type="bibr">37</xref>
], and autoimmune diseases, such as rheumatoid arthritis [
<xref rid="pcbi.1006564.ref038" ref-type="bibr">38</xref>
,
<xref rid="pcbi.1006564.ref039" ref-type="bibr">39</xref>
], multiple sclerosis [
<xref rid="pcbi.1006564.ref022" ref-type="bibr">22</xref>
,
<xref rid="pcbi.1006564.ref040" ref-type="bibr">40</xref>
] and systemic lupus erythematosus [
<xref rid="pcbi.1006564.ref008" ref-type="bibr">8</xref>
,
<xref rid="pcbi.1006564.ref022" ref-type="bibr">22</xref>
,
<xref rid="pcbi.1006564.ref041" ref-type="bibr">41</xref>
] without definitive evidence of causality or of specific loci involved. Recently, a HERV-K envelope protein was shown to recapitulate the clinical and histological lesions characterizing Amyotrophic Lateral Sclerosis [
<xref rid="pcbi.1006564.ref042" ref-type="bibr">42</xref>
,
<xref rid="pcbi.1006564.ref043" ref-type="bibr">43</xref>
], providing an important mechanistic advance of a role for a HERV-K protein in a disease. Despite growing evidence for a contribution of HERV-K transcripts or proteins to the pathogenesis of human disease, it is difficult to distinguish among HERV-K loci to investigate potential roles and, in particular, to determine if a locus that is polymorphic for presence or absence of a provirus could be involved.</p>
<p>In this paper, we focus on characterizing the genomic distribution of known insertionally polymorphic HERV-K proviruses in the 1000 Genomes Project (KGP) data. We present a data-mining tool and a statistical framework that accommodate the low depth whole genome sequence data characteristic of the KGP, and often of patient data, to estimate the presence or absence of a provirus at all loci currently known to contain a HERV-K provirus. Using these data, we determine the number of known polymorphic HERV-K proviruses per genome because HERV-Ks can affect genomic stability [
<xref rid="pcbi.1006564.ref044" ref-type="bibr">44</xref>
] contributing to the pathogenesis of a disease. We also provide a tool to visualize HERV-K co-occurrence in global populations to facilitate exploration of synergy that might exist among specific polymorphic HERV-K in disease [
<xref rid="pcbi.1006564.ref045" ref-type="bibr">45</xref>
]. Our results provide a reference of global population diversity in HERV-K proviruses at all currently known polymorphic loci in the human genome and demonstrate that there are notable differences in the prevalence of HERV-Ks in different global populations and in the total number of HERV-Ks currently known to be polymorphic within a person’s genome.</p>
</sec>
<sec sec-type="results" id="sec002">
<title>Results</title>
<sec id="sec003">
<title>A model to estimate polymorphic HERV-K from whole genome sequence data</title>
<p>The goal of this research was to develop a computationally efficient and easy to use tool that could accurately report the status of all reported insertionally polymorphic HERV-Ks with coding potential (provirus) from whole genome sequence (WGS) data. We use the KGP database, which represents individuals in five super-populations and 26 populations, to establish the diversity in global populations at each known polymorphic HERV-K proviral locus and the total number of these polymorphic HERV-K in individual genomes to provide a foundation to study the role of HERV-K in human disease. Our reference set consists of all HERV-K sequences that are available in public databases and that can be unambiguously assigned a location in hg19. Sequences of HERV-K that are not present in hg19 but that were generated by PCR primers to the host flanking regions are included in the reference HERV-K set. From these HERV-K reference sequences, we generate a set of
<italic>k-mers</italic>
(see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s003">S2 Fig</xref>
for optimizing k) that are unique to the HERV-K at each locus. The analysis of subject data starts with a data mining step that recovers all whole genome sequence reads that map to identified HERV-K elements in hg19. The rationale here is that polymorphic HERV-K that are not present in hg19 are greater than 80% homologous to those in the human reference genome and will map to existing elements. The recovered reads from a query WGS data set are then reduced to
<italic>k-mers</italic>
and mapped, requiring 100% match, to the reference set of
<italic>k-mers</italic>
(T), which represents all unique sites for HERV-K at each locus. The output is a ratio (n/T) of subject
<italic>k-mers</italic>
(n) that are 100% match to the reference
<italic>k-mers</italic>
(T) (see
<xref ref-type="sec" rid="sec009">Methods</xref>
for full details; the value of T for each HERV-K is in
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus).</p>
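<p>The n/T ratio described above can be illustrated with a minimal sketch. It assumes that reads have already been recovered by mapping, treats "unique" as unique among the supplied reference loci only, and ignores reverse complements; the function names and the toy sequences are hypothetical and are not taken from the authors' code.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def locus_unique_kmers(loci, k):
    """Keep, for each locus, only k-mers that occur in no other locus.

    Assumption: uniqueness is judged among the given loci only.
    """
    per_locus = {name: kmers(seq, k) for name, seq in loci.items()}
    counts = Counter()
    for ks in per_locus.values():
        counts.update(ks)
    return {name: {km for km in ks if counts[km] == 1}
            for name, ks in per_locus.items()}


def nt_ratio(reads, reference_kmers, k):
    """Fraction n/T of reference k-mers found with an exact match in reads."""
    if not reference_kmers:
        raise ValueError("reference k-mer set is empty")
    observed = set()
    for read in reads:
        observed |= kmers(read, k) & reference_kmers
    return len(observed) / len(reference_kmers)


# Toy data (hypothetical sequences, k = 4 for readability).
loci = {"locusA": "ACGTAC", "locusB": "GTACGG"}
reference = locus_unique_kmers(loci, 4)
reads = ["ACGTA"]
ratio_a = nt_ratio(reads, reference["locusA"], 4)
ratio_b = nt_ratio(reads, reference["locusB"], 4)
```
]]></preformat>
<p>In this toy case the k-mer GTAC is shared by both loci and is therefore excluded from both reference sets, so the read supports only locusA.</p>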
<p>Our preliminary analysis of the KGP data demonstrated that our
<italic>k-mer</italic>
-based approach is sensitive to sequence depth; some HERV-K loci are represented by an almost continuous range of n/T values from 0–1 (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s002">S1 Fig</xref>
), making presence/absence classification difficult. However, the majority of the KGP data is approximately 6x depth; to make use of this important resource, we developed a mixture model that statistically assigns the n/T values from genomes to a cluster while accounting for sequence depth. The value of k was optimized to 50 because this value improved the model's computational efficiency and output (
<xref ref-type="fig" rid="pcbi.1006564.g001">Fig 1B</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s003">S2 Fig</xref>
). The effect of sequence depth on n/T can be seen by comparing the sequence data of 28 individuals in the KGP data that have both low and high sequence depth data (
<xref ref-type="fig" rid="pcbi.1006564.g001">Fig 1</xref>
shows a subset of eight individuals for clarity). If read depth is greater than 20, there is less dispersion of n/T values, most likely because more reads from the query WGS data are recovered from the mapped intervals. The states, ‘provirus’, ‘solo LTR’, and ‘absent’ are preliminarily assigned to each cluster based on the high depth data (data in
<xref ref-type="fig" rid="pcbi.1006564.g001">Fig 1B</xref>
used for description below). Individuals with n/T = 1 have the reference allele (represented by the yellow cluster of low depth data) and n/T = 0 (red cluster) indicates that the HERV-K is absent (no
<italic>k-mer</italic>
s to unique sites in the HERV-K at this locus were recovered from mapped sequence reads). The
<italic>k-mer</italic>
s derived from persons with low (green) and intermediate (blue) n/T values were mapped to the HERV-K reference for this locus to determine whether they localized only in the LTR (assign ‘solo LTR’ to green cluster) or in the coding region (assign ‘provirus’ to blue cluster) (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s004">S3 Fig</xref>
).</p>
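<p>The idea of clustering n/T values with a mixture model can be sketched as follows. This is a generic one-dimensional Gaussian mixture fitted by expectation-maximization; it does not model sequence depth explicitly and is not the authors' model, which is described in Methods. The function names, the initial variance, and the toy n/T values are assumptions made for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def _densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm_1d(values, n_components, n_iter=100, min_var=1e-4):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    xs = sorted(values)
    n = len(xs)
    # Initialise means at evenly spaced quantiles of the data.
    means = [xs[int((j + 0.5) * n / n_components)] for j in range(n_components)]
    variances = [0.01] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = _densities(x, weights, means, variances)
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances.
        for j in range(n_components):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor keeps components proper
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Index of the component with the highest weighted density at x."""
    dens = _densities(x, weights, means, variances)
    return max(range(len(dens)), key=dens.__getitem__)


# Hypothetical n/T values: absent, low (solo LTR-like), high (provirus-like).
ratios = [0.0, 0.01, 0.02, 0.0, 0.01,
          0.30, 0.32, 0.28, 0.31, 0.29,
          0.78, 0.82, 0.80, 0.85, 0.75]
weights, means, variances = fit_gmm_1d(ratios, n_components=3)
labels = [assign_cluster(x, weights, means, variances) for x in ratios]
```
]]></preformat>
<p>With well-separated toy values each group falls into its own component; real low depth data overlap far more, which is why cluster states are checked against high depth data and k-mer mapping in the text above.</p>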
<fig id="pcbi.1006564.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1006564.g001</object-id>
<label>Fig 1</label>
<caption>
<title>A mixture model to account for low depth WGS data.</title>
<p>A) Mixture model output on n/T values of 2535 individuals from KGP with low depth sequence data for chr12:55727215–55728183 when k = 70. At this value of k there are clusters representing low n/T values that are not well resolved, and individuals 25 and 14, which have the same status in high depth data, are assigned to different clusters. B) The result of the mixture model on the same data with k optimized to 50. The model returns four clusters, each indicated by a unique color, and eight of the 28 individuals that have both low and high depth sequence data are shown (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:KGP for identification). The n/T ratio is 1 for persons with high depth data [red numbers, #6 and #12] who have the reference allele, while the corresponding low depth data [black numbers, yellow cluster] from the same individuals have n/T ranging from 0.7 to 0.9. There is less of an effect of sequence depth for individuals who do not have the HERV-K (n/T = 0, red cluster, #23 and #28). However, optimizing k improves separation of the solo LTR (green cluster; #4 and #16) from the blue cluster (#25 and #14), which represents a state where some unique k-mers in the set T are missing in the query data (this is likely an allele; see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s004">S3 Fig</xref>
). States are confirmed by mapping the
<italic>k-mer</italic>
s from individuals in a cluster to the reference HERV-K at this locus (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s004">S3 Fig</xref>
).</p>
</caption>
<graphic xlink:href="pcbi.1006564.g001"></graphic>
</fig>
</sec>
<sec id="sec004">
<title>Prevalence of polymorphic HERV-K in each KGP super-population</title>
<p>The WGS data of each individual in the KGP dataset were evaluated using our optimized analysis workflow. HERV-Ks on chrY were not considered. Twenty sites, omitting one at chr1:73594980 [see
<xref ref-type="sec" rid="sec009">Methods</xref>
] that have been reported to be polymorphic for presence/absence [
<xref rid="pcbi.1006564.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
,
<xref rid="pcbi.1006564.ref034" ref-type="bibr">34</xref>
,
<xref rid="pcbi.1006564.ref046" ref-type="bibr">46</xref>
] were identified as polymorphic for a HERV-K provirus by our analysis (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus). Polymorphic HERV-Ks greater than 6 kbp in length cluster together in a phylogenetic analysis indicating that they are closely related (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s005">S4 Fig</xref>
). The prevalence (proportion of individuals in a given population with a provirus present at a given locus) of the 20 polymorphic HERV-K proviruses varied from 0.9% to 99.5% when averaged across the entire KGP dataset (
<xref rid="pcbi.1006564.t001" ref-type="table">Table 1</xref>
). However, there were notable differences in prevalence at each HERV-K site among the five super-populations (AFR, EAS, AMR, EUR, SAS; see
<xref ref-type="sec" rid="sec009">Methods</xref>
for key to abbreviations). Of the 20, the prevalence of seven polymorphic HERV-Ks was greater than 90% and the difference between populations with the lowest and highest prevalence was less than 6.5% (
<xref rid="pcbi.1006564.t001" ref-type="table">Table 1</xref>
). There was 100% occupancy for six of the seven high prevalence polymorphic HERV-Ks (98.8% for the seventh), indicating that the rate of conversion to solo LTR is low for viruses at these sites (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
for occupancy and
<xref ref-type="supplementary-material" rid="pcbi.1006564.s011">S2 Dataset</xref>
:KGP(absence, solo, presence) for model prediction of solo LTR prevalence). Two polymorphic HERV-Ks had an overall prevalence of less than 10% in any population (
<xref rid="pcbi.1006564.t001" ref-type="table">Table 1</xref>
) and were found in individuals of AFR origin; we found no evidence of a solo LTR at these two sites. Nine of the remaining 11 HERV-Ks are of interest because the difference between super-populations with the highest and lowest prevalence is between 28 and 80 percentage points (
<xref rid="pcbi.1006564.t001" ref-type="table">Table 1</xref>
). Of note, for the three HERV-Ks with the largest difference among super-populations, the prevalence is lowest in EAS populations.</p>
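<p>Prevalence and the spread between super-populations can be computed as in the short sketch below. The per-individual calls are hypothetical; the percentages in the second example are the Table 1 values for chr1:75842771. The function names are illustrative and are not part of the authors' tool.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def provirus_prevalence(calls):
    """Percent of individuals per population whose state is 'provirus'.

    calls: iterable of (population, state) pairs, one per individual.
    """
    totals = defaultdict(int)
    present = defaultdict(int)
    for population, state in calls:
        totals[population] += 1
        if state == "provirus":
            present[population] += 1
    return {pop: 100.0 * present[pop] / totals[pop] for pop in totals}


def prevalence_spread(prevalence_by_pop):
    """Return (lowest population, highest population, difference in points)."""
    lowest = min(prevalence_by_pop, key=prevalence_by_pop.get)
    highest = max(prevalence_by_pop, key=prevalence_by_pop.get)
    return lowest, highest, prevalence_by_pop[highest] - prevalence_by_pop[lowest]


# Hypothetical per-individual calls for one locus.
toy_calls = [
    ("AFR", "provirus"), ("AFR", "absent"),
    ("EAS", "solo LTR"), ("EAS", "absent"),
    ("EUR", "provirus"), ("EUR", "provirus"),
]
toy = provirus_prevalence(toy_calls)

# Super-population prevalence (%) for chr1:75842771 as listed in Table 1.
chr1_row = {"AFR": 26.76, "AMR": 56.53, "EAS": 6.02, "EUR": 68.91, "SAS": 66.80}
lowest, highest, spread = prevalence_spread(chr1_row)
```
]]></preformat>
<p>For this locus the lowest prevalence is in EAS and the highest in EUR, a difference of about 62.9 percentage points.</p>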
<table-wrap id="pcbi.1006564.t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1006564.t001</object-id>
<label>Table 1</label>
<caption>
<title>Provirus frequencies of polymorphic HERV-K.</title>
</caption>
<alternatives>
<graphic id="pcbi.1006564.t001g" xlink:href="pcbi.1006564.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="justify" rowspan="1" colspan="1"></th>
<th align="justify" rowspan="1" colspan="1">KGP</th>
<th align="justify" rowspan="1" colspan="1">AFR</th>
<th align="justify" rowspan="1" colspan="1">AMR</th>
<th align="justify" rowspan="1" colspan="1">EAS</th>
<th align="justify" rowspan="1" colspan="1">EUR</th>
<th align="justify" rowspan="1" colspan="1">SAS</th>
</tr>
</thead>
<tbody>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr1:75842771</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">42.88</td>
<td align="justify" rowspan="1" colspan="1">26.76</td>
<td align="justify" rowspan="1" colspan="1">56.53</td>
<td align="justify" rowspan="1" colspan="1">6.02</td>
<td align="justify" rowspan="1" colspan="1">68.91</td>
<td align="justify" rowspan="1" colspan="1">66.80</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr3:112743479</bold>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">98.46</td>
<td align="justify" rowspan="1" colspan="1">96.71</td>
<td align="justify" rowspan="1" colspan="1">99.72</td>
<td align="justify" rowspan="1" colspan="1">99.81</td>
<td align="justify" rowspan="1" colspan="1">99.60</td>
<td align="justify" rowspan="1" colspan="1">97.37</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr3:148281477</bold>
</td>
<td align="justify" rowspan="1" colspan="1">41.89</td>
<td align="justify" rowspan="1" colspan="1">38.86</td>
<td align="justify" rowspan="1" colspan="1">42.61</td>
<td align="justify" rowspan="1" colspan="1">45.05</td>
<td align="justify" rowspan="1" colspan="1">46.53</td>
<td align="justify" rowspan="1" colspan="1">37.45</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr3:185280336</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">99.49</td>
<td align="justify" rowspan="1" colspan="1">98.06</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr4:69463709</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">72.50</td>
<td align="justify" rowspan="1" colspan="1">93.87</td>
<td align="justify" rowspan="1" colspan="1">88.92</td>
<td align="justify" rowspan="1" colspan="1">31.07</td>
<td align="justify" rowspan="1" colspan="1">85.35</td>
<td align="justify" rowspan="1" colspan="1">61.94</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr5:156084717</bold>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">99.41</td>
<td align="justify" rowspan="1" colspan="1">98.36</td>
<td align="justify" rowspan="1" colspan="1">99.72</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
<td align="justify" rowspan="1" colspan="1">99.80</td>
<td align="justify" rowspan="1" colspan="1">99.60</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr6:57623896</bold>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">93.65</td>
<td align="justify" rowspan="1" colspan="1">90.73</td>
<td align="justify" rowspan="1" colspan="1">97.16</td>
<td align="justify" rowspan="1" colspan="1">90.87</td>
<td align="justify" rowspan="1" colspan="1">97.23</td>
<td align="justify" rowspan="1" colspan="1">94.33</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr6:78427019</bold>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">97.71</td>
<td align="justify" rowspan="1" colspan="1">95.52</td>
<td align="justify" rowspan="1" colspan="1">97.16</td>
<td align="justify" rowspan="1" colspan="1">99.61</td>
<td align="justify" rowspan="1" colspan="1">97.23</td>
<td align="justify" rowspan="1" colspan="1">99.60</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr7:4622057</bold>
<xref ref-type="table-fn" rid="t001fn002">*</xref>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">47.50</td>
<td align="justify" rowspan="1" colspan="1">61.14</td>
<td align="justify" rowspan="1" colspan="1">30.11</td>
<td align="justify" rowspan="1" colspan="1">58.25</td>
<td align="justify" rowspan="1" colspan="1">36.44</td>
<td align="justify" rowspan="1" colspan="1">41.50</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr8:12316492</bold>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">14.08</td>
<td align="justify" rowspan="1" colspan="1">32.88</td>
<td align="justify" rowspan="1" colspan="1">12.22</td>
<td align="justify" rowspan="1" colspan="1">0</td>
<td align="justify" rowspan="1" colspan="1">15.64</td>
<td align="justify" rowspan="1" colspan="1">3.04</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr8:7355397</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">18.66</td>
<td align="justify" rowspan="1" colspan="1">39.16</td>
<td align="justify" rowspan="1" colspan="1">12.50</td>
<td align="justify" rowspan="1" colspan="1">6.02</td>
<td align="justify" rowspan="1" colspan="1">11.29</td>
<td align="justify" rowspan="1" colspan="1">15.99</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr10:27182399</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">99.13</td>
<td align="justify" rowspan="1" colspan="1">97.46</td>
<td align="justify" rowspan="1" colspan="1">99.43</td>
<td align="justify" rowspan="1" colspan="1">99.81</td>
<td align="justify" rowspan="1" colspan="1">99.80</td>
<td align="justify" rowspan="1" colspan="1">99.80</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr11:101565794</bold>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">63.04</td>
<td align="justify" rowspan="1" colspan="1">80.87</td>
<td align="justify" rowspan="1" colspan="1">77.27</td>
<td align="justify" rowspan="1" colspan="1">6.99</td>
<td align="justify" rowspan="1" colspan="1">86.53</td>
<td align="justify" rowspan="1" colspan="1">63.16</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr12:55727215</bold>
</underline>
</td>
<td align="justify" rowspan="1" colspan="1">72.19</td>
<td align="justify" rowspan="1" colspan="1">72.80</td>
<td align="justify" rowspan="1" colspan="1">80.40</td>
<td align="justify" rowspan="1" colspan="1">63.30</td>
<td align="justify" rowspan="1" colspan="1">80.99</td>
<td align="justify" rowspan="1" colspan="1">65.79</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr12:58721242</bold>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">70.73</td>
<td align="justify" rowspan="1" colspan="1">58.89</td>
<td align="justify" rowspan="1" colspan="1">78.41</td>
<td align="justify" rowspan="1" colspan="1">60.00</td>
<td align="justify" rowspan="1" colspan="1">87.33</td>
<td align="justify" rowspan="1" colspan="1">75.51</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr19:21841536</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">26.98</td>
<td align="justify" rowspan="1" colspan="1">39.16</td>
<td align="justify" rowspan="1" colspan="1">11.93</td>
<td align="justify" rowspan="1" colspan="1">32.23</td>
<td align="justify" rowspan="1" colspan="1">10.69</td>
<td align="justify" rowspan="1" colspan="1">32.39</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr19:22414379</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn005">
<sup>
<bold>c</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">67.77</td>
<td align="justify" rowspan="1" colspan="1">89.24</td>
<td align="justify" rowspan="1" colspan="1">60.80</td>
<td align="justify" rowspan="1" colspan="1">56.89</td>
<td align="justify" rowspan="1" colspan="1">55.84</td>
<td align="justify" rowspan="1" colspan="1">67.21</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chr19:22457244</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn004">
<sup>
<bold>b</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">0.87</td>
<td align="justify" rowspan="1" colspan="1">3.29</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<bold>chr22:18926187</bold>
<xref ref-type="table-fn" rid="t001fn003">
<sup>
<bold>a</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">99.49</td>
<td align="justify" rowspan="1" colspan="1">98.36</td>
<td align="justify" rowspan="1" colspan="1">99.72</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
<td align="justify" rowspan="1" colspan="1">99.80</td>
<td align="justify" rowspan="1" colspan="1">100.00</td>
</tr>
<tr>
<td align="justify" rowspan="1" colspan="1">
<underline>
<bold>chrX:93606603</bold>
</underline>
<xref ref-type="table-fn" rid="t001fn004">
<sup>
<bold>b</bold>
</sup>
</xref>
</td>
<td align="justify" rowspan="1" colspan="1">2.25</td>
<td align="justify" rowspan="1" colspan="1">7.32</td>
<td align="justify" rowspan="1" colspan="1">2.27</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
<td align="justify" rowspan="1" colspan="1">0.00</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t001fn001">
<p>For simplicity, only the starting coordinate is listed.</p>
</fn>
<fn id="t001fn002">
<p>* The value given represents individuals containing the tandem repeat found in hg19.</p>
</fn>
<fn id="t001fn003">
<p>
<sup>a</sup>
: prevalence > 90%</p>
</fn>
<fn id="t001fn004">
<p>
<sup>b</sup>
: low prevalence HERV-K and no individuals with only a solo LTR</p>
</fn>
<fn id="t001fn005">
<p>
<sup>c</sup>
: max-min difference is > 28%</p>
</fn>
<fn id="t001fn006">
<p>Underline: AFR is significantly different from the other four super-populations.</p>
</fn>
<fn id="t001fn007">
<p>See
<xref ref-type="supplementary-material" rid="pcbi.1006564.s011">S2 Dataset</xref>
:compare_prevalence for full analysis of the data.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Individuals from African populations differ significantly from the other four super-populations in the prevalence of ten of the polymorphic HERV-K, three of which occur in close proximity on chr19 (
<xref rid="pcbi.1006564.t001" ref-type="table">Table 1</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s011">S2 Dataset</xref>
:compare_prevalence). EUR and AFR super-populations are significantly different in the prevalence at all but one of the 20 polymorphic HERV-K based on adjusted p-values (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s011">S2 Dataset</xref>
:compare_prevalence).</p>
</sec>
<sec id="sec005">
<title>The number of polymorphic HERV-Ks per individual</title>
<p>The HERV-K genome is close to 10 kbp. As there are 20 known HERV-K loci with the potential to encode a provirus that are polymorphic in human populations, we asked if there is a difference in the burden of these repetitive, and potentially functional, viral elements among individuals. This was indeed the case. Of the 20 polymorphic HERV-K proviruses assessed, the number per person’s genome ranges from 7 to 18 (
<xref ref-type="fig" rid="pcbi.1006564.g002">Fig 2</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s011">S2 Dataset</xref>
:HERV-K per person). More than 63% of individuals from all super-populations except EAS carry 12 to 14 proviruses in their genome. Individuals from EAS have a lower burden, with 69% of individuals carrying 9 to 11 of the 20 polymorphic HERV-K proviruses. Seven percent of AFR individuals have 16 or 17 proviruses compared to a maximum of 2% in other groups (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s011">S2 Dataset</xref>
:HERV-K per person). These data suggest that a comprehensive investigation of polymorphic HERV-Ks may be a more productive means of advancing studies of their potential disease impact than assessing loci individually.</p>
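<p>The per-individual burden described above can be tallied with a simple counter. The following is a minimal sketch, not the authors' code: the input layout (a mapping from an individual to the set of loci where a provirus was called), the function name burden_histogram, and the toy identifiers are all assumptions made for illustration.</p>

```python
from collections import Counter


def burden_histogram(status):
    """Count individuals by the number of proviruses they carry.

    status: dict mapping an individual ID to the set of loci at which
    that individual was called 'provirus' (hypothetical layout).
    """
    return Counter(len(loci) for loci in status.values())


# Toy data with made-up locus labels; real input would cover 20 loci.
toy_status = {
    "ind1": {"locusA", "locusB"},
    "ind2": {"locusA"},
    "ind3": {"locusB", "locusC"},
}
histogram = burden_histogram(toy_status)
for count in sorted(histogram):
    print(count, histogram[count])
```

<p>Grouping the same tally by super-population would give the kind of per-group distribution summarized in Fig 2.</p>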
<fig id="pcbi.1006564.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1006564.g002</object-id>
<label>Fig 2</label>
<caption>
<title>Histogram of the number of proviruses per individual from the KGP.</title>
<p>The number of the 20 known polymorphic HERV-K proviruses in individuals from each of the five KGP super-populations, represented by the indicated colors.</p>
</caption>
<graphic xlink:href="pcbi.1006564.g002"></graphic>
</fig>
</sec>
<sec id="sec006">
<title>Co-occurrence of polymorphic HERV-Ks</title>
<p>Our data provide a comprehensive picture of sites occupied by HERV-K provirus in each genome. Although most previous studies investigating a role of HERV-K in human disease assessed the prevalence of the HERV-K at a given locus, it is possible that, for example, two HERV-Ks each at 40% prevalence in a population rarely co-occur in an individual genome. By providing the status of all known polymorphic HERV-K in the genome, our tools facilitate such assessment and can advance investigation of HERV-K and human disease. We assessed combinations of three, four and five polymorphic HERV-Ks in KGP data and found that there are many combinations of co-occurring viruses that are population-specific (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s012">S3 Dataset</xref>
). To facilitate exploration of HERV-K combinations among KGP populations, we developed a D3.js visualization tool (see
<xref ref-type="sec" rid="sec009">Methods</xref>
) that allows a user to choose any combination of the 20 polymorphic HERV-K proviruses and display the co-occurrence prevalence among the 26 populations represented in the KGP data. As an example, we show a combination of four HERV-Ks to represent the variation that occurs in KGP individuals, which in this case ranges from 3% in EAS to 59% in EUR (
<xref ref-type="fig" rid="pcbi.1006564.g003">Fig 3A</xref>
). We also determined that the three polymorphic HERV-Ks found on chr19 co-occur only in individuals from three AFR populations and in less than 2% of individuals (
<xref ref-type="fig" rid="pcbi.1006564.g003">Fig 3B</xref>
).</p>
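<p>The point that two loci with the same individual prevalence need not co-occur often can be made concrete with a small calculation. This is an illustrative sketch only: the data layout, the function name cooccurrence_prevalence, and the ten toy individuals are assumptions, and it does not reproduce the published visualization tool.</p>

```python
def cooccurrence_prevalence(status, loci):
    """Percent of individuals carrying a provirus at every locus in loci.

    status: dict mapping an individual ID to the set of loci at which
    that individual carries a provirus (hypothetical layout).
    """
    if not status:
        return 0.0
    wanted = set(loci)
    carriers = sum(1 for present in status.values() if wanted.issubset(present))
    return 100.0 * carriers / len(status)


# Ten made-up individuals; loci X and Y are each present in 4 of 10.
toy = {f"ind{i}": set() for i in range(10)}
for i in (0, 1, 2, 3):
    toy[f"ind{i}"].add("X")
for i in (3, 4, 5, 6):
    toy[f"ind{i}"].add("Y")

print(cooccurrence_prevalence(toy, ["X"]))
print(cooccurrence_prevalence(toy, ["Y"]))
print(cooccurrence_prevalence(toy, ["X", "Y"]))
```

<p>In this toy population each locus has 40% prevalence, yet only one individual in ten carries both, which is the situation a single-locus prevalence estimate cannot reveal.</p>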
<fig id="pcbi.1006564.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1006564.g003</object-id>
<label>Fig 3</label>
<caption>
<title>A visualization tool to examine co-occurrence of polymorphic HERV-Ks.</title>
<p>A) The co-occurrence of polymorphic HERV-Ks at chr1:75842771–75849143, chr3:112743479–112752282, chr6:57623896–57628704, and chr12:58721242–58730698 in the 26 populations is represented based on their geographic location. The relative prevalence for these four co-occurring HERV-Ks in each population bubble is displayed based on the color gradient shown in the scale at the top. The actual prevalence of the given combination of HERV-K provirus for each population and the cumulative prevalence for each super-population are shown in text on the right. Note that AFR and EAS have the lowest prevalence of these four polymorphic HERV-Ks. B) As in (A), showing the co-occurrence of the three polymorphic HERV-Ks that are present on chr19 by population. This is a rare combination found only in two AFR populations and in individuals in the Caribbean of African ancestry.</p>
</caption>
<graphic xlink:href="pcbi.1006564.g003"></graphic>
</fig>
</sec>
<sec id="sec007">
<title>KGP super-populations are distinguished by HERV-K status</title>
<p>Because there are clearly population-specific differences in both individual HERV-K prevalence and in the prevalence of HERV-K co-occurrence, we explored whether the presence or absence of these 20 documented polymorphic HERV-Ks is sufficient to distinguish populations using Fisher’s linear discriminant analysis (LDA) [
<xref rid="pcbi.1006564.ref047" ref-type="bibr">47</xref>
]. Based on the status ‘provirus’, ‘solo LTR’, or ‘absence’, there is little resolution of AFR, EUR, and EAS super-populations (
<xref ref-type="fig" rid="pcbi.1006564.g004">Fig 4A</xref>
). However, there is sufficient signature to separate AFR, EUR, and EAS if we utilize the n/T ratio of the 20 polymorphic HERV-Ks (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s006">S5 Fig</xref>
) and we further improve population separation if we use the n/T ratio for all 96 HERV-Ks (
<xref ref-type="fig" rid="pcbi.1006564.g004">Fig 4B</xref>
). This indicates that we are losing information by reducing the data to three states and that fixed HERV-K also contain signal for population of origin.</p>
<fig id="pcbi.1006564.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1006564.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Linear discriminant analysis of HERV-K status among three super-populations.</title>
<p>A) LDA based on the states ‘provirus’, ‘solo LTR’ and ‘absence’ of the 20 polymorphic HERV-K for AFR, EAS, and EUR. AMR and SAS overlap these three populations and are removed for clarity. B) LDA plot on n/T ratio of all 96 HERV-K separates AFR, EAS, and EUR super-populations. See
<xref ref-type="supplementary-material" rid="pcbi.1006564.s007">S6 Fig</xref>
for plots with all five super-populations.</p>
</caption>
<graphic xlink:href="pcbi.1006564.g004"></graphic>
</fig>
<p>An n/T = 1 indicates that the query set contains all
<italic>k-mers</italic>
that map to the reference set T for a specific HERV-K. If there is a HERV-K allele that has not been reported in any database but that is common in a population, we expect n/T &lt; 1 because we require a 100% match to reference set T and
<italic>k-mers</italic>
covering allelic sites will be excluded (see
<xref ref-type="fig" rid="pcbi.1006564.g001">Fig 1B</xref>
, blue cluster for an example). We assessed the density distributions of n/T plots for each of the 96 HERV-Ks for evidence of population-specific alleles (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s008">S7 Fig</xref>
). Five HERV-Ks have some indication of population-specific distributions (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus). The HERV-K at chr1:155596457–155605636, which we report as fixed, is notable because the reference allele (n/T = 1) is only found in AFR (
<xref ref-type="fig" rid="pcbi.1006564.g005">Fig 5A</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s008">S7 Fig</xref>
). Individuals from other populations have n/T near 0.5. We mapped
<italic>k-mer</italic>
s from individuals with n/T near 0.5 to the reference HERV-K sequences and confirmed that there is a loss of
<italic>k-mer</italic>
s at several sites covered by the unique reference
<italic>k-mer</italic>
s for this virus (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s009">S8 Fig</xref>
). There are also cases where the reference allele is found in all populations except AFR (
<xref ref-type="fig" rid="pcbi.1006564.g005">Fig 5B</xref>
and see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s008">S7 Fig</xref>
for additional examples).</p>
<fig id="pcbi.1006564.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1006564.g005</object-id>
<label>Fig 5</label>
<caption>
<title>Population specificity of HERV-K alleles.</title>
<p>A) n/T plot for HERV-K at chr1:155596457–155605636 colored by each of the 5 super-populations. Only individuals from AFR and a few from AMR have an n/T approximating 1 indicative of the HERV-K reference sequence. B) Plot of chr5:156084717–156093896 colored by each of the 5 super-populations. In this case, all populations except AFR have the reference allele and all super-populations have an alternative allele that is not present in our reference set.</p>
</caption>
<graphic xlink:href="pcbi.1006564.g005"></graphic>
</fig>
</sec>
</sec>
<sec sec-type="conclusions" id="sec008">
<title>Discussion</title>
<p>Our research provides a tool to mine whole genome sequence data to collectively evaluate the status of HERV-K provirus at known polymorphic and fixed sites in the human genome. The tool incorporates a statistical clustering algorithm to accommodate low depth sequence data and a visualization tool to explore the co-occurrence of known polymorphic HERV-K in the global populations represented in the KGP data. There are numerous significant differences in the prevalence of individual and co-occurring known polymorphic HERV-K among the five KGP super-populations. It is notable that individuals from EAS carry a lower total burden of the 20 polymorphic HERV-K than other represented populations. These data provide a comprehensive framework of genomic diversity among 20 documented polymorphic HERV-K proviruses to advance studies on potential roles for HERV-K in human disease, which have been alluring yet difficult to establish [
<xref rid="pcbi.1006564.ref021" ref-type="bibr">21</xref>
,
<xref rid="pcbi.1006564.ref022" ref-type="bibr">22</xref>
,
<xref rid="pcbi.1006564.ref024" ref-type="bibr">24</xref>
].</p>
<p>Tools developed to interrogate ERV insertional polymorphism typically exploit the unique signature created by the host-virus junction [
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
,
<xref rid="pcbi.1006564.ref048" ref-type="bibr">48</xref>
,
<xref rid="pcbi.1006564.ref049" ref-type="bibr">49</xref>
]. These approaches indicate that a site is occupied by an ERV but not whether there is a provirus associated with the site, which is more difficult to accomplish with short read sequence data. Our analysis tool provides an efficient means to detect occupancy and provirus status in one step. We decrease computational time by analyzing only the set of reads that map to existing HERV-K loci in the reference genome. This approach is justified because the known polymorphic HERV-K that are missing from the human reference are closely related to those in the reference genome assembly (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s005">S4 Fig</xref>
) and hence reads derived from them map to a related HERV-K in the reference. We employ
<italic>k-mer</italic>
counting methods, which also increase computational efficiency. A reference set of
<italic>k-mer</italic>
s that is unique to each HERV-K is generated for each location in the genome and the proportion of reads (n/T) from the query set that maps to the
<italic>k-mer</italic>
reference set is reported as a continuous variable; there is no threshold of read count or coverage imposed for classification. Instead we utilize a mixture model to statistically cluster values based on n/T and sequence depth and assign the same HERV-K status to all individuals in a cluster. Clusters representing n/T of 1 consist of individuals from whom all the unique
<italic>k-mer</italic>
s identified in the HERV-K reference set were recovered from their mapped WGS data. We classify other clusters by determining if
<italic>k-mer</italic>
s mapped on the reference allele are distributed at sites in the coding portion of the genome or only in the LTR; reads mapping only in the LTRs are classified as solo LTR. This approach demonstrated that the
<italic>k-mers</italic>
derived from some individuals only covered a subset of the unique sites and led to the interesting finding that several HERV-K loci could have population-specific alleles.</p>
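<p>The distinction between provirus and solo LTR drawn above depends on where the recovered unique k-mers fall on the reference allele. The sketch below illustrates that idea for a single individual under strong simplifying assumptions: it skips the mixture-model clustering step entirely, and the function name classify_locus, the half-open interval convention, and the example coordinates are hypothetical.</p>

```python
def classify_locus(recovered_positions, ltr_intervals):
    """Rough status call from the positions of recovered unique k-mers.

    recovered_positions: start positions, on the reference allele, of the
        unique k-mers recovered from one individual (hypothetical input).
    ltr_intervals: list of (start, end) half-open intervals marking the LTRs.
    """
    if not recovered_positions:
        return "absence"

    def in_ltr(pos):
        # range() membership gives a half-open interval test.
        return any(pos in range(start, end) for start, end in ltr_intervals)

    if all(in_ltr(pos) for pos in recovered_positions):
        return "solo LTR"
    # At least one k-mer maps outside the LTRs, i.e. in the internal region.
    return "provirus"


# Made-up allele of 10,000 bp with 1,000 bp LTRs at each end.
ltrs = [(0, 1000), (9000, 10000)]
print(classify_locus([], ltrs))
print(classify_locus([120, 9400], ltrs))
print(classify_locus([120, 4500, 9400], ltrs))
```

<p>In the published framework the call is made per cluster rather than per individual, so this sketch should be read only as an illustration of the positional criterion.</p>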
<p>Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
] have conducted the most comprehensive study of HERV-K prevalence in the KGP data to date. The goal of that paper was to identify new polymorphic insertions, either provirus or solo LTR, based on detecting reads containing the host virus junction. However, they implemented an additional step to detect provirus and provide the prevalence of some polymorphic HERV-K provirus for comparison with our results (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus for comparison of prevalence values reported in Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
]). There are five HERV-K previously reported in Subramanian
<italic>et al</italic>
2011 [
<xref rid="pcbi.1006564.ref010" ref-type="bibr">10</xref>
] that were not included in Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
]; all are polymorphic in our analysis (range 43–99%, see
<xref rid="pcbi.1006564.t001" ref-type="table">Table 1</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus-column N). Seven polymorphic HERV-K, which Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
] indicate occur in greater than 98% of KGP individuals, are fixed in our study. Our estimated prevalence for 14 HERV-K differs from that reported in Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
] by 5% or more. Of these 14, the prevalence estimates at chr1:155596457–155605636 are most divergent. Our data show this site is fixed for provirus and Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
] report that only 14% of the KGP data, all from AFR, have a HERV-K provirus integration. Our plots for chr1:155596457–155605636 show that AFR individuals carry the reference allele at this site (n/T near 1,
<xref ref-type="fig" rid="pcbi.1006564.g005">Fig 5A</xref>
) and all other individuals have n/T near 0.5. The
<italic>k-mer</italic>
s from individuals with low n/T values for chr1:155596457–155605636 map to only a subset of sites marked by unique
<italic>k-mer</italic>
s in the coding region (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s009">S8 Fig</xref>
), which is consistent with sequence polymorphism or a deletion at these positions. The reference set T is small for this HERV-K and therefore overall coverage of the genome is low. Because Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
] used a minimum coverage threshold for their
<italic>k-mer</italic>
mapping method, it is possible that alleles present in non-AFR populations do not meet their inclusion criteria. There is a similar signal for alleles, represented by lower n/T values, at the other 13 HERV-K sites although the differences between our prevalence estimates and those of Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
] are small (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus). In most cases these putative alleles are found in all populations at different frequencies but in five there is some degree of population specificity (
<xref ref-type="fig" rid="pcbi.1006564.g005">Fig 5</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s008">S7 Fig</xref>
,
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus). Our results indicate that there could be considerably more sequence variation in HERV-K among human populations than previously appreciated. These data also suggest that using a HERV-K consensus sequence to study pathogenic potential could miss important features of HERV-K proviral polymorphism, which can be characterized by both the site occupancy status (presence/absence) and, when present, by sequence differences among individuals.</p>
<p>HERV-Ks are the youngest family of endogenous retroviruses in humans and consequently they share considerable sequence identity. This has the effect of limiting the number of unique sites associated with some HERV-K, which decreases the size of the reference set T (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus). The set T is small for near identical HERV-K such as HERV-Ks involved in a duplication event. The HERV-Ks at chr1:13458305–13467826 and chr1:13678850–13688242 are identical and cannot be distinguished. We report n/T for only one of these HERV-K (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:virus, column M). We treat the two HERV-K proviruses spanning chr7:4622057–4640031 as a single virus, with n/T = 1 reflecting the tandem arrangement found in hg19. In this case, n/T &lt; 1 can mean either that both proviruses are present but with substitutions at a unique
<italic>k-mer</italic>
site or that one provirus converted to a solo LTR. Thus, although an n/T ratio of 0 or 1 reliably indicates absence and presence of reference HERV-Ks, respectively, sequence polymorphism and a deletion event can be difficult to distinguish from a solo LTR when T is small. However, because our mixture model statistically clusters similar n/T values based on sequence depth, all individuals in a cluster have the same status (e.g., allele or solo LTR) even if we do not know what that state is. The ability of our tools to resolve the status of closely related HERV-K provirus sequences will improve as more empirical sequence data become available.</p>
<p>Our approach provides researchers with a rapid means to determine if the prevalence and overall burden of the 96 HERV-K proviruses evaluated differ between a patient data set and the population represented in KGP to which they trace ancestry. The visualization tool will facilitate investigation of combinations of HERV-Ks in certain clinical conditions. The potential that HERV-K has multiple allelic forms in different populations is worthy of further analysis because a sequence allele could also contribute to a disease condition.</p>
</sec>
<sec sec-type="materials|methods" id="sec009">
<title>Materials and methods</title>
<sec id="sec010">
<title>HERV-K proviruses</title>
<p>The 96 HERV-K proviruses previously reported [
<xref rid="pcbi.1006564.ref010" ref-type="bibr">10</xref>
,
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
,
<xref rid="pcbi.1006564.ref034" ref-type="bibr">34</xref>
,
<xref rid="pcbi.1006564.ref046" ref-type="bibr">46</xref>
] (92 in hg19 and 4 from the NCBI nt database) were supplemented with HERV-K alleles present in the NCBI nt database (November 2016 release). We required that any allele of a HERV-K from the nt database have at least 2 kb of hg19 reference-matching host flanking sequence to confirm genome location. In total, 234 alleles were collected at the 96 known HERV-K loci. The location information and virus features are summarized in
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
: virus.</p>
</sec>
<sec id="sec011">
<title>Developing a
<italic>k-mer</italic>
based detection model</title>
<p>We identified the
<italic>k-mer</italic>
s that correspond to unique sequence characterizing each HERV-K.
<italic>K-mer</italic>
s are substrings (subsequences) of length
<italic>k</italic>
that exist in a string (DNA sequence). The length
<italic>k</italic>
is determined empirically (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
). Each
<italic>k-mer</italic>
is labeled with the corresponding viruses in which it is observed.</p>
<p>Only those
<italic>k-mer</italic>
s referring to a single virus locus, unique
<italic>k-mer</italic>
s, are selected for the set T. Where multiple alleles of a HERV-K are available,
<italic>k-mer</italic>
s unique to all alleles at that location comprise T. Multiple
<italic>k-mer</italic>
s that differ by 2 bp (for example, at SNPs) and correspond to the same location on the virus are merged into a single entry for the purposes of computing T. We map unique
<italic>k-mer</italic>
s back to the corresponding alleles to determine coverage of the HERV-K and whether
<italic>k-mer</italic>
s are located in LTRs (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s004">S3 Fig</xref>
;
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
: virus).</p>
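<p>The construction of a per-locus set of unique k-mers can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: the input layout, the function names kmers and unique_kmer_sets, the toy sequences and the choice k = 4 are hypothetical, the k-mers of all alleles at a locus are pooled, and the merging of near-identical k-mers at the same site is omitted.</p>

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmer_sets(loci, k):
    """Map each locus to the k-mers observed at that locus and nowhere else.

    loci: dict mapping a locus name to a list of allele sequences
    (hypothetical input layout).
    """
    owners = defaultdict(set)  # k-mer -> loci in which it is observed
    for locus, alleles in loci.items():
        for allele in alleles:
            for kmer in kmers(allele, k):
                owners[kmer].add(locus)
    reference = {locus: set() for locus in loci}
    for kmer, found_in in owners.items():
        if len(found_in) == 1:  # observed at a single locus only
            reference[next(iter(found_in))].add(kmer)
    return reference


toy_loci = {
    "locusA": ["ACGTACGA", "ACGTACGT"],  # two alleles of one locus
    "locusB": ["ACGTTTGA"],
}
T = unique_kmer_sets(toy_loci, k=4)
for locus in sorted(T):
    print(locus, sorted(T[locus]))
```

<p>In the toy input the k-mer ACGT occurs at both loci and is therefore excluded from both sets, which mirrors how shared sequence among closely related HERV-Ks shrinks T.</p>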
</sec>
<sec id="sec012">
<title>Analysis of 1000 Genomes Project (KGP) data</title>
<p>To develop a method to recover sequences containing information on HERV-K we leverage the fact that HERV-Ks are closely related. Thus, most sequence reads obtained from an individual with a polymorphic HERV-K that is absent in the human reference, hg19, will map to the location of a closely related HERV-K that is present in the human genome reference. (As we show in
<xref ref-type="supplementary-material" rid="pcbi.1006564.s005">S4 Fig</xref>
, the known polymorphic HERV-K proviruses are closely related.) A file with the coordinates for all reported HERV-K insertions is used to extract mapped reads from a genome sequence file (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:bed, which provides the coordinates for both hg19 and hg38). Note that the KGP data were mapped to GRCh37, which includes the decoy sequence hs37d5. This decoy contains the HERV-K at chr1:73594980–73595948, which is not present in hg19. Thus, we did not recover any reads for this HERV-K, which is polymorphic but reportedly at high prevalence in most populations [
<xref rid="pcbi.1006564.ref011" ref-type="bibr">11</xref>
].</p>
<p>The KGP data were downloaded in aligned Binary Alignment/Map (BAM) format (
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/data/">ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/data/</ext-link>
). The dataset comprises 2,535 individuals (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:KGP) sequenced via low-depth whole-genome sequencing (mean depth = 6.98X). The individuals represent 26 populations, derived from 5 super-populations, including African (AFR), Admixed American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS) [
<xref rid="pcbi.1006564.ref050" ref-type="bibr">50</xref>
,
<xref rid="pcbi.1006564.ref051" ref-type="bibr">51</xref>
]. Of the 2,535 individuals, 28 also have high-depth DNA sequences (mean depth = 48.06X), which we use as a pilot dataset to develop the mixture model, described below and in Supplementary Text.</p>
<p>Our computational framework to indicate the status of each known HERV-K provirus is based on the n/T ratio, which is the proportion of
<italic>k-mer</italic>
s in the data mined from WGS of each individual that are identical to the reference set T for each HERV-K provirus. Sequence reads are extracted from a mapped file of whole human genome sequence data based on coordinates corresponding to each annotated HERV-K. The reads are k-merized and mapped to the set T, which represents all unique
<italic>k-mer</italic>
s assigned to each HERV-K in the reference set. We use exact match to map the
<italic>k-mer</italic>
data set to the unique
<italic>k-mer</italic>
references. The n/T ratio is an indicator of the presence of each HERV-K: n/T = 1 indicates that the individual carries the reference HERV-K documented at that locus, whereas n/T = 0 indicates that no
<italic>k-mers</italic>
unique to a HERV-K locus were recovered (see
<xref ref-type="fig" rid="pcbi.1006564.g001">Fig 1</xref>
for more explanation). Using a hash table (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
), it takes 15 minutes to generate the n/T matrix for 100 files. The source code for the entire process is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/lwl1112/polymorphicHERV">https://github.com/lwl1112/polymorphicHERV</ext-link>
.</p>
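<p>As a minimal sketch of the n/T calculation described above, the following Python fragment k-merizes a list of reads and counts, by exact match against a hash set, how many k-mers of the reference set T are recovered. The function names (kmerize, n_over_t) and the toy sequences are illustrative assumptions and are not taken from the published source code; n is assumed here to be the number of distinct k-mers of T that are observed in the reads.</p>

```python
def kmerize(sequence, k):
    """Return the set of all substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def n_over_t(reads, reference_kmers, k):
    """Fraction of the reference k-mer set T recovered by exact match.

    reads           -- iterable of read sequences extracted for one locus
    reference_kmers -- set T of unique k-mers assigned to that locus
    k               -- k-mer length (hypothetical value chosen by the caller)
    """
    if not reference_kmers:
        raise ValueError("reference set T must not be empty")
    observed = set()
    for read in reads:
        observed |= kmerize(read, k)
    # Python sets are hash tables, so the exact-match lookup is cheap.
    n = len(observed.intersection(reference_kmers))
    return n / len(reference_kmers)


if __name__ == "__main__":
    # Toy reference locus and reads, for illustration only.
    T = kmerize("ACGTACGTAC", 4)
    print(n_over_t(["ACGTA"], T, 4))
```

<p>Under this assumed definition, a value of 1 means that every k-mer in T was recovered and a value of 0 means that none was, which matches the interpretation of n/T given in the text.</p>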
</sec>
<sec id="sec013">
<title>Dirichlet process Gaussian mixture model (DPGMM)</title>
<p>We used a statistical model to account for the dependence on sequencing depth of the number of
<italic>k-mer</italic>
s obtained from a person’s sequence data that map to the reference set T for each HERV-K (denoted by
<italic>n</italic>
<sub>
<italic>ik</italic>
</sub>
for the
<italic>i</italic>
th subject and
<italic>k</italic>
th HERV-K, with
<italic>i</italic>
= 1,…,
<italic>I</italic>
,
<italic>k</italic>
= 1,…,96). Thus, for each HERV-K, we could statistically cluster the
<italic>n</italic>
<sub>
<italic>ik</italic>
</sub>
/
<italic>T</italic>
values for
<italic>i</italic>
= 1,…,
<italic>I</italic>
based on the sequencing depth of each individual’s WGS data, for subsequent biological classification (provirus, solo LTR, or absent; see
<xref ref-type="fig" rid="pcbi.1006564.g001">Fig 1</xref>
). More specifically, for the
<italic>k</italic>
th HERV-K,
<italic>k</italic>
= 1,…,96, consider a sample of
<italic>I</italic>
measurements
<italic>x</italic>
<sub>i</sub>
(
<italic>i</italic>
= 1:
<italic>I</italic>
), where each
<italic>x</italic>
<sub>i</sub>
is a vector of length 2,
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
= (
<italic>x</italic>
<sub>i1</sub>
,
<italic>x</italic>
<sub>i2</sub>
) with
<italic>x</italic>
<sub>i1</sub>
being the
<italic>n</italic>
<sub>
<italic>ik</italic>
</sub>
/
<italic>T</italic>
measurement and
<italic>x</italic>
<sub>i2</sub>
the logarithm of the sequencing depth. Here, to simplify notation, we use
<italic>x</italic>
<sub>i</sub>
instead of
<italic>x</italic>
<sub>ik</sub>
. To perform the clustering analysis, we use the mixture model approach, which is arguably the most widely used statistical method for clustering. Specifically, we follow the approach of Lin et al. [
<xref rid="pcbi.1006564.ref052" ref-type="bibr">52</xref>
] that employs a Gaussian Mixture Model (GMM) with density function given by
<disp-formula id="pcbi.1006564.e001">
<alternatives>
<graphic xlink:href="pcbi.1006564.e001.jpg" id="pcbi.1006564.e001g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M1">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi mathvariant="normal">i</mml:mi>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mstyle displaystyle="false">
<mml:mo>∑</mml:mo>
</mml:mstyle>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>M</mml:mi>
</mml:msubsup>
<mml:msub>
<mml:mi>π</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mspace width="0.12em"></mml:mspace>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi mathvariant="normal">μ</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>Σ</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
<mml:mspace width="1em"></mml:mspace>
<mml:mi mathvariant="normal">f</mml:mi>
<mml:mi mathvariant="normal">o</mml:mi>
<mml:mi mathvariant="normal">r</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
<label>(1)</label>
</disp-formula>
where all unknown parameters are represented by
<italic>θ</italic>
= (π
<sub>{1:M}</sub>
, μ
<sub>{1:M}</sub>
,
<italic>Σ</italic>
<sub>{1:M}</sub>
).
<italic>N</italic>
(μ
<sub>
<italic>j</italic>
</sub>
,
<italic>Σ</italic>
<sub>
<italic>j</italic>
</sub>
) is the Gaussian density for the jth component parameterized by the 2-dimensional mean vector μ
<sub>
<italic>j</italic>
</sub>
and the 2×2 covariance matrix
<italic>Σ</italic>
<sub>
<italic>j</italic>
</sub>
. π
<sub>{1:M}</sub>
are the prior probabilities of the mixture components, which sum to 1. To allow flexible modeling, we employ the standard Bayesian (truncated) Dirichlet process prior for the parameters
<italic>θ</italic>
= (
<italic>π</italic>
<sub>
<italic>j</italic>
</sub>
, μ
<sub>
<italic>j</italic>
</sub>
,
<italic>Σ</italic>
<sub>
<italic>j</italic>
</sub>
,
<italic>j</italic>
= 1:
<italic>M</italic>
) [
<xref rid="pcbi.1006564.ref053" ref-type="bibr">53</xref>
,
<xref rid="pcbi.1006564.ref054" ref-type="bibr">54</xref>
]. The idea is that some of the mixture probabilities (
<italic>π</italic>
<sub>
<italic>j</italic>
</sub>
) can be zero, hence the actual number of mixture components needed may be smaller than the upper bound M. This mechanism allows automatic determination of the number of mixture components needed by the data set at hand. For model estimation, a latent indicator
<italic>Z</italic>
<sub>
<italic>i</italic>
</sub>
∈{1,2,…,
<italic>M</italic>
} with
<italic>P</italic>
(
<italic>Z</italic>
<sub>
<italic>i</italic>
</sub>
=
<italic>j</italic>
) =
<italic>π</italic>
<sub>
<italic>j</italic>
</sub>
is used, for
<italic>i</italic>
= 1:
<italic>I</italic>
. Specifically,
<italic>Z</italic>
<sub>
<italic>i</italic>
</sub>
=
<italic>j</italic>
if, and only if,
<italic>x</italic>
<sub>i</sub>
comes from component
<italic>j</italic>
. Given a model fitted via the Bayesian expectation–maximization algorithm, that is, given estimates of all parameters
<italic>θ</italic>
, we do not interpret each fitted Gaussian mixture component as a cluster. Instead, we identify clusters by aggregating Gaussian components so that non-Gaussian clusters can be flexibly represented. Components are merged into clusters by associating each Gaussian component with the closest mode of
<italic>f</italic>
(
<italic>x</italic>
<sub>1:
<italic>I</italic>
</sub>
|
<italic>θ</italic>
) = ∏
<sub>
<italic>i</italic>
= 1:
<italic>I</italic>
</sub>
<italic>f</italic>
(
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
|
<italic>θ</italic>
). Hence, the number of modes identified is the realized number of clusters (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
for additional detail).</p>
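<p>To make Eq (1) concrete, the following Python sketch evaluates a bivariate Gaussian mixture density at a point x = (n/T, log depth) and computes the posterior probability that the point belongs to each component, which corresponds to the latent indicator Z. It only illustrates the density and the component assignment; it does not implement the Dirichlet process prior, the Bayesian expectation–maximization fit, or the mode-based merging of components. All function names and numeric values are hypothetical.</p>

```python
import math


def gaussian2d_pdf(x, mean, cov):
    """Density of a bivariate Gaussian with covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if not det > 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the
    # closed-form inverse of a symmetric 2x2 matrix.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mixture_density(x, weights, means, covs):
    """f(x | theta): weighted sum of the component densities."""
    return sum(w * gaussian2d_pdf(x, m, s)
               for w, m, s in zip(weights, means, covs))


def responsibilities(x, weights, means, covs):
    """Posterior probability P(Z = j | x) for each component j."""
    parts = [w * gaussian2d_pdf(x, m, s)
             for w, m, s in zip(weights, means, covs)]
    total = sum(parts)
    return [p / total for p in parts]


if __name__ == "__main__":
    # Hypothetical fit with upper bound M = 3; the third weight is zero,
    # so only two components are effectively used.
    weights = [0.5, 0.5, 0.0]
    means = [(0.1, 2.0), (0.9, 2.0), (0.5, 2.0)]
    covs = [[[0.01, 0.0], [0.0, 0.01]]] * 3
    print(responsibilities((0.9, 2.0), weights, means, covs))
```

<p>In this toy example the third component has weight zero and therefore never receives any probability, which mirrors how a truncated prior can leave fewer effective components than the upper bound M.</p>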
</sec>
<sec id="sec014">
<title>Co-occurrence of polymorphic HERV-K</title>
<p>We consider that both the individual prevalence of a HERV-K and the co-occurrence of multiple HERV-Ks could differ among populations.</p>
<p>The running time of a brute-force approach for finding all combinations
<italic>C</italic>
<sub>
<italic>m</italic>
</sub>
of size m from p polymorphic HERV-K is
<inline-formula id="pcbi.1006564.e002">
<alternatives>
<graphic xlink:href="pcbi.1006564.e002.jpg" id="pcbi.1006564.e002g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M2">
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mo stretchy="true">∑</mml:mo>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac linethickness="0pt">
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:math>
</alternatives>
</inline-formula>
, which grows exponentially with p and evaluates many redundant combinations. We therefore employed the Apriori algorithm [
<xref rid="pcbi.1006564.ref055" ref-type="bibr">55</xref>
], which is commonly used for finding frequent itemsets; in our case, it indicates which of the known polymorphic HERV-Ks frequently appear together. It first generates candidate combinations C
<sub>m</sub>
(with m initialized to 1). In the optimization, frequent combinations F
<sub>m</sub>
are returned from candidates C
<sub>m</sub>
when prevalence exceeds the minimum threshold of co-occurrence. F
<sub>m</sub>
are then self-joined to generate combinations C
<sub>m+1</sub>
of size
<italic>m</italic>
+1, from which the combinations F
<sub>m+1</sub>
that satisfy the minimum co-occurrence threshold are retained. In each pass, candidate combinations are pruned so as to avoid generating all combinations, which reduces running time significantly.</p>
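<p>The candidate generation and pruning steps described above can be sketched in Python as follows. This is a generic, minimal Apriori implementation written for illustration under stated assumptions, not the code used in the study: each individual is assumed to be represented by the set of polymorphic HERV-K loci it carries, and the locus labels and the threshold are hypothetical.</p>

```python
from itertools import combinations


def apriori(presence, min_support):
    """Return {frozenset(loci): support} for all frequent combinations.

    presence    -- list of sets, one per individual, of loci that are present
    min_support -- minimum fraction of individuals carrying all loci
    """
    n = len(presence)

    def support(itemset):
        return sum(1 for s in presence if itemset.issubset(s)) / n

    items = sorted({locus for s in presence for locus in s})
    candidates = [frozenset([i]) for i in items]  # C_1
    frequent = {}
    while candidates:
        # F_m: candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        # Self-join F_m to build C_{m+1}; keep a candidate only if all
        # of its subsets of size m are themselves frequent (pruning).
        size = len(next(iter(level))) + 1 if level else 0
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [c for c in joined
                      if all(frozenset(sub) in level
                             for sub in combinations(c, size - 1))]
    return frequent


if __name__ == "__main__":
    # Four hypothetical individuals and three hypothetical loci.
    people = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B"}]
    for combo, sup in apriori(people, 0.5).items():
        print(sorted(combo), sup)
```

<p>Because a combination can be frequent only if every sub-combination is frequent, the pruning step discards candidates early and avoids enumerating all 2<sup>p</sup> − 1 combinations.</p>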
</sec>
<sec id="sec015">
<title>Statistical analysis of HERV-K frequencies across populations</title>
<p>We made statistical comparisons across 5 super-populations for the following three problems. For each problem, there are
<inline-formula id="pcbi.1006564.e003">
<alternatives>
<graphic xlink:href="pcbi.1006564.e003.jpg" id="pcbi.1006564.e003g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M3">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mfrac linethickness="0pt">
<mml:mrow>
<mml:mn>5</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
= 10 families of pairwise comparisons. The ‘prop.test’ function in R is used to test whether the proportions for two super-populations are the same.</p>
<list list-type="order">
<list-item>
<p>Individual prevalence of polymorphic HERV-K. (20 comparisons in a family, one for each polymorphic HERV-K)</p>
</list-item>
<list-item>
<p>The number of polymorphic HERV-K present per individual. (21 comparisons as the number of co-occurring polymorphic HERV-K is from 0 to 20)</p>
</list-item>
<list-item>
<p>The co-occurrence for combinations of polymorphic HERV-K.</p>
</list-item>
</list>
<p>Therefore, multiple hypothesis tests were conducted on frequencies
<italic>F</italic>
across super-populations
<italic>P</italic>
<sub>1…5</sub>
as follows:</p>
<p>Null hypothesis,
<inline-formula id="pcbi.1006564.e004">
<alternatives>
<graphic xlink:href="pcbi.1006564.e004.jpg" id="pcbi.1006564.e004g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M4">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</alternatives>
</inline-formula>
, where i≠j;</p>
<p>Alternative hypothesis,
<inline-formula id="pcbi.1006564.e005">
<alternatives>
<graphic xlink:href="pcbi.1006564.e005.jpg" id="pcbi.1006564.e005g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M5">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">H</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>≠</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</alternatives>
</inline-formula>
, where i≠j.</p>
<p>A separate P-value is computed for each test and the Benjamini-Hochberg procedure [
<xref rid="pcbi.1006564.ref056" ref-type="bibr">56</xref>
] is used to account for multiple comparisons.</p>
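<p>As an illustration of the testing procedure, the Python sketch below computes a two-sided P-value for the equality of two proportions and then applies the Benjamini-Hochberg adjustment to a family of P-values. It is a simplified stand-in and not the analysis code: the test shown is a pooled two-proportion z-test without continuity correction, so its P-values may differ from those returned by R, and the function names and counts are hypothetical.</p>

```python
import math


def two_proportion_pvalue(x1, n1, x2, n2):
    """Two-sided P-value for H0: p1 = p2 (pooled z-test, no correction)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0  # both proportions are 0 or both are 1
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| at least |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical carrier counts for one HERV-K in two super-populations.
    print(two_proportion_pvalue(90, 100, 10, 100))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

<p>Adjusted P-values below the chosen false discovery rate would then be reported as significant differences between the two super-populations.</p>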
</sec>
<sec id="sec016">
<title>Visualization in D3.js</title>
<p>We utilized D3.js (Data Driven Documents) [
<xref rid="pcbi.1006564.ref057" ref-type="bibr">57</xref>
], an open-source JavaScript library, to create an interactive visualization that displays the co-occurrence of polymorphic HERV-Ks in human populations. Our visualization system includes two modules, a welcome page and a result page. Input JSON data include locations of polymorphic HERV-K, population information, and the 0/1 (absence / presence) matrix (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
). Source code is available at:
<ext-link ext-link-type="uri" xlink:href="https://github.com/lwl1112/polymorphicHERV/tree/master/visualization">https://github.com/lwl1112/polymorphicHERV/tree/master/visualization</ext-link>
and a searchable tool with the data reported here is at:
<ext-link ext-link-type="uri" xlink:href="http://pages.iu.edu/~wli6/visualization/">http://pages.iu.edu/~wli6/visualization/</ext-link>
.</p>
</sec>
</sec>
<sec sec-type="supplementary-material" id="sec017">
<title>Supporting information</title>
<supplementary-material content-type="local-data" id="pcbi.1006564.s001">
<label>S1 Text</label>
<caption>
<title>This file contains methods, table of site occupancy, and references cited in methods.</title>
<p>(DOCX)</p>
</caption>
<media xlink:href="pcbi.1006564.s001.docx">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s002">
<label>S1 Fig</label>
<caption>
<title>The distribution of n/T values for chr12:55727215–55728183 when k = 70.</title>
<p>The x-axis is the n/T ratio, representing the proportion of k-mers derived from an individual’s genome data that match the unique set T for the HERV-K at chr12:55727215–55728183. The y-axis represents sequence depth. Under these conditions, some values tend to cluster, but the dispersion of points is broad and separation into biologically meaningful clusters would be difficult. For this reason, we developed the mixture model after optimizing the length k to facilitate clustering (
<xref ref-type="supplementary-material" rid="pcbi.1006564.s003">S2 Fig</xref>
and
<xref ref-type="supplementary-material" rid="pcbi.1006564.s001">S1 Text</xref>
).</p>
<p>(TIFF)</p>
</caption>
<media xlink:href="pcbi.1006564.s002.tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s003">
<label>S2 Fig</label>
<caption>
<title>Effect of
<italic>k</italic>
on n/T.</title>
<p>Six individuals with both high- and low-depth data are used to demonstrate how varying the length of k affects n/T values for absent, solo LTR and present states. High-depth data are above the line (depth = 20). Different colors represent different values of
<italic>k</italic>
from 30–70 as shown in the legend. Each number represents a different individual (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
:KGP for the identity of the sample corresponding to each number).</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s003.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s004">
<label>S3 Fig</label>
<caption>
<title>Alignment of unique k-mers to HERV-K at chr12: 55727215.</title>
<p>All k-mers derived from the data mining step from each individual are mapped to the reference set of unique k-mers, T, requiring 100% identity, to generate the set ‘n’. The first row shows the coverage of the set
<italic>T</italic>
on the HERV-K. The following plots show the mapping of the k-mer set ‘n’ from 8 individuals for the HERV-K at chr12: 55727215. #6, 12, 14, and 25 (see
<xref ref-type="supplementary-material" rid="pcbi.1006564.s010">S1 Dataset</xref>
: KGP, column D for identification information) are labeled as ‘provirus’. Note the dropout of the peaks near 3500 and 5000 bp for #14 and #25, which accounts for a decrease in n/T in these individuals. #4 and 16 have low n/T, and their k-mers map to the LTR region indicated above the diagram; these are labeled as ‘solo LTR’. #23 and 28 are labeled as ‘absent’. For individuals with states ‘solo LTR’ and ‘absent’, there are some peaks in the coding region. This is most likely the result of assigning unique k-mers to this HERV-K that are shared with those from a HERV-K that is absent from the reference HERV-K dataset.</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s004.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s005">
<label>S4 Fig</label>
<caption>
<title>Maximum likelihood phylogenetic tree of fixed and polymorphic HERV-K.</title>
<p>To improve the alignment, only HERV-Ks ≥ 6,500 bp were included, except for the HERV-K at chr1:75,842,771, which has a long deletion but aligns well in other regions. The maximum likelihood tree was generated using PhyML [
<xref rid="pcbi.1006564.ref004" ref-type="bibr">4</xref>
] with the GTR model and a gamma distribution. Node support was calculated using the approximate likelihood ratio test (aLRT). Nodes with aLRT support below 0.9 were collapsed and colored in grey. HERV-K taxa are named after their genomic location in hg19. Polymorphic HERV-Ks identified in this study are indicated in red text. The chr8:146086169 HERV-K was identified in one individual in Wildschutte
<italic>et al</italic>
[
<xref rid="pcbi.1006564.ref005" ref-type="bibr">5</xref>
] but not found in this analysis.</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s005.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s006">
<label>S5 Fig</label>
<caption>
<title>Linear discriminant analysis (LDA) based on n/T ratio of the 20 polymorphic HERV-Ks.</title>
<p>There is improved resolution of EAS from EUR and AFR using n/T compared to reducing the data to the three states ‘provirus’, ‘solo LTR’, ‘absent’ (
<xref ref-type="fig" rid="pcbi.1006564.g004">Fig 4</xref>
) for these 20 HERV-Ks. However, there is still substantial overlap of EUR and AFR based on n/T of the 20 polymorphic HERV-K studied.</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s006.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s007">
<label>S6 Fig</label>
<caption>
<title>Linear discriminant analysis (LDA) using the five super populations.</title>
<p>A) LDA plot based on the states ‘provirus’, ‘solo LTR’ and ‘absence’ of the 20 polymorphic HERV-Ks for the 5 super-populations represented in KGP. AMR are largely interspersed between AFR and EUR and SAS are found between EUR and EAS based on polymorphic status alone. B) LDA plot based on the n/T for all HERV-K proviruses for 5 super-populations. AMR and SAS overlap with EUR but are better separated from AFR based on these data.</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s007.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s008">
<label>S7 Fig</label>
<caption>
<title>Kernel density estimation for 12 representative polymorphic HERV-Ks.</title>
<p>We assessed the density plots of all 96 HERV-Ks to determine if any peaks were specific to one of the super-populations. Shown are examples of candidate alleles specific to a population. In other cases, several or all populations have the alleles, but the prevalence is skewed. For example, the candidate allele for chr3:112743479–112752282 (the peak near n/T~0.7) appears to be more common in SAS individuals (pink trace). Similarly, EAS individuals (green trace) have a lower prevalence of the chr12:58721242–58730698 reference allele (n/T peak near 1) than do EUR (blue trace). Population-specific variation in HERV-K sequence could lead to under-estimation of proviral prevalence with mapping methods that require a coverage threshold.</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s008.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s009">
<label>S8 Fig</label>
<caption>
<title>Mapping four high-depth KGP individuals to the reference allele of chr1:155596457–155605636.</title>
<p>The first row shows the positions where the unique
<italic>k-mer</italic>
set T maps to the reference HERV-K at chr1:155596457. The following rows show the mapping of
<italic>k-mers</italic>
recovered from four high-depth individuals: the
<italic>n/T</italic>
ratio for #21 and 22 is equal to or close to 1; for #20 and 23 the n/T ratio is between 0.5 and 0.7, representing a candidate allele at this locus. Note the loss of peaks at 1700 bp and 3200 bp in both individuals #20 and 23 and of the peak at 4700 bp in #23.</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pcbi.1006564.s009.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s010">
<label>S1 Dataset</label>
<caption>
<title>Information on HERV-K, bed files for data mining, 1000 genomes data.</title>
<p>(XLSX)</p>
</caption>
<media xlink:href="pcbi.1006564.s010.xlsx">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s011">
<label>S2 Dataset</label>
<caption>
<title>Results from analysis including matrices of n/T, presence or absence, and analysis of population prevalence and total number of HERV-K per individual.</title>
<p>(XLSX)</p>
</caption>
<media xlink:href="pcbi.1006564.s011.xlsx">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pcbi.1006564.s012">
<label>S3 Dataset</label>
<caption>
<title>Analysis of co-occurrence for 3, 4, and 5 HERV-K.</title>
<p>(XLSX)</p>
</caption>
<media xlink:href="pcbi.1006564.s012.xlsx">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>We thank three anonymous reviewers for useful comments that improved the manuscript.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1006564.ref001">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hayward</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Grabherr</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Jern</surname>
<given-names>P</given-names>
</name>
.
<article-title>Broad-scale phylogenomics provides insights into retrovirus-host evolution</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<year>2013</year>
;
<volume>110</volume>
:
<fpage>20146</fpage>
<lpage>51</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.1315419110</pub-id>
<pub-id pub-id-type="pmid">24277832</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref002">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Feschotte</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Gilbert</surname>
<given-names>C</given-names>
</name>
.
<article-title>Endogenous viruses: insights into viral evolution and impact on host biology</article-title>
.
<source>Nat Rev Genet</source>
. Nature Publishing Group;
<year>2012</year>
;
<volume>13</volume>
:
<fpage>283</fpage>
<lpage>296</lpage>
.
<pub-id pub-id-type="doi">10.1038/nrg3199</pub-id>
<pub-id pub-id-type="pmid">22421730</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref003">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Stoye</surname>
<given-names>JP</given-names>
</name>
.
<article-title>Studies of endogenous retroviruses reveal a continuing evolutionary saga</article-title>
.
<source>Nat Rev Microbiol</source>
. Nature Publishing Group;
<year>2012</year>
;
<volume>10</volume>
:
<fpage>395</fpage>
<lpage>406</lpage>
.
<pub-id pub-id-type="doi">10.1038/nrmicro2783</pub-id>
<pub-id pub-id-type="pmid">22565131</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref004">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gifford</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Tristem</surname>
<given-names>M</given-names>
</name>
.
<article-title>The evolution, distribution and diversity of endogenous retroviruses</article-title>
.
<source>Virus Genes</source>
. Springer;
<year>2003</year>
;
<volume>26</volume>
:
<fpage>291</fpage>
<lpage>315</lpage>
.
<pub-id pub-id-type="pmid">12876457</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref005">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Weiss</surname>
<given-names>RA</given-names>
</name>
.
<article-title>The discovery of endogenous retroviruses</article-title>
.
<source>Retrovirology</source>
.
<year>2006</year>
<pub-id pub-id-type="doi">10.1186/1742-4690-3-67</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref006">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jern</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Effects of retroviruses on host genome function</article-title>
.
<source>Annu Rev Genet</source>
.
<year>2008</year>
;
<volume>42</volume>
:
<fpage>709</fpage>
<lpage>32</lpage>
.
<pub-id pub-id-type="doi">10.1146/annurev.genet.42.110807.091501</pub-id>
<pub-id pub-id-type="pmid">18694346</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref007">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Löwer</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Löwer</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Kurth</surname>
<given-names>R</given-names>
</name>
.
<article-title>The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences</article-title>
.
<source>Proc Natl Acad Sci</source>
. National Acad Sciences;
<year>1996</year>
;
<volume>93</volume>
:
<fpage>5177</fpage>
<lpage>5184</lpage>
.
<pub-id pub-id-type="pmid">8643549</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref008">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bannert</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Kurth</surname>
<given-names>R</given-names>
</name>
.
<article-title>Retroelements and the human genome: New perspectives on an old relation</article-title>
.
<source>Proc Natl Acad Sci</source>
.
<year>2004</year>
;
<volume>101</volume>
:
<fpage>14572</fpage>
<lpage>14579</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.0404838101</pub-id>
<pub-id pub-id-type="pmid">15310846</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref009">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Moyes</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Griffiths</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Venables</surname>
<given-names>PJ</given-names>
</name>
.
<article-title>Insertional polymorphisms: a new lease of life for endogenous retroviruses in human disease</article-title>
.
<source>Trends Genet</source>
.
<year>2007</year>
;
<volume>23</volume>
:
<fpage>326</fpage>
<lpage>333</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.tig.2007.05.004</pub-id>
<pub-id pub-id-type="pmid">17524519</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Subramanian</surname>
<given-names>RP</given-names>
</name>
,
<name>
<surname>Wildschutte</surname>
<given-names>JH</given-names>
</name>
,
<name>
<surname>Russo</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses</article-title>
.
<source>Retrovirology</source>
. BioMed Central Ltd;
<year>2011</year>
;
<volume>8</volume>
:
<fpage>90</fpage>
<pub-id pub-id-type="doi">10.1186/1742-4690-8-90</pub-id>
<pub-id pub-id-type="pmid">22067224</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref011">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wildschutte</surname>
<given-names>JH</given-names>
</name>
,
<name>
<surname>Williams</surname>
<given-names>ZH</given-names>
</name>
,
<name>
<surname>Montesion</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Subramanian</surname>
<given-names>RP</given-names>
</name>
,
<name>
<surname>Kidd</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Discovery of unfixed endogenous retrovirus insertions in diverse human populations</article-title>
.
<source>Proc Natl Acad Sci</source>
.
<year>2016</year>
;
<fpage>201602336</fpage>
<pub-id pub-id-type="doi">10.1073/pnas.1602336113</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kurth</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Bannert</surname>
<given-names>N</given-names>
</name>
.
<article-title>Beneficial and detrimental effects of human endogenous retroviruses</article-title>
.
<source>Int J Cancer</source>
.
<year>2010</year>
;
<volume>126</volume>
:
<fpage>306</fpage>
<lpage>314</lpage>
.
<pub-id pub-id-type="doi">10.1002/ijc.24902</pub-id>
<pub-id pub-id-type="pmid">19795446</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
,
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
.
<article-title>Repetitive DNA and next-generation sequencing: computational challenges and solutions</article-title>
.
<source>Nat Rev Genet</source>
.
<year>2012</year>
;
<volume>13</volume>
:
<fpage>36</fpage>
<lpage>46</lpage>
.
<pub-id pub-id-type="doi">10.1038/nrg3117</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref014">
<label>14</label>
<mixed-citation publication-type="journal">
<name>
<surname>Belshaw</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Watson</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Katzourakis</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Howe</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Woolven-Allen</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Burt</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Rate of recombinational deletion among human endogenous retroviruses</article-title>
.
<source>J Virol</source>
.
<year>2007</year>
;
<volume>81</volume>
:
<fpage>9437</fpage>
<lpage>42</lpage>
.
<pub-id pub-id-type="doi">10.1128/JVI.02216-06</pub-id>
<pub-id pub-id-type="pmid">17581995</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref015">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Medstrand</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Mager</surname>
<given-names>DL</given-names>
</name>
.
<article-title>Human-specific integrations of the HERV-K endogenous retrovirus family</article-title>
.
<source>J Virol</source>
. Am Soc Microbiol;
<year>1998</year>
;
<volume>72</volume>
:
<fpage>9782</fpage>
<lpage>9787</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref016">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hughes</surname>
<given-names>JF</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Human endogenous retrovirus K solo-LTR formation and insertional polymorphisms: implications for human and viral evolution</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<year>2004</year>
;
<volume>101</volume>
:
<fpage>1668</fpage>
<lpage>72</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.0307885100</pub-id>
<pub-id pub-id-type="pmid">14757818</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref017">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Belshaw</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Dawson</surname>
<given-names>ALA</given-names>
</name>
,
<name>
<surname>Woolven-Allen</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Redding</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Burt</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Tristem</surname>
<given-names>M</given-names>
</name>
.
<article-title>Genomewide screening reveals high levels of insertional polymorphism in the human endogenous retrovirus family HERV-K (HML2): implications for present-day activity</article-title>
.
<source>J Virol</source>
. Am Soc Microbiol;
<year>2005</year>
;
<volume>79</volume>
:
<fpage>12507</fpage>
<lpage>12514</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref018">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Marchi</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Kanapin</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Magiorkinis</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Belshaw</surname>
<given-names>R</given-names>
</name>
.
<article-title>Unfixed Endogenous Retroviral Insertions in the Human Population</article-title>
.
<source>J Virol</source>
.
<year>2014</year>
;
<volume>88</volume>
:
<fpage>9529</fpage>
<lpage>9537</lpage>
.
<pub-id pub-id-type="doi">10.1128/JVI.00919-14</pub-id>
<pub-id pub-id-type="pmid">24920817</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref019">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Shin</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Lee</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Son</surname>
<given-names>S-Y</given-names>
</name>
,
<name>
<surname>Ahn</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>H-S</given-names>
</name>
,
<name>
<surname>Han</surname>
<given-names>K</given-names>
</name>
.
<article-title>Human-specific HERV-K insertion causes genomic variations in the human genome</article-title>
.
<source>PLoS One</source>
. Public Library of Science;
<year>2013</year>
;
<volume>8</volume>
:
<fpage>e60605</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0060605</pub-id>
<pub-id pub-id-type="pmid">23593260</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref020">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gröger</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Cynis</surname>
<given-names>H</given-names>
</name>
.
<article-title>Human Endogenous Retroviruses and Their Putative Role in the Development of Autoimmune Disorders Such as Multiple Sclerosis</article-title>
.
<source>Front Microbiol</source>
. Frontiers;
<year>2018</year>
;
<volume>9</volume>
:
<fpage>265</fpage>
<pub-id pub-id-type="doi">10.3389/fmicb.2018.00265</pub-id>
<pub-id pub-id-type="pmid">29515547</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref021">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Young</surname>
<given-names>GR</given-names>
</name>
,
<name>
<surname>Stoye</surname>
<given-names>JP</given-names>
</name>
,
<name>
<surname>Kassiotis</surname>
<given-names>G</given-names>
</name>
.
<article-title>Are human endogenous retroviruses pathogenic? An approach to testing the hypothesis</article-title>
.
<source>BioEssays</source>
.
<year>2013</year>
;
<volume>35</volume>
:
<fpage>794</fpage>
<lpage>803</lpage>
.
<pub-id pub-id-type="doi">10.1002/bies.201300049</pub-id>
<pub-id pub-id-type="pmid">23864388</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref022">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ryan</surname>
<given-names>FP</given-names>
</name>
.
<article-title>Human endogenous retroviruses in health and disease: a symbiotic perspective</article-title>
.
<source>J R Soc Med</source>
.
<year>2004</year>
;
<volume>97</volume>
:
<fpage>560</fpage>
<lpage>5</lpage>
.
<pub-id pub-id-type="doi">10.1258/jrsm.97.12.560</pub-id>
<pub-id pub-id-type="pmid">15574851</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref023">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Volkman</surname>
<given-names>HE</given-names>
</name>
,
<name>
<surname>Stetson</surname>
<given-names>DB</given-names>
</name>
.
<article-title>The enemy within: endogenous retroelements and autoimmune disease</article-title>
.
<source>Nat Immunol</source>
.
<year>2014</year>
;
<volume>15</volume>
:
<fpage>415</fpage>
<lpage>22</lpage>
.
<pub-id pub-id-type="doi">10.1038/ni.2872</pub-id>
<pub-id pub-id-type="pmid">24747712</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref024">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Magiorkinis</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Belshaw</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Katzourakis</surname>
<given-names>A</given-names>
</name>
.
<article-title>“There and back again”: revisiting the pathophysiological roles of human endogenous retroviruses in the post-genomic era</article-title>
.
<source>Philos Trans R Soc B Biol Sci</source>
.
<year>2013</year>
;
<volume>368</volume>
:
<fpage>20120504</fpage>
<lpage>20120504</lpage>
.
<pub-id pub-id-type="doi">10.1098/rstb.2012.0504</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref025">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Löwer</surname>
<given-names>R</given-names>
</name>
.
<article-title>The pathogenic potential of endogenous retroviruses: facts and fantasies</article-title>
.
<source>Trends Microbiol</source>
. Elsevier;
<year>1999</year>
;
<volume>7</volume>
:
<fpage>350</fpage>
<lpage>356</lpage>
.
<pub-id pub-id-type="pmid">10470042</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref026">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hohn</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Hanke</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Bannert</surname>
<given-names>N</given-names>
</name>
.
<article-title>HERV-K (HML-2), the best preserved family of HERVs: endogenization, expression, and implications in health and disease</article-title>
.
<source>Front Oncol</source>
. Frontiers;
<year>2013</year>
;
<volume>3</volume>
:
<fpage>246</fpage>
<pub-id pub-id-type="doi">10.3389/fonc.2013.00246</pub-id>
<pub-id pub-id-type="pmid">24066280</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref027">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hughes</surname>
<given-names>JF</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Human endogenous retroviral elements as indicators of ectopic recombination events in the primate genome</article-title>
.
<source>Genetics</source>
.
<year>2005</year>
;
<volume>171</volume>
:
<fpage>1183</fpage>
<lpage>94</lpage>
.
<pub-id pub-id-type="doi">10.1534/genetics.105.043976</pub-id>
<pub-id pub-id-type="pmid">16157677</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref028">
<label>28</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hughes</surname>
<given-names>JF</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution</article-title>
.
<source>Nat Genet</source>
. Nature Publishing Group;
<year>2001</year>
;
<volume>29</volume>
:
<fpage>487</fpage>
<pub-id pub-id-type="doi">10.1038/ng775</pub-id>
<pub-id pub-id-type="pmid">11704760</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref029">
<label>29</label>
<mixed-citation publication-type="journal">
<name>
<surname>Romanish</surname>
<given-names>MT</given-names>
</name>
,
<name>
<surname>Cohen</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Mager</surname>
<given-names>DL</given-names>
</name>
.
<article-title>Potential mechanisms of endogenous retroviral-mediated genomic instability in human cancer</article-title>
.
<source>Semin Cancer Biol</source>
.
<year>2010</year>
;
<volume>20</volume>
:
<fpage>246</fpage>
<lpage>253</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.semcancer.2010.05.005</pub-id>
<pub-id pub-id-type="pmid">20685251</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref030">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kamp</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Hirschmann</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Voss</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Huellen</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Vogt</surname>
<given-names>PH</given-names>
</name>
.
<article-title>Two long homologous retroviral sequence blocks in proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events</article-title>
.
<source>Hum Mol Genet</source>
.
<year>2000</year>
;
<volume>9</volume>
:
<fpage>2563</fpage>
<lpage>72</lpage>
.
<pub-id pub-id-type="pmid">11030762</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref031">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kidd</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Graves</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Newman</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Fulton</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Hayden</surname>
<given-names>HS</given-names>
</name>
,
<name>
<surname>Malig</surname>
<given-names>M</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>A human genome structural variation sequencing resource reveals insights into mutational mechanisms</article-title>
.
<source>Cell</source>
. Elsevier;
<year>2010</year>
;
<volume>143</volume>
:
<fpage>837</fpage>
<lpage>847</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.cell.2010.10.027</pub-id>
<pub-id pub-id-type="pmid">21111241</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref032">
<label>32</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cohen</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Lock</surname>
<given-names>WM</given-names>
</name>
,
<name>
<surname>Mager</surname>
<given-names>DL</given-names>
</name>
.
<article-title>Endogenous retroviral LTRs as promoters for human genes: a critical assessment</article-title>
.
<source>Gene</source>
. Elsevier B.V.;
<year>2009</year>
;
<volume>448</volume>
:
<fpage>105</fpage>
<lpage>14</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.gene.2009.06.020</pub-id>
<pub-id pub-id-type="pmid">19577618</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref033">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Simmons</surname>
<given-names>W</given-names>
</name>
.
<article-title>The Role of Human Endogenous Retroviruses (HERV-K) in the Pathogenesis of Human Cancers</article-title>
.
<source>Mol Biol</source>
.
<year>2016</year>
;
<volume>05</volume>
<pub-id pub-id-type="doi">10.4172/2168-9547.1000169</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref034">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wildschutte</surname>
<given-names>JH</given-names>
</name>
,
<name>
<surname>Ram</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Subramanian</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Stevens</surname>
<given-names>VL</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>The distribution of insertionally polymorphic endogenous retroviruses in breast cancer patients and cancer-free controls</article-title>
.
<source>Retrovirology</source>
.
<year>2014</year>
;
<volume>11</volume>
:
<fpage>62</fpage>
<pub-id pub-id-type="doi">10.1186/s12977-014-0062-3</pub-id>
<pub-id pub-id-type="pmid">25112280</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref035">
<label>35</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kassiotis</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Stoye</surname>
<given-names>JP</given-names>
</name>
.
<article-title>Making a virtue of necessity: the pleiotropic role of human endogenous retroviruses in cancer</article-title>
.
<source>Philos Trans R Soc B Biol Sci</source>
.
<year>2017</year>
;
<volume>372</volume>
:
<fpage>20160277</fpage>
<pub-id pub-id-type="doi">10.1098/rstb.2016.0277</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref036">
<label>36</label>
<mixed-citation publication-type="journal">
<name>
<surname>Johanning</surname>
<given-names>GL</given-names>
</name>
,
<name>
<surname>Malouf</surname>
<given-names>GG</given-names>
</name>
,
<name>
<surname>Zheng</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Esteva</surname>
<given-names>FJ</given-names>
</name>
,
<name>
<surname>Weinstein</surname>
<given-names>JN</given-names>
</name>
,
<name>
<surname>Wang-Johanning</surname>
<given-names>F</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Expression of human endogenous retrovirus-K is strongly associated with the basal-like breast cancer phenotype</article-title>
.
<source>Sci Rep</source>
.
<year>2017</year>
;
<volume>7</volume>
:
<fpage>41960</fpage>
<pub-id pub-id-type="doi">10.1038/srep41960</pub-id>
<pub-id pub-id-type="pmid">28165048</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref037">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bhardwaj</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Endogenous retroviruses and human cancer: Is there anything to the rumors?</article-title>
<source>Cell Host Microbe</source>
. Elsevier Inc.;
<year>2014</year>
;
<volume>15</volume>
:
<fpage>255</fpage>
<lpage>259</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.chom.2014.02.013</pub-id>
<pub-id pub-id-type="pmid">24629332</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hanke</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Hohn</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Bannert</surname>
<given-names>N</given-names>
</name>
.
<article-title>HERV-K(HML-2), a seemingly silent subtenant—but still waters run deep</article-title>
.
<source>APMIS</source>
.
<year>2016</year>
;
<volume>124</volume>
<pub-id pub-id-type="doi">10.1111/apm.12475</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref039">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Trela</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Nelson</surname>
<given-names>PN</given-names>
</name>
,
<name>
<surname>Rylance</surname>
<given-names>PB</given-names>
</name>
.
<article-title>The role of molecular mimicry and other factors in the association of Human Endogenous Retroviruses and autoimmunity</article-title>
.
<source>APMIS</source>
.
<year>2016</year>
;
<volume>124</volume>
:
<fpage>88</fpage>
<lpage>104</lpage>
.
<pub-id pub-id-type="doi">10.1111/apm.12487</pub-id>
<pub-id pub-id-type="pmid">26818264</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref040">
<label>40</label>
<mixed-citation publication-type="journal">
<name>
<surname>Antony</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Deslauriers</surname>
<given-names>AM</given-names>
</name>
,
<name>
<surname>Bhat</surname>
<given-names>RK</given-names>
</name>
,
<name>
<surname>Ellestad</surname>
<given-names>KK</given-names>
</name>
,
<name>
<surname>Power</surname>
<given-names>C</given-names>
</name>
.
<article-title>Human endogenous retroviruses and multiple sclerosis: innocent bystanders or disease determinants?</article-title>
<source>Biochim Biophys Acta</source>
.
<year>2011</year>
;
<volume>1812</volume>
:
<fpage>162</fpage>
<lpage>76</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.bbadis.2010.07.016</pub-id>
<pub-id pub-id-type="pmid">20696240</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref041">
<label>41</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tugnet</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Rylance</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Roden</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Trela</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Nelson</surname>
<given-names>P</given-names>
</name>
.
<article-title>Human endogenous retroviruses (HERVs) and autoimmune rheumatic disease: is there a link?</article-title>
<source>Open Rheumatol J</source>
.
<year>2013</year>
;
<volume>7</volume>
:
<fpage>13</fpage>
<pub-id pub-id-type="doi">10.2174/1874312901307010013</pub-id>
<pub-id pub-id-type="pmid">23750183</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref042">
<label>42</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Lee</surname>
<given-names>M-H</given-names>
</name>
,
<name>
<surname>Henderson</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Tyagi</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Bachani</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Steiner</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Human endogenous retrovirus-K contributes to motor neuron disease</article-title>
.
<source>Sci Transl Med</source>
.
<year>2015</year>
;
<volume>7</volume>
:
<fpage>307ra153</fpage>
<pub-id pub-id-type="doi">10.1126/scitranslmed.aac8201</pub-id>
<pub-id pub-id-type="pmid">26424568</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref043">
<label>43</label>
<mixed-citation publication-type="journal">
<name>
<surname>Douville</surname>
<given-names>RN</given-names>
</name>
,
<name>
<surname>Nath</surname>
<given-names>A</given-names>
</name>
.
<article-title>Human Endogenous Retrovirus-K and TDP-43 Expression Bridges ALS and HIV Neuropathology</article-title>
.
<source>Front Microbiol</source>
. Frontiers;
<year>2017</year>
;
<volume>8</volume>
:
<fpage>1986</fpage>
<pub-id pub-id-type="doi">10.3389/fmicb.2017.01986</pub-id>
<pub-id pub-id-type="pmid">29075249</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref044">
<label>44</label>
<mixed-citation publication-type="journal">
<name>
<surname>Trombetta</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Fantini</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>D’Atanasio</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Sellitto</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Cruciani</surname>
<given-names>F</given-names>
</name>
.
<article-title>Evidence of extensive non-allelic gene conversion among LTR elements in the human genome</article-title>
.
<source>Sci Rep</source>
.
<year>2016</year>
;
<volume>6</volume>
:
<fpage>28710</fpage>
<pub-id pub-id-type="doi">10.1038/srep28710</pub-id>
<pub-id pub-id-type="pmid">27346230</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref045">
<label>45</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nexø</surname>
<given-names>BA</given-names>
</name>
,
<name>
<surname>Villesen</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Nissen</surname>
<given-names>KK</given-names>
</name>
,
<name>
<surname>Lindegaard</surname>
<given-names>HM</given-names>
</name>
,
<name>
<surname>Rossing</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Petersen</surname>
<given-names>T</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Are human endogenous retroviruses triggers of autoimmune diseases? Unveiling associations of three diseases and viral loci</article-title>
.
<source>Immunol Res</source>
.
<year>2016</year>
;
<volume>64</volume>
:
<fpage>55</fpage>
<lpage>63</lpage>
.
<pub-id pub-id-type="doi">10.1007/s12026-015-8671-z</pub-id>
<pub-id pub-id-type="pmid">26091722</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref046">
<label>46</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bhardwaj</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Montesion</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Roy</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Coffin</surname>
<given-names>JM</given-names>
</name>
.
<article-title>Differential expression of HERV-K (HML-2) proviruses in cells and virions of the teratocarcinoma cell line Tera-1</article-title>
.
<source>Viruses</source>
.
<year>2015</year>
;
<volume>7</volume>
:
<fpage>939</fpage>
<lpage>68</lpage>
.
<pub-id pub-id-type="doi">10.3390/v7030939</pub-id>
<pub-id pub-id-type="pmid">25746218</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref047">
<label>47</label>
<mixed-citation publication-type="book">
<name>
<surname>Fukunaga</surname>
<given-names>K</given-names>
</name>
.
<source>Introduction to statistical pattern recognition</source>
.
<publisher-name>Academic Press</publisher-name>
;
<year>2013</year>
.</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref048">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ciuffi</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Ronen</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Brady</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Malani</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Berry</surname>
<given-names>CC</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Methods for integration site distribution analyses in animal cell genomes</article-title>
.
<source>Methods</source>
.
<year>2009</year>
;
<volume>47</volume>
:
<fpage>261</fpage>
<lpage>268</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.ymeth.2008.10.028</pub-id>
<pub-id pub-id-type="pmid">19038346</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref049">
<label>49</label>
<mixed-citation publication-type="journal">
<name>
<surname>Witherspoon</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Xing</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Watkins</surname>
<given-names>WS</given-names>
</name>
,
<name>
<surname>Batzer</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Jorde</surname>
<given-names>LB</given-names>
</name>
.
<article-title>Mobile element scanning (ME-Scan) by targeted high-throughput sequencing</article-title>
.
<source>BMC Genomics</source>
.
<year>2010</year>
;
<volume>11</volume>
:
<fpage>410</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-11-410</pub-id>
<pub-id pub-id-type="pmid">20591181</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref050">
<label>50</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sudmant</surname>
<given-names>PH</given-names>
</name>
,
<name>
<surname>Rausch</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Gardner</surname>
<given-names>EJ</given-names>
</name>
,
<name>
<surname>Handsaker</surname>
<given-names>RE</given-names>
</name>
,
<name>
<surname>Abyzov</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Huddleston</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>An integrated map of structural variation in 2,504 human genomes</article-title>
.
<source>Nature</source>
.
<year>2015</year>
;
<volume>526</volume>
:
<fpage>75</fpage>
<pub-id pub-id-type="doi">10.1038/nature15394</pub-id>
<pub-id pub-id-type="pmid">26432246</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref051">
<label>51</label>
<mixed-citation publication-type="journal">
<collab>1000 Genomes Project Consortium</collab>
.
<article-title>A global reference for human genetic variation</article-title>
.
<source>Nature</source>
. Nature Publishing Group;
<year>2015</year>
;
<volume>526</volume>
:
<fpage>68</fpage>
<pub-id pub-id-type="doi">10.1038/nature15393</pub-id>
<pub-id pub-id-type="pmid">26432245</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref052">
<label>52</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lin</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Chan</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>West</surname>
<given-names>M</given-names>
</name>
.
<article-title>Discriminative variable subsets in Bayesian classification with mixture models, with application in flow cytometry studies</article-title>
.
<source>Biostatistics</source>
.
<year>2015</year>
;
<volume>17</volume>
:
<fpage>40</fpage>
<lpage>53</lpage>
.
<pub-id pub-id-type="doi">10.1093/biostatistics/kxv021</pub-id>
<pub-id pub-id-type="pmid">26040910</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref053">
<label>53</label>
<mixed-citation publication-type="journal">
<name>
<surname>Escobar</surname>
<given-names>MD</given-names>
</name>
,
<name>
<surname>West</surname>
<given-names>M</given-names>
</name>
.
<article-title>Bayesian density estimation and inference using mixtures</article-title>
.
<source>J Am Stat Assoc</source>
. Taylor &amp; Francis;
<year>1995</year>
;
<volume>90</volume>
:
<fpage>577</fpage>
<lpage>588</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref054">
<label>54</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ishwaran</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>James</surname>
<given-names>LF</given-names>
</name>
.
<article-title>Gibbs sampling methods for stick-breaking priors</article-title>
.
<source>J Am Stat Assoc</source>
.
<year>2001</year>
;
<volume>96</volume>
:
<fpage>161</fpage>
<lpage>173</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref055">
<label>55</label>
<mixed-citation publication-type="journal">
<name>
<surname>Huang</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>G</given-names>
</name>
.
<article-title>A fast algorithm for mining association rules</article-title>
.
<source>J Comput Sci Technol</source>
.
<year>2000</year>
;
<volume>15</volume>
:
<fpage>619</fpage>
<lpage>624</lpage>
.
<pub-id pub-id-type="doi">10.1007/BF02948845</pub-id>
</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref056">
<label>56</label>
<mixed-citation publication-type="journal">
<name>
<surname>Benjamini</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Hochberg</surname>
<given-names>Y</given-names>
</name>
.
<article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>
.
<source>J R Stat Soc Ser B</source>
.
<year>1995</year>
;
<fpage>289</fpage>
<lpage>300</lpage>
.</mixed-citation>
</ref>
<ref id="pcbi.1006564.ref057">
<label>57</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bostock</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Ogievetsky</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Heer</surname>
<given-names>J</given-names>
</name>
.
<article-title>D
<sup>3</sup>
Data-Driven Documents</article-title>
.
<source>IEEE Trans Vis Comput Graph</source>
.
<year>2011</year>
;
<volume>17</volume>
:
<fpage>2301</fpage>
<lpage>2309</lpage>
.
<pub-id pub-id-type="doi">10.1109/TVCG.2011.185</pub-id>
<pub-id pub-id-type="pmid">22034350</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F89  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000F89  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021