<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">DisCVR: Rapid viral diagnosis from high-throughput sequencing data</title>
<author><name sortKey="Maabar, Maha" sort="Maabar, Maha" uniqKey="Maabar M" first="Maha" last="Maabar">Maha Maabar</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Davison, Andrew J" sort="Davison, Andrew J" uniqKey="Davison A" first="Andrew J" last="Davison">Andrew J. Davison</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Vu Ak, Matej" sort="Vu Ak, Matej" uniqKey="Vu Ak M" first="Matej" last="Vu Ak">Matej Vučak</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Thorburn, Fiona" sort="Thorburn, Fiona" uniqKey="Thorburn F" first="Fiona" last="Thorburn">Fiona Thorburn</name>
<affiliation><nlm:aff id="vez033-aff2">Microbiology Department, Glasgow Royal Infirmary, Glasgow G4 0SF, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Murcia, Pablo R" sort="Murcia, Pablo R" uniqKey="Murcia P" first="Pablo R" last="Murcia">Pablo R. Murcia</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Gunson, Rory" sort="Gunson, Rory" uniqKey="Gunson R" first="Rory" last="Gunson">Rory Gunson</name>
<affiliation><nlm:aff id="vez033-aff3">West of Scotland Specialist Virology Centre, Glasgow Royal Infirmary, Glasgow G4 0SF, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Palmarini, Massimo" sort="Palmarini, Massimo" uniqKey="Palmarini M" first="Massimo" last="Palmarini">Massimo Palmarini</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hughes, Joseph" sort="Hughes, Joseph" uniqKey="Hughes J" first="Joseph" last="Hughes">Joseph Hughes</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">31528358</idno>
<idno type="pmc">6735924</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6735924</idno>
<idno type="RBID">PMC:6735924</idno>
<idno type="doi">10.1093/ve/vez033</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">001191</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001191</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">DisCVR: Rapid viral diagnosis from high-throughput sequencing data</title>
<author><name sortKey="Maabar, Maha" sort="Maabar, Maha" uniqKey="Maabar M" first="Maha" last="Maabar">Maha Maabar</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Davison, Andrew J" sort="Davison, Andrew J" uniqKey="Davison A" first="Andrew J" last="Davison">Andrew J. Davison</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Vu Ak, Matej" sort="Vu Ak, Matej" uniqKey="Vu Ak M" first="Matej" last="Vu Ak">Matej Vučak</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Thorburn, Fiona" sort="Thorburn, Fiona" uniqKey="Thorburn F" first="Fiona" last="Thorburn">Fiona Thorburn</name>
<affiliation><nlm:aff id="vez033-aff2">Microbiology Department, Glasgow Royal Infirmary, Glasgow G4 0SF, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Murcia, Pablo R" sort="Murcia, Pablo R" uniqKey="Murcia P" first="Pablo R" last="Murcia">Pablo R. Murcia</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Gunson, Rory" sort="Gunson, Rory" uniqKey="Gunson R" first="Rory" last="Gunson">Rory Gunson</name>
<affiliation><nlm:aff id="vez033-aff3">West of Scotland Specialist Virology Centre, Glasgow Royal Infirmary, Glasgow G4 0SF, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Palmarini, Massimo" sort="Palmarini, Massimo" uniqKey="Palmarini M" first="Massimo" last="Palmarini">Massimo Palmarini</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hughes, Joseph" sort="Hughes, Joseph" uniqKey="Hughes J" first="Joseph" last="Hughes">Joseph Hughes</name>
<affiliation><nlm:aff id="vez033-aff1">MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Virus Evolution</title>
<idno type="eISSN">2057-1577</idno>
<imprint><date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Abstract</title>
<p>High-throughput sequencing (HTS) enables most pathogens in a clinical sample to be detected from a single analysis, thereby providing novel opportunities for diagnosis, surveillance, and epidemiology. However, this powerful technology is difficult to apply in diagnostic laboratories because of its computational and bioinformatic demands. We have developed DisCVR, which detects known human viruses in clinical samples by matching sample <italic>k</italic>
-mers (sequences of twenty-two nucleotides) to <italic>k</italic>
-mers from taxonomically labeled viral genomes. DisCVR was validated using published HTS data for eighty-nine clinical samples from adults with upper respiratory tract infections. These samples had been tested for viruses metagenomically and also by real-time polymerase chain reaction assay, which is the standard diagnostic method. DisCVR detected human viruses with high sensitivity (79%) and specificity (100%), and was able to detect mixed infections. Moreover, it produced results comparable to those in a published metagenomic analysis of 177 blood samples from patients in Nigeria. DisCVR has been designed as a user-friendly tool for detecting human viruses from HTS data using computers with limited RAM and processing power, and includes a graphical user interface to help users interpret and validate the output. It is written in Java and is publicly available from <ext-link ext-link-type="uri" xlink:href="http://bioinformatics.cvr.ac.uk/discvr.php">http://bioinformatics.cvr.ac.uk/discvr.php</ext-link>
.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Altschul, S F" uniqKey="Altschul S">S. F. Altschul</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Audano, P" uniqKey="Audano P">P. Audano</name>
</author>
<author><name sortKey="Vannberg, F" uniqKey="Vannberg F">F. Vannberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Borozan, I" uniqKey="Borozan I">I. Borozan</name>
</author>
<author><name sortKey="Ferretti, V" uniqKey="Ferretti V">V. Ferretti</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Borozan, I" uniqKey="Borozan I">I. Borozan</name>
</author>
<author><name sortKey="Watt, S" uniqKey="Watt S">S. Watt</name>
</author>
<author><name sortKey="Ferretti, V" uniqKey="Ferretti V">V. Ferretti</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Breitwieser, F P" uniqKey="Breitwieser F">F. P. Breitwieser</name>
</author>
<author><name sortKey="Salzberg, S L" uniqKey="Salzberg S">S. L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Brister, J R" uniqKey="Brister J">J. R. Brister</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Flygare, S" uniqKey="Flygare S">S. Flygare</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Kawulok, J" uniqKey="Kawulok J">J. Kawulok</name>
</author>
<author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S. Deorowicz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Koslicki, D" uniqKey="Koslicki D">D. Koslicki</name>
</author>
<author><name sortKey="Falush, D" uniqKey="Falush D">D. Falush</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, Y" uniqKey="Li Y">Y. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Maarala, A I" uniqKey="Maarala A">A. I. Maarala</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Manekar, S C" uniqKey="Manekar S">S. C. Manekar</name>
</author>
<author><name sortKey="Sathe, S R" uniqKey="Sathe S">S. R. Sathe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Marcais, G" uniqKey="Marcais G">G. Marçais</name>
</author>
<author><name sortKey="Kingsford, C" uniqKey="Kingsford C">C. Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Orton, R J" uniqKey="Orton R">R. J. Orton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ounit, R" uniqKey="Ounit R">R. Ounit</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ren, J" uniqKey="Ren J">J. Ren</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rosen, G L" uniqKey="Rosen G">G. L. Rosen</name>
</author>
<author><name sortKey="Reichenberger, E R" uniqKey="Reichenberger E">E. R. Reichenberger</name>
</author>
<author><name sortKey="Rosenfeld, A M" uniqKey="Rosenfeld A">A. M. Rosenfeld</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Scheuch, M" uniqKey="Scheuch M">M. Scheuch</name>
</author>
<author><name sortKey="Hoper, D" uniqKey="Hoper D">D. Höper</name>
</author>
<author><name sortKey="Beer, M" uniqKey="Beer M">M. Beer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Shannon, C E" uniqKey="Shannon C">C. E. Shannon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sims, G E" uniqKey="Sims G">G. E. Sims</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sreenu, V B" uniqKey="Sreenu V">V. B. Sreenu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Stremlau, M H" uniqKey="Stremlau M">M. H. Stremlau</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Thorburn, F" uniqKey="Thorburn F">F. Thorburn</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Visser, M" uniqKey="Visser M">M. Visser</name>
</author>
<author><name sortKey="Burger, J T" uniqKey="Burger J">J. T. Burger</name>
</author>
<author><name sortKey="Maree, H J" uniqKey="Maree H">H. J. Maree</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, Q" uniqKey="Wang Q">Q. Wang</name>
</author>
<author><name sortKey="Jia, P" uniqKey="Jia P">P. Jia</name>
</author>
<author><name sortKey="Zhao, Z" uniqKey="Zhao Z">Z. Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wood, D E" uniqKey="Wood D">D. E. Wood</name>
</author>
<author><name sortKey="Salzberg, S L" uniqKey="Salzberg S">S. L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, G A" uniqKey="Wu G">G. A. Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Youden, W J" uniqKey="Youden W">W. J. Youden</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, Q" uniqKey="Zhang Q">Q. Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zheng, Y" uniqKey="Zheng Y">Y. Zheng</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">Virus Evol</journal-id>
<journal-id journal-id-type="iso-abbrev">Virus Evol</journal-id>
<journal-id journal-id-type="publisher-id">vevolu</journal-id>
<journal-title-group><journal-title>Virus Evolution</journal-title>
</journal-title-group>
<issn pub-type="epub">2057-1577</issn>
<publisher><publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">31528358</article-id>
<article-id pub-id-type="pmc">6735924</article-id>
<article-id pub-id-type="doi">10.1093/ve/vez033</article-id>
<article-id pub-id-type="publisher-id">vez033</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Resources</subject>
</subj-group>
</article-categories>
<title-group><article-title>DisCVR: Rapid viral diagnosis from high-throughput sequencing data</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Maabar</surname>
<given-names>Maha</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff1">1</xref>
<xref ref-type="corresp" rid="vez033-cor1"></xref>
<pmc-comment>Maha.Maabar@glasgow.ac.uk</pmc-comment>
        </contrib>
<contrib contrib-type="author"><name><surname>Davison</surname>
<given-names>Andrew J</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff1">1</xref>
</contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0002-3181-2808</contrib-id>
<name><surname>Vučak</surname>
<given-names>Matej</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Thorburn</surname>
<given-names>Fiona</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff2">2</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Murcia</surname>
<given-names>Pablo R</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Gunson</surname>
<given-names>Rory</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff3">3</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Palmarini</surname>
<given-names>Massimo</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff1">1</xref>
</contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-2556-2563</contrib-id>
<name><surname>Hughes</surname>
<given-names>Joseph</given-names>
</name>
<xref ref-type="aff" rid="vez033-aff1">1</xref>
<xref ref-type="corresp" rid="vez033-cor1"></xref>
<pmc-comment>Joseph.Hughes@glasgow.ac.uk</pmc-comment>
        </contrib>
</contrib-group>
<aff id="vez033-aff1"><label>1</label>
MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, 464 Bearsden Road, Glasgow G61 1QH, UK</aff>
<aff id="vez033-aff2"><label>2</label>
Microbiology Department, Glasgow Royal Infirmary, Glasgow G4 0SF, UK</aff>
<aff id="vez033-aff3"><label>3</label>
West of Scotland Specialist Virology Centre, Glasgow Royal Infirmary, Glasgow G4 0SF, UK</aff>
<author-notes><corresp id="vez033-cor1">Corresponding authors: E-mails: <email>Joseph.Hughes@glasgow.ac.uk</email>
 (J.H.); <email>Maha.Maabar@glasgow.ac.uk</email>
 (M.M.)</corresp>
</author-notes>
<pub-date pub-type="collection"><month>7</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="epub" iso-8601-date="2019-08-26"><day>26</day>
<month>8</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>26</day>
<month>8</month>
<year>2019</year>
</pub-date>
      <volume>5</volume>
<issue>2</issue>
<elocation-id>vez033</elocation-id>
<permissions><copyright-statement>© The Author(s) 2019. Published by Oxford University Press.</copyright-statement>
<copyright-year>2019</copyright-year>
<license license-type="cc-by" xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="vez033.pdf"></self-uri>
<abstract><title>Abstract</title>
<p>High-throughput sequencing (HTS) enables most pathogens in a clinical sample to be detected from a single analysis, thereby providing novel opportunities for diagnosis, surveillance, and epidemiology. However, this powerful technology is difficult to apply in diagnostic laboratories because of its computational and bioinformatic demands. We have developed DisCVR, which detects known human viruses in clinical samples by matching sample <italic>k</italic>
-mers (sequences of twenty-two nucleotides) to <italic>k</italic>
-mers from taxonomically labeled viral genomes. DisCVR was validated using published HTS data for eighty-nine clinical samples from adults with upper respiratory tract infections. These samples had been tested for viruses metagenomically and also by real-time polymerase chain reaction assay, which is the standard diagnostic method. DisCVR detected human viruses with high sensitivity (79%) and specificity (100%), and was able to detect mixed infections. Moreover, it produced results comparable to those in a published metagenomic analysis of 177 blood samples from patients in Nigeria. DisCVR has been designed as a user-friendly tool for detecting human viruses from HTS data using computers with limited RAM and processing power, and includes a graphical user interface to help users interpret and validate the output. It is written in Java and is publicly available from <ext-link ext-link-type="uri" xlink:href="http://bioinformatics.cvr.ac.uk/discvr.php">http://bioinformatics.cvr.ac.uk/discvr.php</ext-link>
.</p>
</abstract>
<kwd-group><kwd>virus</kwd>
<kwd>diagnosis</kwd>
<kwd>high-throughput sequencing</kwd>
<kwd>k-mer</kwd>
</kwd-group>
<funding-group><award-group award-type="grant"><funding-source><named-content content-type="funder-name">Medical Research Council</named-content>
<named-content content-type="funder-identifier">10.13039/501100000265</named-content>
</funding-source>
<award-id>MC_UU_12014/12</award-id>
</award-group>
</funding-group>
<counts><page-count count="8"></page-count>
</counts>
</article-meta>
</front>
<body><sec><title>1. Introduction</title>
<p>The standard method for rapidly detecting known human viruses in clinical samples is the polymerase chain reaction (PCR), in which short oligonucleotides are used to amplify and probe specific regions of viral genomes. The limitations of this technique include the targeting of a relatively small number of viruses per assay and a dependence on sequence conservation among viral strains. High-throughput sequencing (HTS) provides approaches to viral diagnosis that have much greater scope. Thus, metagenomic analysis of HTS data can provide extensive viral genotyping information, as well as the characterization of complex multiple infections (<xref rid="vez033-B25" ref-type="bibr">Thorburn et al. 2015</xref>
). Several metagenomic pipelines using <italic>de novo</italic>
 assembly and homology matching have been developed for virus detection (<xref rid="vez033-B27" ref-type="bibr">Wang, Jia, and Zhao 2013</xref>
; <xref rid="vez033-B20" ref-type="bibr">Scheuch, Höper, and Beer 2015</xref>
; <xref rid="vez033-B12" ref-type="bibr">Li et al. 2016</xref>
; <xref rid="vez033-B18" ref-type="bibr">Ren et al. 2017</xref>
; <xref rid="vez033-B32" ref-type="bibr">Zheng et al. 2017</xref>
; <xref rid="vez033-B13" ref-type="bibr">Maarala et al. 2018</xref>
). However, analyzing HTS data using such approaches brings heavy computing and bioinformatic demands that are difficult to meet and standardize in diagnostic laboratories (<xref rid="vez033-B8" ref-type="bibr">Flygare et al. 2016</xref>
). As a consequence, we have developed DisCVR, which is a fast, accurate and easy-to-use tool for detecting known human viruses in clinical samples.</p>
<p>DisCVR employs an abundance-based method, which is a metagenomic approach for rapidly profiling the organisms present in a sample. It works by creating a database of short nucleotide sequences (<italic>k</italic>
-mers) from a large set of viral reference sequences, tagging the <italic>k</italic>
-mers taxonomically according to the viruses from which they came, screening each read in the HTS dataset for the presence of virus <italic>k</italic>
-mers, and organizing a summary of the viruses present in the sample via the tags. This approach makes data analysis very efficient, thereby minimizing the computing effort required (<xref rid="vez033-B16" ref-type="bibr">Orton et al. 2016</xref>
).</p>
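<p>The four steps described above (build a <italic>k</italic>-mer database, tag each <italic>k</italic>-mer, screen the reads, summarize by tag) can be illustrated with a minimal Python sketch. This is not the DisCVR implementation, which is written in Java; the function names, the use of plain virus names as tags, and the in-memory dictionary are assumptions made for illustration only.</p>

```python
from collections import Counter

K = 22  # k-mer length reported for DisCVR in the abstract


def kmers(seq, k=K):
    """Yield every k-mer of seq by sliding a window one nucleotide at a time."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_database(references, k=K):
    """Map each k-mer to the set of virus names whose reference contains it.

    `references` is a hypothetical {virus_name: sequence} dictionary.
    """
    db = {}
    for virus, seq in references.items():
        for kmer in kmers(seq, k):
            db.setdefault(kmer, set()).add(virus)
    return db


def profile_sample(reads, db, k=K):
    """Count, per virus, how many k-mers in the reads match the tagged database."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            for virus in db.get(kmer, ()):
                hits[virus] += 1
    return hits
```

<p>Because each lookup is a dictionary access, the cost of screening grows with the number of <italic>k</italic>-mers in the sample rather than with the number of reference sequences, which is the efficiency argument made in the paragraph above.</p>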
<p>Several existing tools utilize the abundance-based method to classify the reads in an HTS dataset. Naive Bayes Classification (<xref rid="vez033-B19" ref-type="bibr">Rosen, Reichenberger, and Rosenfeld 2011</xref>
) employs a naïve Bayesian classifier to assign a log-likelihood score to each read. The classifier is trained on a set of unique profiles of fifteen-nucleotide <italic>k</italic>-mers from microbial genomes; users upload a dataset to a website and obtain a summary of results listing the best taxonomic match for each read. Kraken (<xref rid="vez033-B28" ref-type="bibr">Wood and Salzberg 2014</xref>
) assigns each <italic>k</italic>
-mer in the database to the last common ancestor of species having that <italic>k</italic>
-mer, and then assigns each read to the taxon with the most matching <italic>k</italic>
-mers. CoMeta (<xref rid="vez033-B10" ref-type="bibr">Kawulok and Deorowicz 2015</xref>
) creates a database of all <italic>k</italic>
-mers for each rank in the taxonomic tree, and then uses these databases to classify the reads at each rank. CLARK (<xref rid="vez033-B17" ref-type="bibr">Ounit et al. 2015</xref>
) collects target-specific <italic>k</italic>
-mer sets from reference genomes belonging to a certain taxonomic rank (e.g. genus), and then classifies reads at that rank. This approach reduces the database size but requires a different database to be built for each rank. To improve the accuracy of the classification, CSSSCL (<xref rid="vez033-B3" ref-type="bibr">Borozan and Ferretti 2016</xref>
) creates a BLAST database, a <italic>k</italic>
-mer database and a compression database from a collection of reference genomes. Sequences in the sample are classified according to a combined sequence similarity score (CSSS) (<xref rid="vez033-B4" ref-type="bibr">Borozan, Watt, and Ferretti 2015</xref>
) calculated from information in the pre-computed databases. In contrast to Kraken, CLARK, and CoMeta, all of which assign individual reads, MetaPalette (<xref rid="vez033-B11" ref-type="bibr">Koslicki and Falush 2016</xref>
) profiles the entire dataset and returns the relative proportions of organisms present by using <italic>k</italic>
-mer sizes of 30 and 50, based on the rationale that using two different <italic>k</italic>
-mer sizes allows strain-level variation to be captured more accurately. Taxonomer (<xref rid="vez033-B8" ref-type="bibr">Flygare et al. 2016</xref>
) compares each read to multiple reference databases, assigning it to a high-level taxonomic category on the basis of the <italic>k</italic>
-mer content of the read, and then uses exact <italic>k</italic>
-mer matching to assign each read to a reference by maximizing the total <italic>k</italic>
-mer weight. This weight, which is a function of the <italic>k</italic>
-mer count in the reference and the database, provides a database-specific measure of how likely it is that a <italic>k</italic>
-mer originated from a particular reference sequence.</p>
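<p>As an illustration of the read-level assignment described above for Kraken, the following Python sketch computes a last common ancestor in a toy taxonomy and assigns a read to the taxon matched by most of its <italic>k</italic>-mers. It follows the simplified description given in this section rather than the published Kraken algorithm; the taxonomy, the taxon names, and the function names are all hypothetical.</p>

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "root": None,
    "FamilyX": "root",
    "SpeciesX1": "FamilyX",
    "SpeciesX2": "FamilyX",
    "SpeciesY": "root",
}


def lineage(taxon):
    """Return the path from taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def last_common_ancestor(taxa):
    """Return the deepest taxon shared by the lineages of all given taxa."""
    taxa = list(taxa)
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared.intersection_update(lineage(taxon))
    # Walking upward from the first taxon, the first shared node is the deepest.
    return next(node for node in lineage(taxa[0]) if node in shared)


def classify_read(read, kmer_to_taxon, k):
    """Assign a read to the taxon matched by most of its k-mers, or None."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxon = kmer_to_taxon.get(read[i:i + k])
        if taxon is not None:
            votes[taxon] += 1
    return votes.most_common(1)[0][0] if votes else None
```

<p>In this sketch, a <italic>k</italic>-mer found in both SpeciesX1 and SpeciesX2 would be stored under FamilyX, so a read consisting mostly of such <italic>k</italic>-mers is reported at the family level rather than being forced to a species.</p>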
<p>Despite the growing number and popularity of <italic>k</italic>
-mer-based classification tools, these tools have limitations. The databases are built using a limited set of reference sequences and therefore are of restricted utility for classifying organisms with sequences that diverge from the reference. This limitation can be a particular problem when significant variation exists in an organism at strain level. It can be addressed by incorporating a range of variants into the database, but this creates a much larger database that may make the analysis challenging to run on resource-limited computers. Furthermore, many of the current tools are run on Linux systems and hence require the operator to have expertise in command line usage and an understanding of bioinformatics, which may be difficult to find in diagnostic settings. To our knowledge, the only tool that has been developed for ease of use and for application on computers with limited resources is Truffle (<xref rid="vez033-B26" ref-type="bibr">Visser, Burger, and Maree 2016</xref>
). This is designed to screen for a limited set of user-specified viruses, comes preloaded with probe-sets for grapevine viruses, and cannot easily be updated for large sets of viruses from other hosts.</p>
<p>Here, we present DisCVR, a <italic>k</italic>
-mer-based classification tool for detecting known human viruses from HTS data derived from clinical samples. DisCVR can be installed on a desktop computer to allow diagnostic laboratories to analyze large, confidential datasets by using a simple, straightforward graphical user interface (GUI) without specialized bioinformatics expertise. It is optimized to run on Windows, Linux and Mac OS, using minimal RAM and processing power without compromising speed and accuracy. The tool currently integrates curated viral databases at the taxonomic levels of species and strain, but may be used to build a customized database at any taxonomic level, thereby overcoming the limitations of using a restricted set of reference sequences. DisCVR utilizes <italic>k</italic>
-mer counts derived from an entire HTS dataset to detect the viruses present in a sample, and validates the results by showing the coverage and depth of reads mapping to a reference sequence.</p>
</sec>
<sec><title>2. Methods</title>
<sec><title>2.1 The <italic>k</italic>
-mer databases</title>
<p>A <italic>k</italic>
-mer is a short sequence of <italic>k</italic>
 nucleotides. A <italic>k</italic>
-mer dataset is generated iteratively by sliding a window of size <italic>k</italic>
 along a sequence one nucleotide at a time. Extracting <italic>k</italic>
-mers and counting their frequencies in a set of sequences can be computationally intensive, especially when <italic>k</italic>
 is large and the sequences are numerous. Dedicated <italic>k</italic>
-mer counting programs, such as Jellyfish (<xref rid="vez033-B15" ref-type="bibr">Marçais and Kingsford 2011</xref>
) and Khmer (<xref rid="vez033-B31" ref-type="bibr">Zhang et al. 2014</xref>
), can be incorporated into abundance-based tools in order to optimize speed. KAnalyze (<xref rid="vez033-B2" ref-type="bibr">Audano and Vannberg 2014</xref>
) was chosen for integration into DisCVR because the <italic>k</italic>
-mers it generates are sorted lexicographically, thus making the search for matches very efficient. KAnalyze also uses the canonical representation of a <italic>k</italic>
-mer, which is lexicographically the smaller of a <italic>k</italic>
-mer and its reverse complement. These features allow the program to work with 3 GB of RAM.</p>
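<p>The sliding-window extraction and the canonical representation described above can be sketched in a few lines of Python. This is an illustrative sketch and does not use KAnalyze; the function names are hypothetical, and skipping windows that contain ambiguous bases is an assumption not stated in the text.</p>

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(seq, k):
    """Count canonical k-mers in seq and return them sorted lexicographically."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows containing N or other ambiguity codes are skipped.
        if set(kmer).issubset("ACGT"):
            counts[canonical(kmer)] += 1
    return sorted(counts.items())
```

<p>For the sequence ACGTT with <italic>k</italic> = 3, the windows ACG and CGT are reverse complements of each other and collapse to the single canonical <italic>k</italic>-mer ACG, so reads from either strand contribute to the same count. Keeping the output sorted is what allows matches against another sorted list to be found with a single merge-style pass or a binary search.</p>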
<p>For the purpose of this study, we define a virus <italic>k</italic>
-mer as a <italic>k</italic>
-mer that uniquely represents a virus or set of related viruses, to the exclusion of the host. A shared <italic>k</italic>
-mer is defined as a <italic>k</italic>
-mer that is common to a virus and the host. Because shared <italic>k</italic>-mers are excluded from the database, the user does not need to remove host reads before using DisCVR, which shortens the overall processing time. If <italic>k</italic>
 is small, many copies of shared <italic>k</italic>
-mers are generated, and if <italic>k</italic>
 is large, many copies of virus <italic>k</italic>
-mers are found. Choosing the optimal <italic>k</italic>
-mer size depends on balancing the advantages of speed (small <italic>k</italic>
) with those of specificity and sensitivity (large <italic>k</italic>
). Furthermore, it is necessary to reduce the number of low-complexity <italic>k</italic>
-mers in the virus <italic>k</italic>
-mer database, as these may be repetitive in sequence and present in otherwise unrelated viruses. The filtering of low-complexity <italic>k</italic>
-mers and the selection of the size of <italic>k</italic>
 are explained in <xref ref-type="supplementary-material" rid="sup1">Supplementary Section S1 (Shannon 1948; Sims et al. 2009; Wu et al. 2009)</xref>
.</p>
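<p>A minimal Python sketch of the two filters described above, exclusion of shared <italic>k</italic>-mers and removal of low-complexity <italic>k</italic>-mers, is given below. Shannon entropy is used as the complexity measure because Shannon (1948) is cited for Supplementary Section S1, but the exact criterion used by DisCVR is not reproduced here; the threshold value and the function names are assumptions.</p>

```python
import math
from collections import Counter


def shannon_entropy(kmer):
    """Shannon entropy, in bits per nucleotide, of the base composition of a k-mer."""
    counts = Counter(kmer)
    total = len(kmer)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def virus_kmers(candidate_kmers, host_kmers, min_entropy=1.0):
    """Keep k-mers that are absent from the host and are not of low complexity.

    `min_entropy` is a hypothetical cut-off chosen only for this example.
    """
    return {
        kmer
        for kmer in candidate_kmers
        if kmer not in host_kmers and shannon_entropy(kmer) >= min_entropy
    }
```

<p>A homopolymer such as AAAA has an entropy of 0 bits and is discarded, whereas a <italic>k</italic>-mer in which all four bases are equally frequent reaches the maximum of 2 bits.</p>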
<p>For constructing the virus <italic>k</italic>
-mer databases, three comprehensive datasets of complete or partial viral sequences were extracted from the NCBI taxonomy database. The first, the human hemorrhagic virus dataset (shortened below to ‘hemorrhagic dataset’), contained 33,367 sequences of the hemorrhagic fever viruses listed by the Centers for Disease Control and Prevention (Centers for Disease Control and Prevention, n.d.). The second, the human respiratory virus dataset (‘respiratory dataset’), contained 442,282 sequences of viruses associated with respiratory disease. The third, the human pathogenic virus dataset (‘pathogenic dataset’), consisted of 1,762,968 sequences of viruses identified in the UK Health and Safety Executive list of biological agents (<xref rid="vez033-B9" ref-type="bibr">Health and Safety Executive: The Approved List of Biological Agents 2013</xref>
).</p>
</sec>
<sec><title>2.2 Database build</title>
<p>DisCVR operates via three modules concerned with database build, sample classification and validation (<xref ref-type="fig" rid="vez033-F1">Fig. 1</xref>).
</p>
<fig id="vez033-F1" orientation="portrait" position="float"><label>Figure 1.</label>
<caption><p>DisCVR framework. Each colored box represents a component of the tool. Dashed rectangles indicate processes and solid rectangles show input and output.</p>
</caption>
<graphic xlink:href="vez033f1"></graphic>
</fig>
<p>Currently, the database build module includes three virus <italic>k</italic>
-mer databases, derived from the hemorrhagic, respiratory, and pathogenic datasets, for use in the sample classification module. In addition, some of the sequences in these datasets, defined largely by their presence in the NCBI RefSeq database, are used as a set of reference genome sequences in the validation module. DisCVR also allows the user to create customized databases and sets of reference sequences using the command-line utility scripts provided with the DisCVR distribution. The database build module involves selecting the relevant viral dataset, collecting the <italic>k</italic>
-mers, and removing those that are shared with the host or are of low complexity. Each remaining <italic>k</italic>
-mer is then identified with a taxonomic tag and an indication of the number of times it occurs in the sequences. The <italic>k</italic>
-mers are further subdivided into those that exist in a single virus (i.e. specific <italic>k</italic>
-mers) and those that exist in multiple viruses (i.e. nonspecific <italic>k</italic>
-mers). These assignments are made at the level of species and strain and are used in the output to illustrate the degree of specificity of the <italic>k</italic>
-mers matching a virus (<xref ref-type="fig" rid="vez033-F2">Fig. 2</xref>).
</p>
<fig id="vez033-F2" orientation="portrait" position="float"><label>Figure 2.</label>
<caption><p>DisCVR GUI. The top screenshot shows the scoring panel with the top three virus hits, and the bottom screenshot shows the full analysis.</p>
</caption>
<graphic xlink:href="vez033f2"></graphic>
</fig>
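<p>The database build steps described above can be sketched in Python as follows. This is a simplified illustration rather than the DisCVR implementation: the input layout (a dictionary mapping a taxon label to its sequences), the function names, and the output structure are assumptions, low-complexity filtering is omitted, and ambiguous bases are not handled.</p>
<preformat>
```python
from collections import Counter, defaultdict


def extract_kmers(sequence: str, k: int):
    """Yield every overlapping k-mer of an upper-cased sequence."""
    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_database(virus_sequences: dict, host_kmers: set, k: int) -> dict:
    """Map each retained k-mer to per-taxon counts and a specificity flag.

    virus_sequences maps a taxon label to a list of sequences
    (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for taxon, sequences in virus_sequences.items():
        for sequence in sequences:
            for kmer in extract_kmers(sequence, k):
                if kmer not in host_kmers:  # drop k-mers shared with the host
                    counts[kmer][taxon] += 1
    return {
        kmer: {
            "counts": dict(per_taxon),        # occurrences in each taxon
            "specific": len(per_taxon) == 1,  # True if found in a single taxon
        }
        for kmer, per_taxon in counts.items()
    }
```
</preformat>
<p>In this sketch, a <italic>k</italic>-mer is labeled specific when it occurs in exactly one taxon label; applying the function once with species labels and once with strain labels would give the two levels of assignment mentioned above.</p>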
</sec>
<sec><title>2.3 Sample classification</title>
<p>To analyze an HTS dataset, the file is loaded into DisCVR via the GUI. The <italic>k</italic>
-mers are extracted and their frequencies are calculated. Single-copy <italic>k</italic>
-mers, which are mainly attributed to sequencing errors (<xref rid="vez033-B14" ref-type="bibr">Manekar and Sathe 2018</xref>
), and low-complexity <italic>k</italic>
-mers, which commonly give confounding matches that are unrelated to homology (<xref rid="vez033-B1" ref-type="bibr">Altschul et al. 1994</xref>
), are filtered out, and the remaining <italic>k</italic>
-mers are compared with the chosen virus <italic>k</italic>
-mer database. As the number of <italic>k</italic>
-mers in the sample can be enormous, various data structures were considered to optimize the classification on machines with limited RAM. Although searching a trie is fast, O(<italic>n</italic>
), where <italic>n</italic>
 is the length of the <italic>k</italic>
-mer, it requires O(<italic>n</italic>
<sup>2</sup>
) overall time to build, and the space needed is quadratic. Instead, DisCVR uses a fast searching algorithm that groups similar <italic>k</italic>
-mers together. Briefly, the <italic>k</italic>
-mers in the virus database are divided among smaller sub-files according to the first five nucleotides. The same procedure is used to divide the <italic>k</italic>
-mers derived from the entire HTS dataset. Searching commences by loading the corresponding sub-files from the virus <italic>k</italic>
-mer database and the sample <italic>k</italic>
-mers into memory, and performing a binary search for the presence of each sample <italic>k</italic>
-mer among the database <italic>k</italic>
-mers. Only matched <italic>k</italic>
-mers are retrieved. Finally, DisCVR displays a straightforward list of all the virus hits detected, along with summary statistics and taxonomic information on the sample <italic>k</italic>
-mers (<xref ref-type="fig" rid="vez033-F2">Fig. 2</xref>
).</p>
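<p>The search strategy described above (partitioning by the first five nucleotides, then binary search within each partition) can be sketched in Python as follows. In-memory dictionaries stand in for the sub-files, and all function names are assumptions made for this illustration rather than parts of the DisCVR code.</p>
<preformat>
```python
from bisect import bisect_left
from collections import defaultdict


def bucket_by_prefix(kmers, prefix_len: int = 5) -> dict:
    """Group k-mers by their first prefix_len bases and sort each group."""
    buckets = defaultdict(list)
    for kmer in kmers:
        buckets[kmer[:prefix_len]].append(kmer)
    return {prefix: sorted(set(group)) for prefix, group in buckets.items()}


def contains(sorted_kmers: list, kmer: str) -> bool:
    """Binary search for kmer in a sorted list."""
    i = bisect_left(sorted_kmers, kmer)
    return i != len(sorted_kmers) and sorted_kmers[i] == kmer


def match_sample(sample_kmers, database_kmers, prefix_len: int = 5) -> list:
    """Return the sample k-mers present in the database, bucket by bucket."""
    db_buckets = bucket_by_prefix(database_kmers, prefix_len)
    sample_buckets = bucket_by_prefix(sample_kmers, prefix_len)
    matches = []
    for prefix, group in sorted(sample_buckets.items()):
        db_group = db_buckets.get(prefix)
        if db_group is None:
            continue  # no database k-mers share this prefix
        matches.extend(kmer for kmer in group if contains(db_group, kmer))
    return matches
```
</preformat>
<p>Because only one pair of buckets needs to be held in memory at a time, peak memory use is bounded by the largest bucket rather than by the whole database, which is the motivation given above for machines with limited RAM.</p>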
</sec>
<sec><title>2.4 Validation</title>
<p>DisCVR helps the user to assess the significance of the findings by facilitating an examination of <italic>k</italic>
-mer distribution (allowing up to three mismatches) across a reference sequence representing the target genome. As an alternative, it also incorporates an examination of sequence read distribution carried out by using Tanoti (Sreenu, n.d.), which is a BLAST-guided, reference-based short read aligner that is particularly tolerant of mismatches. In each case, the output is a graph showing the depth and coverage of <italic>k</italic>
-mers or sequence reads across the reference genome and a summary of statistics for the mapping results (<xref ref-type="fig" rid="vez033-F3">Fig. 3</xref>).
</p>
<fig id="vez033-F3" orientation="portrait" position="float"><label>Figure 3.</label>
<caption><p>DisCVR validation. Coverage and depth of matched <italic>k</italic>
-mers (top) and reads (bottom) to a reference genome.</p>
</caption>
<graphic xlink:href="vez033f3"></graphic>
</fig>
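<p>The idea of placing <italic>k</italic>-mers on a reference sequence with a small number of mismatches, and summarizing the result as depth and coverage, can be illustrated with the brute-force Python sketch below. It is an assumption-laden illustration (function names, exhaustive window scan, forward strand only) and is not how DisCVR or Tanoti performs the mapping; it is practical only for short inputs.</p>
<preformat>
```python
def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """Return True if equal-length strings differ at no more than max_mismatches sites."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def kmer_depth(reference: str, kmers, max_mismatches: int = 3) -> list:
    """Return the per-position depth of k-mers placed along the reference.

    Every window of the reference is compared with every k-mer (brute force).
    """
    depth = [0] * len(reference)
    for kmer in kmers:
        k = len(kmer)
        for start in range(len(reference) - k + 1):
            if hamming_within(reference[start:start + k], kmer, max_mismatches):
                for pos in range(start, start + k):
                    depth[pos] += 1
    return depth


def coverage_fraction(depth: list) -> float:
    """Fraction of reference positions covered by at least one k-mer."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d > 0) / len(depth)
```
</preformat>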
</sec>
<sec><title>2.5 Accuracy</title>
<p>The respiratory database was used to analyze published RNA-seq data from nasopharyngeal swab samples (<italic>n</italic>
 = 89) that had been collected from adults with upper respiratory tract infections (<xref rid="vez033-B25" ref-type="bibr">Thorburn et al. 2015</xref>
) (<xref ref-type="supplementary-material" rid="sup1">Supplementary Table S2</xref>
; the average number of reads per sample was 660,640, range 30,872–1,278,122). The samples had been tested using a standard real-time PCR (RT-PCR) assay for human rhinovirus (HRV), influenza viruses A and B (IFA/IFB), respiratory syncytial virus (RSV), adenovirus (ADV), human metapneumovirus (hMPV), parainfluenza viruses (PIV) 1–4, and human coronaviruses (HCoV) HKU1, NL63, OC43 and 229E (<xref rid="vez033-B25" ref-type="bibr">Thorburn et al. 2015</xref>
). The top hit for each sample (i.e. the virus having the greatest number of distinct <italic>k</italic>
-mers) using DisCVR was compared with the virus detected previously by RT-PCR. The samples were also classified using three independent <italic>k-</italic>
mer-based programs that require command-line usage on a Linux operating system: Kraken (<xref rid="vez033-B28" ref-type="bibr">Wood and Salzberg 2014</xref>
), KrakenHLL (<xref rid="vez033-B5" ref-type="bibr">Breitwieser and Salzberg 2018</xref>
), and CLARK (<xref rid="vez033-B17" ref-type="bibr">Ounit et al. 2015</xref>
). As the pre-built database for Kraken only contains the RefSeq viral genomes (11,489 sequences), a more comprehensive <italic>k</italic>
-mer database was built for each program from the same 442,282 sequences in the respiratory dataset to standardize the results. This successfully accommodated within-species sequence diversity, which is not normally taken into account using the pre-built database.</p>
<p>The initial objective was to determine the number of distinct <italic>k</italic>
-mers that would maximize both sensitivity (effectiveness in identifying samples containing viruses) and specificity (effectiveness in identifying samples lacking viruses) for DisCVR. The output of DisCVR was categorized on the basis of the number of distinct <italic>k</italic>
-mers for the top hit, and that of the other programs was assessed on the basis of the number of reads assigned to the top hit. For each tool, sensitivity and specificity were defined as TP/(TP + FN) and TN/(TN + FP), respectively, where TP, FN, TN, and FP are the numbers of true positive, false negative, true negative, and false positive samples relative to the RT-PCR results. Samples were defined as (1) true positive when the top virus hit was detected by both RT-PCR and DisCVR, (2) true negative when neither RT-PCR nor DisCVR detected a virus, (3) false negative when a virus was detected by RT-PCR but not by DisCVR, and (4) false positive when a virus was detected by DisCVR but not by RT-PCR. Receiver operating characteristic (ROC) curves were generated for DisCVR, Kraken, KrakenHLL and CLARK using the pROC package in R, and optimal thresholds were identified using Youden’s statistic (<xref rid="vez033-B30" ref-type="bibr">Youden 1950</xref>
).</p>
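<p>The definitions above, together with Youden’s statistic (J = sensitivity + specificity - 1), can be expressed in a few lines of Python. This sketch is an illustration only: the analysis in this study was carried out with pROC in R, the function names are assumptions, and the example reduces each sample to a single score and a binary RT-PCR label, ignoring the requirement that the top hit match the virus detected by RT-PCR. It also assumes that both positive and negative samples are present.</p>
<preformat>
```python
def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP)."""
    return tn / (tn + fp)


def youden_j(tp: int, fn: int, tn: int, fp: int) -> float:
    """Youden's statistic: sensitivity + specificity - 1."""
    return sensitivity(tp, fn) + specificity(tn, fp) - 1.0


def best_threshold(scores, labels, thresholds):
    """Return (threshold, J) with the largest J.

    A sample is called positive when its score is >= the threshold;
    labels are True for samples positive by the reference assay.
    """
    best = None
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fn = sum(1 for s, y in zip(scores, labels) if y and t > s)
        tn = sum(1 for s, y in zip(scores, labels) if not y and t > s)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        j = youden_j(tp, fn, tn, fp)
        if best is None or j > best[1]:
            best = (t, j)
    return best
```
</preformat>
<p>Maximizing J is equivalent to choosing the point on the ROC curve with the greatest vertical distance from the diagonal.</p>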
</sec>
<sec><title>2.6 Application</title>
<p>DisCVR was used to analyze 177 HTS RNA-seq libraries derived from serum specimens collected in Nigeria from healthy individuals (<italic>n</italic>
 = 120) and patients with unexplained acute febrile illness (<italic>n</italic>
 = 57) and analyzed in a previous study (<xref rid="vez033-B24" ref-type="bibr">Stremlau et al. 2015</xref>
). The raw data were downloaded from SRA BioProject PRJNA271229. The top hit using DisCVR was compared with the viral reads identified using BLASTn and BLASTx in the original study (<ext-link ext-link-type="doi" xlink:href="10.1371/journal.pntd.0003631.s017">https://doi.org/10.1371/journal.pntd.0003631.s017</ext-link>
).</p>
</sec>
</sec>
<sec><title>3. Results</title>
<p>The ROC curve (<xref ref-type="fig" rid="vez033-F4">Fig. 4</xref>
) derived from the datasets from respiratory tract infections (<xref rid="vez033-B25" ref-type="bibr">Thorburn et al. 2015</xref>
) compares the sensitivity and specificity for different <italic>k-</italic>
mer thresholds. It suggests that a value of 850 <italic>k</italic>
-mers is the optimal threshold on the basis of the point on the curve furthest from the identity (diagonal) line (<xref ref-type="supplementary-material" rid="sup1">Supplementary Table S2</xref>
). The ROC curves of DisCVR and the other programs (<xref ref-type="fig" rid="vez033-F4">Fig. 4</xref>
) did not differ significantly from each other and had overlapping confidence intervals. Kraken and KrakenHLL had identical curves. Kraken and CLARK rated as slightly more sensitive but less specific than DisCVR because HCoV NL63 was the top hit for sample 1D3 in those programs but the second hit in DisCVR (<xref rid="vez033-T1" ref-type="table">Table 1</xref>
; <xref ref-type="supplementary-material" rid="sup1">Supplementary Table S2</xref>
). The top hit in DisCVR was HRV-A, which was the second hit in Kraken and CLARK but was not detected using RT-PCR. It was not informative to compare average execution time and memory usage for the programs, as it is not possible to run CLARK, Kraken, and KrakenHLL natively on Windows operating systems. Also, on a Linux operating system, CLARK and Kraken required more than 30 GB of RAM to run samples against the respiratory dataset, whereas DisCVR ran with only 8 GB.</p>
<fig id="vez033-F4" orientation="portrait" position="float"><label>Figure 4.</label>
<caption><p>ROC curve showing the accuracy of DisCVR, CLARK and Kraken. The transparent shaded area shows the confidence interval of the sensitivity for all three methods. The optimal thresholds of 850 <italic>k-</italic>
mers for DisCVR and 150 reads for CLARK and Kraken are shown, with bars representing the confidence interval of the threshold and the specificity and sensitivity shown in brackets. The curve for KrakenHLL was identical to that for Kraken. The diamond indicates the sensitivity and specificity values, counting the false positives with ≥850 <italic>k</italic>
-mers and the second hits with ≥850 <italic>k</italic>
-mers among the true positives for DisCVR, CLARK, and Kraken.</p>
</caption>
<graphic xlink:href="vez033f4"></graphic>
</fig>
<table-wrap id="vez033-T1" orientation="portrait" position="float"><label>Table 1.</label>
<caption><p>Results of the second hits in the respiratory samples.</p>
</caption>
<table frame="hsides" rules="groups"><colgroup span="1"><col valign="top" align="left" span="1"></col>
<col valign="top" align="center" span="1"></col>
<col valign="top" align="left" span="1"></col>
<col valign="top" align="left" span="1"></col>
</colgroup>
<thead><tr><th rowspan="1" colspan="1">Sample</th>
<th rowspan="1" colspan="1">RT-PCR diagnosis</th>
<th rowspan="1" colspan="1">DisCVR top hit and (no.)<xref ref-type="table-fn" rid="tblfn1"><sup>a</sup>
</xref>
</th>
<th rowspan="1" colspan="1">DisCVR second hit and (no.)<xref ref-type="table-fn" rid="tblfn1"><sup>a</sup>
</xref>
</th>
</tr>
</thead>
<tbody><tr><td colspan="4" rowspan="1">Top hit with &lt;850 k-mers matching</td>
</tr>
<tr><td rowspan="1" colspan="1">   1G2</td>
<td rowspan="1" colspan="1">PIV-3</td>
<td rowspan="1" colspan="1">PIV-3 (366)</td>
<td rowspan="1" colspan="1">HRV-A (149)</td>
</tr>
<tr><td rowspan="1" colspan="1">   1I5</td>
<td rowspan="1" colspan="1">HRV</td>
<td rowspan="1" colspan="1">HRV-A (749)</td>
<td rowspan="1" colspan="1">HRV-C (470)</td>
</tr>
<tr><td rowspan="1" colspan="1">   2B6</td>
<td rowspan="1" colspan="1">RSV</td>
<td rowspan="1" colspan="1">RSV (742)</td>
<td rowspan="1" colspan="1">IFA H3N2 (262)</td>
</tr>
<tr><td colspan="4" rowspan="1">Second hit with ≥850 k-mers matching</td>
</tr>
<tr><td rowspan="1" colspan="1">   1B5</td>
<td rowspan="1" colspan="1">PIV-3</td>
<td rowspan="1" colspan="1"><bold>HRV-A (3,758)</bold>
</td>
<td rowspan="1" colspan="1"><bold>PIV-3 (3,111)</bold>
</td>
</tr>
<tr><td rowspan="1" colspan="1">   1D3</td>
<td rowspan="1" colspan="1">HCoV NL63</td>
<td rowspan="1" colspan="1"><bold>HRV-A (2,420)</bold>
</td>
<td rowspan="1" colspan="1"><bold>HCoV NL63 (1,841)</bold>
</td>
</tr>
<tr><td colspan="4" rowspan="1">Second hit with &lt;850 k-mers matching</td>
</tr>
<tr><td rowspan="1" colspan="1">   1C2</td>
<td rowspan="1" colspan="1">HRV</td>
<td rowspan="1" colspan="1"><bold>Enterovirus D (1,633)</bold>
</td>
<td rowspan="1" colspan="1">HRV-A (269)</td>
</tr>
<tr><td rowspan="1" colspan="1">   1E5</td>
<td rowspan="1" colspan="1">RSV</td>
<td rowspan="1" colspan="1"><bold>HRV-C (1,777)</bold>
</td>
<td rowspan="1" colspan="1">RSV (415)</td>
</tr>
<tr><td rowspan="1" colspan="1">   1F8</td>
<td rowspan="1" colspan="1">HCoV NL63</td>
<td rowspan="1" colspan="1"><bold>HRV-B (3,876)</bold>
</td>
<td rowspan="1" colspan="1">HCoV NL63 (724)</td>
</tr>
<tr><td rowspan="1" colspan="1">   2B9</td>
<td rowspan="1" colspan="1">HRV</td>
<td rowspan="1" colspan="1"><bold>RSV (1,105)</bold>
</td>
<td rowspan="1" colspan="1">HRV-C (94)</td>
</tr>
<tr><td rowspan="1" colspan="1">   2A2</td>
<td rowspan="1" colspan="1">HCoV 229E</td>
<td rowspan="1" colspan="1">HRV-C (770)</td>
<td rowspan="1" colspan="1">HCoV 229E (176)</td>
</tr>
<tr><td rowspan="1" colspan="1">   2C4</td>
<td rowspan="1" colspan="1">HCoV 229E</td>
<td rowspan="1" colspan="1">HRV-A (264)</td>
<td rowspan="1" colspan="1">HCoV 229E (5)</td>
</tr>
<tr><td rowspan="1" colspan="1">   2D3</td>
<td rowspan="1" colspan="1">HCoV OC43</td>
<td rowspan="1" colspan="1">HRV-A (438)</td>
<td rowspan="1" colspan="1">HCoV OC43 (135)</td>
</tr>
<tr><td rowspan="1" colspan="1">   1F7</td>
<td rowspan="1" colspan="1">HRV</td>
<td rowspan="1" colspan="1">hMPV (27)</td>
<td rowspan="1" colspan="1">HRV-B (20)</td>
</tr>
<tr><td rowspan="1" colspan="1">   1G1</td>
<td rowspan="1" colspan="1">ADV/HRV</td>
<td rowspan="1" colspan="1">HCoV OC43 (163)</td>
<td rowspan="1" colspan="1">HRV-B (118)</td>
</tr>
<tr><td colspan="4" rowspan="1">Not detected</td>
</tr>
<tr><td rowspan="1" colspan="1">   1C9</td>
<td rowspan="1" colspan="1">hMPV</td>
<td rowspan="1" colspan="1"><bold>HRV-A (3,083)</bold>
</td>
<td rowspan="1" colspan="1">Enterovirus D (7)</td>
</tr>
<tr><td rowspan="1" colspan="1">   2D4</td>
<td rowspan="1" colspan="1">PIV-2</td>
<td rowspan="1" colspan="1">HRV-A (579)</td>
<td rowspan="1" colspan="1">HCoV OC43 (225)</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="tblfn1"><p><sup>a</sup>
Number of <italic>k</italic>
-mers matching the classification. Hits with ≥850 k-mers are shown in bold.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>A total of 48/89 (54%) of the samples had been shown to contain viruses by RT-PCR, and the remaining 41/89 lacked all viruses tested. Considering only the samples in the set of eighty-nine for which DisCVR identified ≥850 <italic>k</italic>
-mers for the top hit, the following findings were made. DisCVR identified the viruses that were detected by RT-PCR in 32/48 (67%) of samples (true positives). It did not detect viruses in samples in which no viruses had been found by RT-PCR in 22/41 (54%) of samples (true negatives). It detected viruses in samples in which no viruses had been detected by RT-PCR in 19/41 (46%) of samples (false positives), and either detected viruses that did not correspond with those detected by RT-PCR or did not find any virus with ≥850 <italic>k</italic>
-mers in 16/48 (33%) of samples (false negatives).</p>
<p>The RT-PCR assay was limited by the range of viruses that it could detect and by its dependence on sequence conservation, and consequently also in its ability to identify infections by multiple viruses. Therefore, the false positive results were assessed using the validation module (<xref rid="vez033-T2" ref-type="table">Table 2</xref>
), and the false negative results were investigated by examining the second hits recorded by DisCVR (<xref rid="vez033-T1" ref-type="table">Table 1</xref>
). In most false positive cases, the validation module showed that there were multiple reads mapping (mean = 98 ± 73 reads) to several regions of the reference genome (mean = 6 ± 1% coverage of sites), thus confirming the presence of the viruses identified by DisCVR even though they had not been detected by RT-PCR. Some samples had low coverage because a single RefSeq sequence in the validation represented the entire species but diverged in sequence from the virus present in the sample. For example, sample 1B3 yielded HRV-A89 (the reference for species <italic>Rhinovirus A</italic>
) as the top hit, with only 7.6 per cent genome coverage and four mapped reads. Using the capability of DisCVR to build a customized database drawn from the ≥100 prototypic strains of <italic>Rhinovirus A</italic>
, HRV-A49 was revealed as the top hit, with 81.71 per cent genome coverage and 263 mapped reads. This dramatic improvement illustrates the potential to strengthen the validation module by adding user-specific curated sets of sequences or by the proposed expansion of RefSeq entries capturing a greater degree of diversity (<xref rid="vez033-B6" ref-type="bibr">Brister et al. 2015</xref>
). In the sixteen false negative cases, DisCVR detected the virus identified by RT-PCR as the top hit in three samples (1G2, 1I5, and 2B6), but the number of distinct <italic>k</italic>
-mers was &lt;850 (<xref rid="vez033-T1" ref-type="table">Table 1</xref>
; <xref ref-type="supplementary-material" rid="sup1">Supplementary Table S2</xref>
). In addition, the virus identified by RT-PCR was detected as the second hit in ten samples (1B5, 1D3, 1E5, 1G1, 1F7, 1F8, 2A2, 2B9, 2C4, and 2D3), and, in one case (1C2), the RT-PCR assay was not capable of identifying the top hit (enterovirus D). An important finding was made in two of these samples (1B5 and 1D3), in which the viruses detected by RT-PCR were not the top hits but still had ≥850 distinct <italic>k</italic>
-mers in the sample (<xref rid="vez033-T1" ref-type="table">Table 1</xref>
). This suggests that these patients were infected by multiple viruses. Finally, DisCVR did not detect any <italic>k</italic>
-mers for the virus detected by RT-PCR in two samples (1C9 and 2D4), but identified HRV-A in 1C9, which was validated by reference assembly. The validation module thus yielded strong evidence for the presence of the viruses detected by DisCVR, at least where the number of <italic>k</italic>
-mers was ≥850. These findings were taken into account in reassessing the sensitivity and specificity of DisCVR at 79 and 100 per cent, respectively (<xref ref-type="fig" rid="vez033-F4">Fig. 4</xref>
). For CLARK and Kraken, the ROC curves suggest an optimal threshold of 150 reads. Recalculating the sensitivity and specificity based on this threshold gave values of 70.7 and 91.7 per cent for CLARK and 68.7 and 76 per cent for Kraken.</p>
<table-wrap id="vez033-T2" orientation="portrait" position="float"><label>Table 2.</label>
<caption><p>Coverage of reference genomes of the top hits detected in false positive respiratory samples.</p>
</caption>
<table frame="hsides" rules="groups"><colgroup span="1"><col valign="top" align="left" span="1"></col>
<col valign="top" align="left" span="1"></col>
<col valign="top" align="char" char="." span="1"></col>
<col valign="top" align="char" char="." span="1"></col>
<col valign="top" align="char" char="(" span="1"></col>
</colgroup>
<thead><tr><th rowspan="1" colspan="1">Sample</th>
<th rowspan="1" colspan="1">Virus detected by DisCVR</th>
<th rowspan="1" colspan="1">Matched <italic>k</italic>
-mers<xref ref-type="table-fn" rid="tblfn2"><sup>a</sup>
</xref>
</th>
<th rowspan="1" colspan="1">Genome coverage (%)</th>
<th rowspan="1" colspan="1">No. mapped reads (%)<xref ref-type="table-fn" rid="tblfn3"><sup>b</sup>
</xref>
</th>
</tr>
</thead>
<tbody><tr><td rowspan="1" colspan="1">1B3</td>
<td rowspan="1" colspan="1">HRV-A</td>
<td rowspan="1" colspan="1">3,431</td>
<td rowspan="1" colspan="1">7.6</td>
<td rowspan="1" colspan="1">4 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1B4</td>
<td rowspan="1" colspan="1">HRV-A</td>
<td rowspan="1" colspan="1">3,652</td>
<td rowspan="1" colspan="1">9.39</td>
<td rowspan="1" colspan="1">14 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1B6</td>
<td rowspan="1" colspan="1">HRV-A</td>
<td rowspan="1" colspan="1">2,872</td>
<td rowspan="1" colspan="1">6.38</td>
<td rowspan="1" colspan="1">16 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1B9</td>
<td rowspan="1" colspan="1">HRV-A</td>
<td rowspan="1" colspan="1">1,041</td>
<td rowspan="1" colspan="1">2.15</td>
<td rowspan="1" colspan="1">1,404 (0.10)</td>
</tr>
<tr><td rowspan="1" colspan="1">1C8</td>
<td rowspan="1" colspan="1">HRV-A</td>
<td rowspan="1" colspan="1">2,781</td>
<td rowspan="1" colspan="1">8.21</td>
<td rowspan="1" colspan="1">8 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1D2</td>
<td rowspan="1" colspan="1">HRV-A</td>
<td rowspan="1" colspan="1">2,974</td>
<td rowspan="1" colspan="1">9.38</td>
<td rowspan="1" colspan="1">13 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1D5</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">901</td>
<td rowspan="1" colspan="1">3.63</td>
<td rowspan="1" colspan="1">8 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1D6</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">1,103</td>
<td rowspan="1" colspan="1">3.27</td>
<td rowspan="1" colspan="1">5 (0.99)</td>
</tr>
<tr><td rowspan="1" colspan="1">1E2</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">1,299</td>
<td rowspan="1" colspan="1">1.51</td>
<td rowspan="1" colspan="1">1 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1E4</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">1,813</td>
<td rowspan="1" colspan="1">4.8</td>
<td rowspan="1" colspan="1">7 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1E9</td>
<td rowspan="1" colspan="1">HRV-B</td>
<td rowspan="1" colspan="1">4,306</td>
<td rowspan="1" colspan="1">13.69</td>
<td rowspan="1" colspan="1">27 (0.01)</td>
</tr>
<tr><td rowspan="1" colspan="1">1G7</td>
<td rowspan="1" colspan="1">HRV-B</td>
<td rowspan="1" colspan="1">1,447</td>
<td rowspan="1" colspan="1">1.76</td>
<td rowspan="1" colspan="1">5 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1H5</td>
<td rowspan="1" colspan="1">HRV-B</td>
<td rowspan="1" colspan="1">932</td>
<td rowspan="1" colspan="1">3.84</td>
<td rowspan="1" colspan="1">4 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1I7</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">1,234</td>
<td rowspan="1" colspan="1">1.51</td>
<td rowspan="1" colspan="1">1 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">1I9</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">1,845</td>
<td rowspan="1" colspan="1">3.1</td>
<td rowspan="1" colspan="1">9 (0.00)</td>
</tr>
<tr><td rowspan="1" colspan="1">2A1</td>
<td rowspan="1" colspan="1">RSV</td>
<td rowspan="1" colspan="1">2,123</td>
<td rowspan="1" colspan="1">13.37</td>
<td rowspan="1" colspan="1">172 (0.02)</td>
</tr>
<tr><td rowspan="1" colspan="1">2B5</td>
<td rowspan="1" colspan="1">RSV</td>
<td rowspan="1" colspan="1">927</td>
<td rowspan="1" colspan="1">13.56</td>
<td rowspan="1" colspan="1">69 (0.01)</td>
</tr>
<tr><td rowspan="1" colspan="1">2B8</td>
<td rowspan="1" colspan="1">RSV</td>
<td rowspan="1" colspan="1">1,406</td>
<td rowspan="1" colspan="1">8.64</td>
<td rowspan="1" colspan="1">101 (0.01)</td>
</tr>
<tr><td rowspan="1" colspan="1">2D1</td>
<td rowspan="1" colspan="1">HRV-C</td>
<td rowspan="1" colspan="1">1,620</td>
<td rowspan="1" colspan="1">1.59</td>
<td rowspan="1" colspan="1">2 (0.00)</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="tblfn2"><label>a</label>
<p>Number of matching <italic>k</italic>
-mers identified by the classification module.</p>
</fn>
<fn id="tblfn3"><label>b</label>
<p>Percentage of total reads mapped by the validation module.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>The threshold of 850 <italic>k</italic>
-mers was also used in the analysis of the Nigerian datasets (<xref rid="vez033-B24" ref-type="bibr">Stremlau et al. 2015</xref>
). The top hit from DisCVR was the same as that from the BLAST results in the original study for 101/177 (57%) cases, and viruses were detected in both healthy (<italic>n</italic>
 = 68) and febrile (<italic>n</italic>
 = 33) patients (<xref ref-type="supplementary-material" rid="sup1">Supplementary Table S5</xref>
). In nine cases, the top hit from DisCVR differed from the top BLAST hit, but the second hit matched. In fifty-five cases, the number of <italic>k-</italic>
mers was below the threshold in DisCVR, and the number of reads with BLAST matches was also low (an average of twenty-four reads per dataset). In the remaining twelve discordant samples, DisCVR detected human immunodeficiency virus 1 (<italic>n</italic>
 = 9), XMRV-related virus (<italic>n</italic>
 = 1), and human T-lymphotropic virus 1 (<italic>n</italic>
 = 1) as the top hit, whereas the BLAST results supported the presence of human ADV or Heterosigma akashiwo RNA virus (an algal virus). Mapping of reads to reference genomes suggested that both the DisCVR and BLAST hits in these samples were false positives.</p>
</sec>
<sec><title>4. Discussion</title>
<p>Using HTS in diagnostic settings offers many advantages, including the ability to sequence pathogen genomes both individually and as communities. However, the uptake of HTS in such settings has been slow, due partly to the cost, turnaround time and bioinformatic demands of this technology. We developed DisCVR to help address these challenges. DisCVR is a fast, accurate program for detecting viruses from HTS data using the increasingly exploited approach of <italic>k</italic>
-mer classification. It offers the advantage of a non-targeted approach and also enables typing below the species level (e.g. subtype, serotype, genotype, or strain). Unlike other tools for detecting viruses from HTS data, DisCVR is easy to use in diagnostic settings through the GUI, requires no bioinformatic expertise, and can be used on the Windows operating systems that are commonly used in diagnostic laboratories. The basic output is easy to interpret, and the advanced output provides more detailed statistics and a validation capability.</p>
<p>DisCVR was designed for detecting known viruses and cannot be used to discover novel viruses. Indeed, the paper on the Nigerian patients (<xref rid="vez033-B24" ref-type="bibr">Stremlau et al. 2015</xref>
) reported novel rhabdoviruses in healthy patients using a metagenomic approach, and these were not detected by DisCVR. However, metagenomics requires bioinformatic infrastructure and expertise at levels that are not commonly available in diagnostic laboratories. Nonetheless, DisCVR enables the detection of 148 pathogenic human viruses using one of the three implemented datasets (the pathogenic dataset), and more using the others. This represents a greater than ten-fold increase in target species over multiplex RT-PCR. Moreover, the number of viruses incorporated into the DisCVR databases is flexible and can also be expanded by building custom databases.</p>
<p>In the datasets from respiratory tract infections, DisCVR had high sensitivity and specificity levels but did not identify all the viruses detected by RT-PCR when the threshold of ≥850 <italic>k</italic>
-mers was used. This threshold may be set by the user and was calculated for the respiratory dataset for which we had paired RT-PCR and HTS data. As more datasets with paired information become available, it will be possible to tune the threshold more accurately to specific sample types and sizes. For example, the coverage depth of sequencing data is likely to play an important role in the threshold of detection. Further efforts could also be made to calibrate DisCVR from artificially constructed communities of viruses in various proportions.</p>
<p>Finally, DisCVR is configured as a human viral diagnostic tool, but could be readily expanded to include non-viral human pathogens and pathogens with non-human hosts by using the custom-build scripts in the DisCVR distribution.</p>
</sec>
<sec sec-type="supplementary-material"><title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="sup1"><label>vez033_Supplementary_Data</label>
<media xlink:href="vez033_supplementary_data.zip"><caption><p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back><ack id="A1"><title>Acknowledgements</title>
<p>We thank the members of the Viral Genomics and Bioinformatics Group for continuous, insightful feedback on DisCVR, and in particular Sejal Modha for generous support with utilizing NCBI tools and testing DisCVR. We are grateful to Jonathan Audet, Daniel Todt, Salvatore Camiolo, Quan Gu, Thushan Da Silva, Derek Gatherer and the anonymous beta-testers. We also thank David Manlove for advice on algorithm design. This work was funded by the Medical Research Council (MC_UU_12014/12). Pablo R. Murcia is funded by Grant MC_UU_12014/9.</p>
<sec sec-type="data-availability"><title>Data availability</title>
<p>Source code is available on GitHub at <ext-link ext-link-type="uri" xlink:href="https://centre-for-virus-research.github.io/DisCVR/">https://centre-for-virus-research.github.io/DisCVR/</ext-link>
, and databases and executables are available at <ext-link ext-link-type="uri" xlink:href="http://bioinformatics.cvr.ac.uk/discvr.php">http://bioinformatics.cvr.ac.uk/discvr.php</ext-link>
.</p>
<p><bold>Conflict of interest:</bold>
 None declared.</p>
</sec>
</ack>
<ref-list id="R1"><title>References</title>
<ref id="vez033-B1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Altschul</surname>
<given-names>S. F.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>1994</year>
) ‘<article-title>Issues in Searching Molecular Sequence Databases</article-title>
’, <source>Nature Genetics</source>
, <volume>6</volume>
: <fpage>119</fpage>
–<lpage>29</lpage>
.<pub-id pub-id-type="pmid">8162065</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B2"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Audano</surname>
<given-names>P.</given-names>
</name>
, <name name-style="western"><surname>Vannberg</surname>
<given-names>F.</given-names>
</name>
</person-group>
 (<year>2014</year>
) ‘<article-title>KAnalyze: A Fast Versatile Pipelined k-Mer Toolkit</article-title>
’, <source>Bioinformatics</source>
, <volume>30</volume>
: <fpage>2070</fpage>
–<lpage>2</lpage>
.<pub-id pub-id-type="pmid">24642064</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Borozan</surname>
<given-names>I.</given-names>
</name>
, <name name-style="western"><surname>Ferretti</surname>
<given-names>V.</given-names>
</name>
</person-group>
 (<year>2016</year>
) ‘<article-title>CSSSCL: A Python Package That Uses Combined Sequence Similarity Scores for Accurate Taxonomic Classification of Long and Short Sequence Reads</article-title>
’, <source>Bioinformatics</source>
, <volume>32</volume>
: <fpage>453</fpage>
–<lpage>5</lpage>
.<pub-id pub-id-type="pmid">26454281</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B4"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Borozan</surname>
<given-names>I.</given-names>
</name>
, <name name-style="western"><surname>Watt</surname>
<given-names>S.</given-names>
</name>
, <name name-style="western"><surname>Ferretti</surname>
<given-names>V.</given-names>
</name>
</person-group>
 (<year>2015</year>
) ‘<article-title>Integrating Alignment-Based and Alignment-Free Sequence Similarity Measures for Biological Sequence Classification</article-title>
’, <source>Bioinformatics</source>
, <volume>31</volume>
: <fpage>1396</fpage>
–<lpage>404</lpage>
.<pub-id pub-id-type="pmid">25573913</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B5"><mixed-citation publication-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Breitwieser</surname>
<given-names>F. P.</given-names>
</name>
, <name name-style="western"><surname>Salzberg</surname>
<given-names>S. L.</given-names>
</name>
</person-group>
 (<year>2018</year>
) ‘KrakenHLL: Confident and fast metagenomics classification using unique k-mer counts’, <italic>bioRxiv</italic>
. </mixed-citation>
</ref>
<ref id="vez033-B6"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Brister</surname>
<given-names>J. R.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2015</year>
) ‘<article-title>NCBI Viral Genomes Resource</article-title>
’, <source>Nucleic Acids Research</source>
, <volume>43/Database issue</volume>
: <fpage>D571</fpage>
–<lpage>7</lpage>
.</mixed-citation>
</ref>
<ref id="vez033-B7"><mixed-citation publication-type="other"><collab>Centers for Disease Control and Prevention</collab>
. (n.d.), &lt;<ext-link ext-link-type="uri" xlink:href="https://www.cdc.gov/vhf/index.html">https://www.cdc.gov/vhf/index.html</ext-link>
&gt; accessed 15 Dec 2014. </mixed-citation>
</ref>
<ref id="vez033-B8"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Flygare</surname>
<given-names>S.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2016</year>
) ‘<article-title>Taxonomer: An Interactive Metagenomics Analysis Portal for Universal Pathogen Detection and Host mRNA Expression Profiling</article-title>
’, <source>Genome Biology</source>
, <volume>17</volume>
: <fpage>111</fpage>
.<pub-id pub-id-type="pmid">27224977</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B9"><mixed-citation publication-type="other"><collab>Health and Safety Executive: The Approved List of Biological Agents</collab>
. (<year>2013</year>
) &lt;<ext-link ext-link-type="uri" xlink:href="http://www.hse.gov.uk/pubns/misc208.pdf">http://www.hse.gov.uk/pubns/misc208.pdf</ext-link>
&gt; accessed 15 Dec 2014. </mixed-citation>
</ref>
<ref id="vez033-B10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Kawulok</surname>
<given-names>J.</given-names>
</name>
, <name name-style="western"><surname>Deorowicz</surname>
<given-names>S.</given-names>
</name>
</person-group>
 (<year>2015</year>
) ‘<article-title>CoMeta: Classification of Metagenomes Using k-Mers</article-title>
’, <source>PLoS One</source>
, <volume>10</volume>
: <fpage>e0121453</fpage>
.<pub-id pub-id-type="pmid">25884504</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Koslicki</surname>
<given-names>D.</given-names>
</name>
, <name name-style="western"><surname>Falush</surname>
<given-names>D.</given-names>
</name>
</person-group>
 (<year>2016</year>
) ‘<article-title>MetaPalette: A k-Mer Painting Approach for Metagenomic Taxonomic Profiling and Quantification of Novel Strain Variation</article-title>
’, <source>mSystems</source>
, <volume>1</volume>
, DOI: 10.1128/mSystems.00020-16 </mixed-citation>
</ref>
<ref id="vez033-B12"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Li</surname>
<given-names>Y.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2016</year>
) ‘<article-title>VIP: An Integrated Pipeline for Metagenomics of Virus Identification and Discovery</article-title>
’, <source>Scientific Reports</source>
, <volume>6</volume>
: <fpage>23774</fpage>
.<pub-id pub-id-type="pmid">27026381</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B13"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Maarala</surname>
<given-names>A. I.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2018</year>
) ‘<article-title>ViraPipe: Scalable Parallel Pipeline for Viral Metagenome Analysis from Next Generation Sequencing Reads</article-title>
’, <source>Bioinformatics</source>
, <volume>34</volume>
: <fpage>928</fpage>
–<lpage>35</lpage>
.<pub-id pub-id-type="pmid">29106455</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B14"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Manekar</surname>
<given-names>S. C.</given-names>
</name>
, <name name-style="western"><surname>Sathe</surname>
<given-names>S. R.</given-names>
</name>
</person-group>
 (<year>2018</year>
) ‘<article-title>A Benchmark Study of k-Mer Counting Methods for High-Throughput Sequencing</article-title>
’, <source>GigaScience</source>
, <volume>7</volume>: <fpage>giy125</fpage>.</mixed-citation>
</ref>
<ref id="vez033-B15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Marçais</surname>
<given-names>G.</given-names>
</name>
, <name name-style="western"><surname>Kingsford</surname>
<given-names>C.</given-names>
</name>
</person-group>
 (<year>2011</year>
) ‘<article-title>A Fast, Lock-Free Approach for Efficient Parallel Counting of Occurrences of k-Mers</article-title>
’, <source>Bioinformatics</source>
, <volume>27</volume>
: <fpage>764</fpage>
–<lpage>70</lpage>
.<pub-id pub-id-type="pmid">21217122</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Orton</surname>
<given-names>R. J.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2016</year>
) ‘<article-title>Bioinformatics Tools for Analysing Viral Genomic Data</article-title>
’, <source>Revue Scientifique et Technique de l'OIE</source>
, <volume>35</volume>
: <fpage>271</fpage>
–<lpage>85</lpage>
.</mixed-citation>
</ref>
<ref id="vez033-B17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ounit</surname>
<given-names>R.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2015</year>
) ‘<article-title>CLARK: Fast and Accurate Classification of Metagenomic and Genomic Sequences Using Discriminative k-Mers</article-title>
’, <source>BMC Genomics</source>
, <volume>16</volume>
: <fpage>236</fpage>
.<pub-id pub-id-type="pmid">25879410</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Ren</surname>
<given-names>J.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2017</year>
) ‘<article-title>VirFinder: A Novel k-Mer Based Tool for Identifying Viral Sequences from Assembled Metagenomic Data</article-title>
’, <source>Microbiome</source>
, <volume>5</volume>
: <fpage>69</fpage>
.<pub-id pub-id-type="pmid">28683828</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B19"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Rosen</surname>
<given-names>G. L.</given-names>
</name>
, <name name-style="western"><surname>Reichenberger</surname>
<given-names>E. R.</given-names>
</name>
, <name name-style="western"><surname>Rosenfeld</surname>
<given-names>A. M.</given-names>
</name>
</person-group>
 (<year>2011</year>
) ‘<article-title>NBC: The Naive Bayes Classification Tool Webserver for Taxonomic Classification of Metagenomic Reads</article-title>
’, <source>Bioinformatics</source>
, <volume>27</volume>
: <fpage>127</fpage>
–<lpage>9</lpage>
.<pub-id pub-id-type="pmid">21062764</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B20"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Scheuch</surname>
<given-names>M.</given-names>
</name>
, <name name-style="western"><surname>Höper</surname>
<given-names>D.</given-names>
</name>
, <name name-style="western"><surname>Beer</surname>
<given-names>M.</given-names>
</name>
</person-group>
 (<year>2015</year>
) ‘<article-title>RIEMS: A Software Pipeline for Sensitive and Comprehensive Taxonomic Classification of Reads From Metagenomics Datasets</article-title>
’, <source>BMC Bioinformatics</source>
, <volume>16</volume>
: <fpage>69</fpage>
.<pub-id pub-id-type="pmid">25886935</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B21"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Shannon</surname>
<given-names>C. E.</given-names>
</name>
</person-group>
 (<year>1948</year>
) ‘<article-title>A Mathematical Theory of Communication</article-title>
’, <source>Bell System Technical Journal</source>
, <volume>27</volume>
: <fpage>379</fpage>
–<lpage>423</lpage>
.</mixed-citation>
</ref>
<ref id="vez033-B22"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sims</surname>
<given-names>G. E.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2009</year>
) ‘<article-title>Alignment-Free Genome Comparison With Feature Frequency Profiles (FFP) and Optimal Resolutions</article-title>
’, <source>Proceedings of the National Academy of Sciences</source>
, <volume>106</volume>
: <fpage>2677</fpage>
–<lpage>82</lpage>
.</mixed-citation>
</ref>
<ref id="vez033-B23"><mixed-citation publication-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Sreenu</surname>
<given-names>V. B.</given-names>
</name>
</person-group>
 (n.d.) ‘Tanoti’, &lt;<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.cvr.ac.uk/tanoti.php">http://bioinformatics.cvr.ac.uk/tanoti.php</ext-link>
&gt; accessed 15 Dec 2014. </mixed-citation>
</ref>
<ref id="vez033-B24"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Stremlau</surname>
<given-names>M. H.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2015</year>
) ‘<article-title>Discovery of Novel Rhabdoviruses in the Blood of Healthy Individuals from West Africa</article-title>
’, <source>PLoS Neglected Tropical Diseases</source>
, <volume>9</volume>
: <fpage>e0003631</fpage>
.<pub-id pub-id-type="pmid">25781465</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B25"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Thorburn</surname>
<given-names>F.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2015</year>
) ‘<article-title>The Use of Next Generation Sequencing in the Diagnosis and Typing of Respiratory Infections</article-title>
’, <source>Journal of Clinical Virology</source>
, <volume>69</volume>
: <fpage>96</fpage>
–<lpage>100</lpage>
.<pub-id pub-id-type="pmid">26209388</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B26"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Visser</surname>
<given-names>M.</given-names>
</name>
, <name name-style="western"><surname>Burger</surname>
<given-names>J. T.</given-names>
</name>
, <name name-style="western"><surname>Maree</surname>
<given-names>H. J.</given-names>
</name>
</person-group>
 (<year>2016</year>
) ‘<article-title>Targeted Virus Detection in Next-Generation Sequencing Data Using an Automated e-Probe Based Approach</article-title>
’, <source>Virology</source>
, <volume>495</volume>
: <fpage>122</fpage>
–<lpage>8</lpage>
.<pub-id pub-id-type="pmid">27209446</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B27"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wang</surname>
<given-names>Q.</given-names>
</name>
, <name name-style="western"><surname>Jia</surname>
<given-names>P.</given-names>
</name>
, <name name-style="western"><surname>Zhao</surname>
<given-names>Z.</given-names>
</name>
</person-group>
 (<year>2013</year>
) ‘<article-title>VirusFinder: Software for Efficient and Accurate Detection of Viruses and Their Integration Sites in Host Genomes Through Next Generation Sequencing Data</article-title>
’, <source>PLoS One</source>
, <volume>8</volume>
: <fpage>e64465</fpage>
.<pub-id pub-id-type="pmid">23717618</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wood</surname>
<given-names>D. E.</given-names>
</name>
, <name name-style="western"><surname>Salzberg</surname>
<given-names>S. L.</given-names>
</name>
</person-group>
 (<year>2014</year>
) ‘<article-title>Kraken: Ultrafast Metagenomic Sequence Classification Using Exact Alignments</article-title>
’, <source>Genome Biology</source>
, <volume>15</volume>
: <fpage>R46</fpage>
.<pub-id pub-id-type="pmid">24580807</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wu</surname>
<given-names>G. A.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2009</year>
) ‘<article-title>Whole-Proteome Phylogeny of Large dsDNA Virus Families by an Alignment-Free Method</article-title>
’, <source>Proceedings of the National Academy of Sciences</source>
, <volume>106</volume>
: <fpage>12826</fpage>
–<lpage>31</lpage>
. </mixed-citation>
</ref>
<ref id="vez033-B30"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Youden</surname>
<given-names>W. J.</given-names>
</name>
</person-group>
 (<year>1950</year>
) ‘<article-title>Index for Rating Diagnostic Tests</article-title>
’, <source>Cancer</source>
, <volume>3</volume>
: <fpage>32</fpage>
–<lpage>5</lpage>
.<pub-id pub-id-type="pmid">15405679</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2014</year>
) ‘<article-title>These Are Not the k-Mers You Are Looking for: Efficient Online k-Mer Counting Using a Probabilistic Data Structure</article-title>
’, <source>PLoS One</source>
, <volume>9</volume>
: <fpage>e101271</fpage>
.<pub-id pub-id-type="pmid">25062443</pub-id>
</mixed-citation>
</ref>
<ref id="vez033-B32"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Zheng</surname>
<given-names>Y.</given-names>
</name>
<etal>et al</etal>
</person-group>
 (<year>2017</year>
) ‘<article-title>VirusDetect: An Automated Pipeline for Efficient Virus Discovery Using Deep Sequencing of Small RNAs</article-title>
’, <source>Virology</source>
, <volume>500</volume>
: <fpage>130</fpage>
–<lpage>8</lpage>
.<pub-id pub-id-type="pmid">27825033</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001191  | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001191  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

	MERS exploration server
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
