MERS exploration server

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Fine-scale differentiation between
<italic>Bacillus anthracis</italic>
and
<italic>Bacillus cereus</italic>
group signatures in metagenome shotgun data</title>
<author>
<name sortKey="Petit Iii, Robert A" sort="Petit Iii, Robert A" uniqKey="Petit Iii R" first="Robert A." last="Petit Iii">Robert A. Petit III</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hogan, James M" sort="Hogan, James M" uniqKey="Hogan J" first="James M." last="Hogan">James M. Hogan</name>
<affiliation>
<nlm:aff id="aff-2">
<institution>Queensland University of Technology</institution>
,
<city>Brisbane</city>
,
<country>Australia</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ezewudo, Matthew N" sort="Ezewudo, Matthew N" uniqKey="Ezewudo M" first="Matthew N." last="Ezewudo">Matthew N. Ezewudo</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Joseph, Sandeep J" sort="Joseph, Sandeep J" uniqKey="Joseph S" first="Sandeep J." last="Joseph">Sandeep J. Joseph</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Read, Timothy D" sort="Read, Timothy D" uniqKey="Read T" first="Timothy D." last="Read">Timothy D. Read</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30155371</idno>
<idno type="pmc">6109372</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109372</idno>
<idno type="RBID">PMC:6109372</idno>
<idno type="doi">10.7717/peerj.5515</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">001122</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001122</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Fine-scale differentiation between
<italic>Bacillus anthracis</italic>
and
<italic>Bacillus cereus</italic>
group signatures in metagenome shotgun data</title>
<author>
<name sortKey="Petit Iii, Robert A" sort="Petit Iii, Robert A" uniqKey="Petit Iii R" first="Robert A." last="Petit Iii">Robert A. Petit III</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hogan, James M" sort="Hogan, James M" uniqKey="Hogan J" first="James M." last="Hogan">James M. Hogan</name>
<affiliation>
<nlm:aff id="aff-2">
<institution>Queensland University of Technology</institution>
,
<city>Brisbane</city>
,
<country>Australia</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ezewudo, Matthew N" sort="Ezewudo, Matthew N" uniqKey="Ezewudo M" first="Matthew N." last="Ezewudo">Matthew N. Ezewudo</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Joseph, Sandeep J" sort="Joseph, Sandeep J" uniqKey="Joseph S" first="Sandeep J." last="Joseph">Sandeep J. Joseph</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Read, Timothy D" sort="Read, Timothy D" uniqKey="Read T" first="Timothy D." last="Read">Timothy D. Read</name>
<affiliation>
<nlm:aff id="aff-1">
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PeerJ</title>
<idno type="eISSN">2167-8359</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>It is possible to detect bacterial species in shotgun metagenome datasets through the presence of only a few sequence reads. However, false positive results can arise, as was the case in the initial findings of a recent New York City subway metagenome project. False positives are especially likely when two closely related organisms are present in the same sample.
<italic>Bacillus anthracis</italic>
, the etiologic agent of anthrax, is a high-consequence pathogen that shares >99% average nucleotide identity with
<italic>Bacillus cereus</italic>
group (BCerG) genomes. Our goal was to create an analysis tool that used k-mers to detect
<italic>B. anthracis</italic>
, incorporating information about the coverage of BCerG in the metagenome sample.</p>
</sec>
<sec>
<title>Methods</title>
<p>Using public complete genome sequence datasets, we identified a set of 31-mer signatures that differentiated
<italic>B. anthracis</italic>
from other members of the
<italic>B. cereus</italic>
group (BCerG), and another set which differentiated BCerG genomes (including
<italic>B. anthracis</italic>
) from other
<italic>Bacillus</italic>
strains. We also created a set of 31-mers for detecting the lethal factor gene, the key genetic diagnostic of the presence of anthrax-causing bacteria. We created synthetic sequence datasets based on existing genomes to test the accuracy of a k-mer based detection model.</p>
</sec>
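<sec>
<title>Sketch: selecting signature k-mers</title>
<p>The signature selection described in the Methods can be illustrated as a set operation over k-mers. The following is a minimal sketch, not the authors' pipeline. It assumes that a signature k-mer must occur in every target genome and in no background genome, and that k-mers are compared in canonical form (the smaller of a k-mer and its reverse complement). The function names canonical, kmers, and signature_kmers are hypothetical, and genomes are passed as in-memory strings for brevity.</p>

```python
from typing import Iterable, Set

K = 31  # k-mer length named in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Collect canonical k-mers from one sequence.

    Windows containing anything other than A, C, G, or T are skipped.
    """
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window).issubset("ACGT"):
            found.add(canonical(window))
    return found


def signature_kmers(targets: Iterable[str],
                    background: Iterable[str],
                    k: int = K) -> Set[str]:
    """k-mers present in every target genome and absent from all background genomes.

    Assumption for illustration: requiring presence in every target is one
    possible definition of a signature; the source does not spell this out.
    """
    target_sets = [kmers(genome, k) for genome in targets]
    if not target_sets:
        return set()
    shared = set.intersection(*target_sets)
    for genome in background:
        shared -= kmers(genome, k)
    return shared
```

<p>Under these assumptions, calling signature_kmers with B. anthracis genomes as targets and other BCerG genomes as background would give a set analogous to Ba31. Python sets are used only to keep the sketch short; genome-scale collections would likely need a dedicated k-mer counting tool.</p>
</sec>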
<sec>
<title>Results</title>
<p>We found 239,503
<italic>B. anthracis</italic>
-specific 31-mers (the
<italic>Ba31 set</italic>
), 10,183 BCerG 31-mers (the
<italic>BCerG31 set</italic>
), and 2,617 lethal factor k-mers (the
<italic>lef31</italic>
set). We showed that false positive
<italic>B. anthracis</italic>
k-mers—which arise from random sequencing errors—are observable at high genome coverages of
<italic>B. cereus</italic>
. We also showed that there is a “gray zone” below 0.184× coverage of the
<italic>B. anthracis</italic>
genome sequence, in which we cannot expect with high probability to identify lethal factor k-mers. We created a linear regression model to differentiate the presence of
<italic>B. anthracis</italic>
-like chromosomes from sequencing errors given the BCerG background coverage. We showed that while shotgun datasets from the New York City subway metagenome project had no matches to
<italic>lef31</italic>
k-mers and hence were negative for
<italic>B. anthracis</italic>
, some samples showed evidence of strains very closely related to the pathogen.</p>
</sec>
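<sec>
<title>Sketch: separating signal from sequencing-error background</title>
<p>The regression idea in the Results can be sketched as follows. This is an illustration under stated assumptions, not the published model. It assumes that the number of error-derived Ba31 matches grows linearly with BCerG coverage, fits that line by ordinary least squares on negative-control samples, and flags a sample when its observed Ba31 count exceeds the fitted expectation by a chosen margin. The function names, the margin parameter, and the calibration values are hypothetical and invented for illustration only.</p>

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or 2 > len(x):
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def exceeds_error_background(bcerg_coverage: float,
                             ba31_count: float,
                             slope: float,
                             intercept: float,
                             margin: float = 0.0) -> bool:
    """True when the observed Ba31 count is above the error-only expectation.

    The margin is a hypothetical safety buffer; how to choose it is not
    specified by the source.
    """
    expected = slope * bcerg_coverage + intercept
    return ba31_count > expected + margin


if __name__ == "__main__":
    # Made-up calibration points: (BCerG coverage, Ba31 matches) in samples
    # known to contain no B. anthracis. These are NOT values from the paper.
    coverage = [10.0, 50.0, 100.0, 200.0]
    false_hits = [1.0, 5.0, 10.0, 20.0]
    slope, intercept = fit_line(coverage, false_hits)
    print(exceeds_error_background(100.0, 500.0, slope, intercept, margin=5.0))
```

<p>In this sketch, a sample whose Ba31 count sits well above the fitted line is a candidate for a B. anthracis-like chromosome, while a count near the line is consistent with sequencing error at that BCerG coverage. A check for lef31 matches would still be needed before interpreting such a sample.</p>
</sec>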
<sec>
<title>Discussion</title>
<p>This work shows how extensive libraries of complete genomes can be used to create organism-specific signatures that help interpret metagenomes. We contrast “specialist” approaches to metagenome analysis, such as this work, with “generalist” software that seeks to classify all organisms present in the sample, and we note the more general utility of a k-mer filter approach when taxonomic boundaries lack clarity or when high levels of precision are required.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Ackelsberg, J" uniqKey="Ackelsberg J">J Ackelsberg</name>
</author>
<author>
<name sortKey="Rakeman, J" uniqKey="Rakeman J">J Rakeman</name>
</author>
<author>
<name sortKey="Hughes, S" uniqKey="Hughes S">S Hughes</name>
</author>
<author>
<name sortKey="Petersen, J" uniqKey="Petersen J">J Petersen</name>
</author>
<author>
<name sortKey="Mead, P" uniqKey="Mead P">P Mead</name>
</author>
<author>
<name sortKey="Schriefer, M" uniqKey="Schriefer M">M Schriefer</name>
</author>
<author>
<name sortKey="Kingry, L" uniqKey="Kingry L">L Kingry</name>
</author>
<author>
<name sortKey="Hoffmaster, A" uniqKey="Hoffmaster A">A Hoffmaster</name>
</author>
<author>
<name sortKey="Gee, Je" uniqKey="Gee J">JE Gee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Afshinnekoo, E" uniqKey="Afshinnekoo E">E Afshinnekoo</name>
</author>
<author>
<name sortKey="Meydan, C" uniqKey="Meydan C">C Meydan</name>
</author>
<author>
<name sortKey="Chowdhury, S" uniqKey="Chowdhury S">S Chowdhury</name>
</author>
<author>
<name sortKey="Jaroudi, D" uniqKey="Jaroudi D">D Jaroudi</name>
</author>
<author>
<name sortKey="Boyer, C" uniqKey="Boyer C">C Boyer</name>
</author>
<author>
<name sortKey="Bernstein, N" uniqKey="Bernstein N">N Bernstein</name>
</author>
<author>
<name sortKey="Maritz, Jm" uniqKey="Maritz J">JM Maritz</name>
</author>
<author>
<name sortKey="Reeves, D" uniqKey="Reeves D">D Reeves</name>
</author>
<author>
<name sortKey="Gandara, J" uniqKey="Gandara J">J Gandara</name>
</author>
<author>
<name sortKey="Chhangawala, S" uniqKey="Chhangawala S">S Chhangawala</name>
</author>
<author>
<name sortKey="Ahsanuddin, S" uniqKey="Ahsanuddin S">S Ahsanuddin</name>
</author>
<author>
<name sortKey="Simmons, A" uniqKey="Simmons A">A Simmons</name>
</author>
<author>
<name sortKey="Nessel, T" uniqKey="Nessel T">T Nessel</name>
</author>
<author>
<name sortKey="Sundaresh, B" uniqKey="Sundaresh B">B Sundaresh</name>
</author>
<author>
<name sortKey="Pereira, E" uniqKey="Pereira E">E Pereira</name>
</author>
<author>
<name sortKey="Jorgensen, E" uniqKey="Jorgensen E">E Jorgensen</name>
</author>
<author>
<name sortKey="Kolokotronis, S O" uniqKey="Kolokotronis S">S-O Kolokotronis</name>
</author>
<author>
<name sortKey="Kirchberger, N" uniqKey="Kirchberger N">N Kirchberger</name>
</author>
<author>
<name sortKey="Garcia, I" uniqKey="Garcia I">I Garcia</name>
</author>
<author>
<name sortKey="Gandara, D" uniqKey="Gandara D">D Gandara</name>
</author>
<author>
<name sortKey="Dhanraj, S" uniqKey="Dhanraj S">S Dhanraj</name>
</author>
<author>
<name sortKey="Nawrin, T" uniqKey="Nawrin T">T Nawrin</name>
</author>
<author>
<name sortKey="Saletore, Y" uniqKey="Saletore Y">Y Saletore</name>
</author>
<author>
<name sortKey="Alexander, N" uniqKey="Alexander N">N Alexander</name>
</author>
<author>
<name sortKey="Vijay, P" uniqKey="Vijay P">P Vijay</name>
</author>
<author>
<name sortKey="Henaff, Em" uniqKey="Henaff E">EM Hénaff</name>
</author>
<author>
<name sortKey="Zumbo, P" uniqKey="Zumbo P">P Zumbo</name>
</author>
<author>
<name sortKey="Walsh, M" uniqKey="Walsh M">M Walsh</name>
</author>
<author>
<name sortKey="O Ullan, Gd" uniqKey="O Ullan G">GD O’Mullan</name>
</author>
<author>
<name sortKey="Tighe, S" uniqKey="Tighe S">S Tighe</name>
</author>
<author>
<name sortKey="Dudley, Jt" uniqKey="Dudley J">JT Dudley</name>
</author>
<author>
<name sortKey="Dunaif, A" uniqKey="Dunaif A">A Dunaif</name>
</author>
<author>
<name sortKey="Ennis, S" uniqKey="Ennis S">S Ennis</name>
</author>
<author>
<name sortKey="O Alloran, E" uniqKey="O Alloran E">E O’Halloran</name>
</author>
<author>
<name sortKey="Magalhaes, Tr" uniqKey="Magalhaes T">TR Magalhaes</name>
</author>
<author>
<name sortKey="Boone, B" uniqKey="Boone B">B Boone</name>
</author>
<author>
<name sortKey="Jones, Al" uniqKey="Jones A">AL Jones</name>
</author>
<author>
<name sortKey="Muth, Tr" uniqKey="Muth T">TR Muth</name>
</author>
<author>
<name sortKey="Paolantonio, Ks" uniqKey="Paolantonio K">KS Paolantonio</name>
</author>
<author>
<name sortKey="Alter, E" uniqKey="Alter E">E Alter</name>
</author>
<author>
<name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author>
<name sortKey="Garbarino, J" uniqKey="Garbarino J">J Garbarino</name>
</author>
<author>
<name sortKey="Prill, Rj" uniqKey="Prill R">RJ Prill</name>
</author>
<author>
<name sortKey="Carlton, Jm" uniqKey="Carlton J">JM Carlton</name>
</author>
<author>
<name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author>
<name sortKey="Mason, Ce" uniqKey="Mason C">CE Mason</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Afshinnekoo, E" uniqKey="Afshinnekoo E">E Afshinnekoo</name>
</author>
<author>
<name sortKey="Meydan, C" uniqKey="Meydan C">C Meydan</name>
</author>
<author>
<name sortKey="Chowdhury, S" uniqKey="Chowdhury S">S Chowdhury</name>
</author>
<author>
<name sortKey="Jaroudi, D" uniqKey="Jaroudi D">D Jaroudi</name>
</author>
<author>
<name sortKey="Boyer, C" uniqKey="Boyer C">C Boyer</name>
</author>
<author>
<name sortKey="Bernstein, N" uniqKey="Bernstein N">N Bernstein</name>
</author>
<author>
<name sortKey="Maritz, Jm" uniqKey="Maritz J">JM Maritz</name>
</author>
<author>
<name sortKey="Reeves, D" uniqKey="Reeves D">D Reeves</name>
</author>
<author>
<name sortKey="Gandara, J" uniqKey="Gandara J">J Gandara</name>
</author>
<author>
<name sortKey="Chhangawala, S" uniqKey="Chhangawala S">S Chhangawala</name>
</author>
<author>
<name sortKey="Ahsanuddin, S" uniqKey="Ahsanuddin S">S Ahsanuddin</name>
</author>
<author>
<name sortKey="Simmons, A" uniqKey="Simmons A">A Simmons</name>
</author>
<author>
<name sortKey="Nessel, T" uniqKey="Nessel T">T Nessel</name>
</author>
<author>
<name sortKey="Sundaresh, B" uniqKey="Sundaresh B">B Sundaresh</name>
</author>
<author>
<name sortKey="Pereira, E" uniqKey="Pereira E">E Pereira</name>
</author>
<author>
<name sortKey="Jorgensen, E" uniqKey="Jorgensen E">E Jorgensen</name>
</author>
<author>
<name sortKey="Kolokotronis, S O" uniqKey="Kolokotronis S">S-O Kolokotronis</name>
</author>
<author>
<name sortKey="Kirchberger, N" uniqKey="Kirchberger N">N Kirchberger</name>
</author>
<author>
<name sortKey="Garcia, I" uniqKey="Garcia I">I Garcia</name>
</author>
<author>
<name sortKey="Gandara, D" uniqKey="Gandara D">D Gandara</name>
</author>
<author>
<name sortKey="Dhanraj, S" uniqKey="Dhanraj S">S Dhanraj</name>
</author>
<author>
<name sortKey="Nawrin, T" uniqKey="Nawrin T">T Nawrin</name>
</author>
<author>
<name sortKey="Saletore, Y" uniqKey="Saletore Y">Y Saletore</name>
</author>
<author>
<name sortKey="Alexander, N" uniqKey="Alexander N">N Alexander</name>
</author>
<author>
<name sortKey="Vijay, P" uniqKey="Vijay P">P Vijay</name>
</author>
<author>
<name sortKey="Henaff, Em" uniqKey="Henaff E">EM Hénaff</name>
</author>
<author>
<name sortKey="Zumbo, P" uniqKey="Zumbo P">P Zumbo</name>
</author>
<author>
<name sortKey="Walsh, M" uniqKey="Walsh M">M Walsh</name>
</author>
<author>
<name sortKey="O Ullan, Gd" uniqKey="O Ullan G">GD O’Mullan</name>
</author>
<author>
<name sortKey="Tighe, S" uniqKey="Tighe S">S Tighe</name>
</author>
<author>
<name sortKey="Dudley, Jt" uniqKey="Dudley J">JT Dudley</name>
</author>
<author>
<name sortKey="Dunaif, A" uniqKey="Dunaif A">A Dunaif</name>
</author>
<author>
<name sortKey="Ennis, S" uniqKey="Ennis S">S Ennis</name>
</author>
<author>
<name sortKey="O Alloran, E" uniqKey="O Alloran E">E O’Halloran</name>
</author>
<author>
<name sortKey="Magalhaes, Tr" uniqKey="Magalhaes T">TR Magalhaes</name>
</author>
<author>
<name sortKey="Boone, B" uniqKey="Boone B">B Boone</name>
</author>
<author>
<name sortKey="Jones, Al" uniqKey="Jones A">AL Jones</name>
</author>
<author>
<name sortKey="Muth, Tr" uniqKey="Muth T">TR Muth</name>
</author>
<author>
<name sortKey="Paolantonio, Ks" uniqKey="Paolantonio K">KS Paolantonio</name>
</author>
<author>
<name sortKey="Alter, E" uniqKey="Alter E">E Alter</name>
</author>
<author>
<name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author>
<name sortKey="Garbarino, J" uniqKey="Garbarino J">J Garbarino</name>
</author>
<author>
<name sortKey="Prill, Rj" uniqKey="Prill R">RJ Prill</name>
</author>
<author>
<name sortKey="Carlton, Jm" uniqKey="Carlton J">JM Carlton</name>
</author>
<author>
<name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author>
<name sortKey="Mason, Ce" uniqKey="Mason C">CE Mason</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bragg, Ts" uniqKey="Bragg T">TS Bragg</name>
</author>
<author>
<name sortKey="Robertson, Dl" uniqKey="Robertson D">DL Robertson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breitwieser, Fp" uniqKey="Breitwieser F">FP Breitwieser</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
<author>
<name sortKey="Olm, Mr" uniqKey="Olm M">MR Olm</name>
</author>
<author>
<name sortKey="Thomas, Bc" uniqKey="Thomas B">BC Thomas</name>
</author>
<author>
<name sortKey="Banfield, Jf" uniqKey="Banfield J">JF Banfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cachat, E" uniqKey="Cachat E">E Cachat</name>
</author>
<author>
<name sortKey="Barker, M" uniqKey="Barker M">M Barker</name>
</author>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD Read</name>
</author>
<author>
<name sortKey="Priest, Fg" uniqKey="Priest F">FG Priest</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camacho, C" uniqKey="Camacho C">C Camacho</name>
</author>
<author>
<name sortKey="Coulouris, G" uniqKey="Coulouris G">G Coulouris</name>
</author>
<author>
<name sortKey="Avagyan, V" uniqKey="Avagyan V">V Avagyan</name>
</author>
<author>
<name sortKey="Ma, N" uniqKey="Ma N">N Ma</name>
</author>
<author>
<name sortKey="Papadopoulos, J" uniqKey="Papadopoulos J">J Papadopoulos</name>
</author>
<author>
<name sortKey="Bealer, K" uniqKey="Bealer K">K Bealer</name>
</author>
<author>
<name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Carlson, Cj" uniqKey="Carlson C">CJ Carlson</name>
</author>
<author>
<name sortKey="Getz, Wm" uniqKey="Getz W">WM Getz</name>
</author>
<author>
<name sortKey="Kausrud, Kl" uniqKey="Kausrud K">KL Kausrud</name>
</author>
<author>
<name sortKey="Cizauskas, Ca" uniqKey="Cizauskas C">CA Cizauskas</name>
</author>
<author>
<name sortKey="Blackburn, Jk" uniqKey="Blackburn J">JK Blackburn</name>
</author>
<author>
<name sortKey="Bustos Carrillo, Fa" uniqKey="Bustos Carrillo F">FA Bustos Carrillo</name>
</author>
<author>
<name sortKey="Colwell, R" uniqKey="Colwell R">R Colwell</name>
</author>
<author>
<name sortKey="Easterday, Wr" uniqKey="Easterday W">WR Easterday</name>
</author>
<author>
<name sortKey="Ganz, Hh" uniqKey="Ganz H">HH Ganz</name>
</author>
<author>
<name sortKey="Kamath, Pl" uniqKey="Kamath P">PL Kamath</name>
</author>
<author>
<name sortKey=" Kstad, Oa" uniqKey=" Kstad O">OA Økstad</name>
</author>
<author>
<name sortKey="Turner, Wc" uniqKey="Turner W">WC Turner</name>
</author>
<author>
<name sortKey="Kolst, A B" uniqKey="Kolst A">A-B Kolstø</name>
</author>
<author>
<name sortKey="Stenseth, Nc" uniqKey="Stenseth N">NC Stenseth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dixon, Tc" uniqKey="Dixon T">TC Dixon</name>
</author>
<author>
<name sortKey="Meselson, M" uniqKey="Meselson M">M Meselson</name>
</author>
<author>
<name sortKey="Guillemin, J" uniqKey="Guillemin J">J Guillemin</name>
</author>
<author>
<name sortKey="Hanna, Pc" uniqKey="Hanna P">PC Hanna</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gordon, A" uniqKey="Gordon A">A Gordon</name>
</author>
<author>
<name sortKey="Hannon, Gj" uniqKey="Hannon G">GJ Hannon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Helgason, E" uniqKey="Helgason E">E Helgason</name>
</author>
<author>
<name sortKey="Okstad, Oa" uniqKey="Okstad O">OA Okstad</name>
</author>
<author>
<name sortKey="Caugant, Da" uniqKey="Caugant D">DA Caugant</name>
</author>
<author>
<name sortKey="Johansen, Ha" uniqKey="Johansen H">HA Johansen</name>
</author>
<author>
<name sortKey="Fouet, A" uniqKey="Fouet A">A Fouet</name>
</author>
<author>
<name sortKey="Mock, M" uniqKey="Mock M">M Mock</name>
</author>
<author>
<name sortKey="Hegna, I" uniqKey="Hegna I">I Hegna</name>
</author>
<author>
<name sortKey="Kolst, Ab" uniqKey="Kolst A">AB Kolstø</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmann, C" uniqKey="Hoffmann C">C Hoffmann</name>
</author>
<author>
<name sortKey="Zimmermann, F" uniqKey="Zimmermann F">F Zimmermann</name>
</author>
<author>
<name sortKey="Biek, R" uniqKey="Biek R">R Biek</name>
</author>
<author>
<name sortKey="Kuehl, H" uniqKey="Kuehl H">H Kuehl</name>
</author>
<author>
<name sortKey="Nowak, K" uniqKey="Nowak K">K Nowak</name>
</author>
<author>
<name sortKey="Mundry, R" uniqKey="Mundry R">R Mundry</name>
</author>
<author>
<name sortKey="Agbor, A" uniqKey="Agbor A">A Agbor</name>
</author>
<author>
<name sortKey="Angedakin, S" uniqKey="Angedakin S">S Angedakin</name>
</author>
<author>
<name sortKey="Arandjelovic, M" uniqKey="Arandjelovic M">M Arandjelovic</name>
</author>
<author>
<name sortKey="Blankenburg, A" uniqKey="Blankenburg A">A Blankenburg</name>
</author>
<author>
<name sortKey="Brazolla, G" uniqKey="Brazolla G">G Brazolla</name>
</author>
<author>
<name sortKey="Corogenes, K" uniqKey="Corogenes K">K Corogenes</name>
</author>
<author>
<name sortKey="Couacy Hymann, E" uniqKey="Couacy Hymann E">E Couacy-Hymann</name>
</author>
<author>
<name sortKey="Deschner, T" uniqKey="Deschner T">T Deschner</name>
</author>
<author>
<name sortKey="Dieguez, P" uniqKey="Dieguez P">P Dieguez</name>
</author>
<author>
<name sortKey="Dierks, K" uniqKey="Dierks K">K Dierks</name>
</author>
<author>
<name sortKey="Dux, A" uniqKey="Dux A">A Düx</name>
</author>
<author>
<name sortKey="Dupke, S" uniqKey="Dupke S">S Dupke</name>
</author>
<author>
<name sortKey="Eshuis, H" uniqKey="Eshuis H">H Eshuis</name>
</author>
<author>
<name sortKey="Formenty, P" uniqKey="Formenty P">P Formenty</name>
</author>
<author>
<name sortKey="Yuh, Yg" uniqKey="Yuh Y">YG Yuh</name>
</author>
<author>
<name sortKey="Goedmakers, A" uniqKey="Goedmakers A">A Goedmakers</name>
</author>
<author>
<name sortKey="Gogarten, Jf" uniqKey="Gogarten J">JF Gogarten</name>
</author>
<author>
<name sortKey="Granjon, A C" uniqKey="Granjon A">A-C Granjon</name>
</author>
<author>
<name sortKey="Mcgraw, S" uniqKey="Mcgraw S">S McGraw</name>
</author>
<author>
<name sortKey="Grunow, R" uniqKey="Grunow R">R Grunow</name>
</author>
<author>
<name sortKey="Hart, J" uniqKey="Hart J">J Hart</name>
</author>
<author>
<name sortKey="Jones, S" uniqKey="Jones S">S Jones</name>
</author>
<author>
<name sortKey="Junker, J" uniqKey="Junker J">J Junker</name>
</author>
<author>
<name sortKey="Kiang, J" uniqKey="Kiang J">J Kiang</name>
</author>
<author>
<name sortKey="Langergraber, K" uniqKey="Langergraber K">K Langergraber</name>
</author>
<author>
<name sortKey="Lapuente, J" uniqKey="Lapuente J">J Lapuente</name>
</author>
<author>
<name sortKey="Lee, K" uniqKey="Lee K">K Lee</name>
</author>
<author>
<name sortKey="Leendertz, Sa" uniqKey="Leendertz S">SA Leendertz</name>
</author>
<author>
<name sortKey="Leguillon, F" uniqKey="Leguillon F">F Léguillon</name>
</author>
<author>
<name sortKey="Leinert, V" uniqKey="Leinert V">V Leinert</name>
</author>
<author>
<name sortKey="Lohrich, T" uniqKey="Lohrich T">T Löhrich</name>
</author>
<author>
<name sortKey="Marrocoli, S" uniqKey="Marrocoli S">S Marrocoli</name>
</author>
<author>
<name sortKey="M Tz Rensing, K" uniqKey="M Tz Rensing K">K Mätz-Rensing</name>
</author>
<author>
<name sortKey="Meier, A" uniqKey="Meier A">A Meier</name>
</author>
<author>
<name sortKey="Merkel, K" uniqKey="Merkel K">K Merkel</name>
</author>
<author>
<name sortKey="Metzger, S" uniqKey="Metzger S">S Metzger</name>
</author>
<author>
<name sortKey="Murai, M" uniqKey="Murai M">M Murai</name>
</author>
<author>
<name sortKey="Niedorf, S" uniqKey="Niedorf S">S Niedorf</name>
</author>
<author>
<name sortKey="De Nys, H" uniqKey="De Nys H">H De Nys</name>
</author>
<author>
<name sortKey="Sachse, A" uniqKey="Sachse A">A Sachse</name>
</author>
<author>
<name sortKey="Van Schijndel, J" uniqKey="Van Schijndel J">J Van Schijndel</name>
</author>
<author>
<name sortKey="Thiesen, U" uniqKey="Thiesen U">U Thiesen</name>
</author>
<author>
<name sortKey="Ton, E" uniqKey="Ton E">E Ton</name>
</author>
<author>
<name sortKey="Wu, D" uniqKey="Wu D">D Wu</name>
</author>
<author>
<name sortKey="Wieler, Lh" uniqKey="Wieler L">LH Wieler</name>
</author>
<author>
<name sortKey="Boesch, C" uniqKey="Boesch C">C Boesch</name>
</author>
<author>
<name sortKey="Klee, Sr" uniqKey="Klee S">SR Klee</name>
</author>
<author>
<name sortKey="Wittig, Rm" uniqKey="Wittig R">RM Wittig</name>
</author>
<author>
<name sortKey="Calvignac Spencer, S" uniqKey="Calvignac Spencer S">S Calvignac-Spencer</name>
</author>
<author>
<name sortKey="Leendertz, Fh" uniqKey="Leendertz F">FH Leendertz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmaster, Ar" uniqKey="Hoffmaster A">AR Hoffmaster</name>
</author>
<author>
<name sortKey="Ravel, J" uniqKey="Ravel J">J Ravel</name>
</author>
<author>
<name sortKey="Rasko, Da" uniqKey="Rasko D">DA Rasko</name>
</author>
<author>
<name sortKey="Chapman, Gd" uniqKey="Chapman G">GD Chapman</name>
</author>
<author>
<name sortKey="Chute, Md" uniqKey="Chute M">MD Chute</name>
</author>
<author>
<name sortKey="Marston, Ck" uniqKey="Marston C">CK Marston</name>
</author>
<author>
<name sortKey="De, Bk" uniqKey="De B">BK De</name>
</author>
<author>
<name sortKey="Sacchi, Ct" uniqKey="Sacchi C">CT Sacchi</name>
</author>
<author>
<name sortKey="Fitzgerald, C" uniqKey="Fitzgerald C">C Fitzgerald</name>
</author>
<author>
<name sortKey="Mayer, Lw" uniqKey="Mayer L">LW Mayer</name>
</author>
<author>
<name sortKey="Maiden, Mcj" uniqKey="Maiden M">MCJ Maiden</name>
</author>
<author>
<name sortKey="Priest, Fg" uniqKey="Priest F">FG Priest</name>
</author>
<author>
<name sortKey="Barker, M" uniqKey="Barker M">M Barker</name>
</author>
<author>
<name sortKey="Jiang, L" uniqKey="Jiang L">L Jiang</name>
</author>
<author>
<name sortKey="Cer, Rz" uniqKey="Cer R">RZ Cer</name>
</author>
<author>
<name sortKey="Rilstone, J" uniqKey="Rilstone J">J Rilstone</name>
</author>
<author>
<name sortKey="Peterson, Sn" uniqKey="Peterson S">SN Peterson</name>
</author>
<author>
<name sortKey="Weyant, Rs" uniqKey="Weyant R">RS Weyant</name>
</author>
<author>
<name sortKey="Galloway, Dr" uniqKey="Galloway D">DR Galloway</name>
</author>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD Read</name>
</author>
<author>
<name sortKey="Popovic, T" uniqKey="Popovic T">T Popovic</name>
</author>
<author>
<name sortKey="Fraser Liggett, Cm" uniqKey="Fraser Liggett C">CM Fraser-Liggett</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author>
<name sortKey="Myers, Jr" uniqKey="Myers J">JR Myers</name>
</author>
<author>
<name sortKey="Marth, Gt" uniqKey="Marth G">GT Marth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keim, Ps" uniqKey="Keim P">PS Keim</name>
</author>
<author>
<name sortKey="Wagner, Dm" uniqKey="Wagner D">DM Wagner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Klee, Sr" uniqKey="Klee S">SR Klee</name>
</author>
<author>
<name sortKey="Brzuszkiewicz, Eb" uniqKey="Brzuszkiewicz E">EB Brzuszkiewicz</name>
</author>
<author>
<name sortKey="Nattermann, H" uniqKey="Nattermann H">H Nattermann</name>
</author>
<author>
<name sortKey="Bruggemann, H" uniqKey="Bruggemann H">H Brüggemann</name>
</author>
<author>
<name sortKey="Dupke, S" uniqKey="Dupke S">S Dupke</name>
</author>
<author>
<name sortKey="Wollherr, A" uniqKey="Wollherr A">A Wollherr</name>
</author>
<author>
<name sortKey="Franz, T" uniqKey="Franz T">T Franz</name>
</author>
<author>
<name sortKey="Pauli, G" uniqKey="Pauli G">G Pauli</name>
</author>
<author>
<name sortKey="Appel, B" uniqKey="Appel B">B Appel</name>
</author>
<author>
<name sortKey="Liebl, W" uniqKey="Liebl W">W Liebl</name>
</author>
<author>
<name sortKey="Couacy Hymann, E" uniqKey="Couacy Hymann E">E Couacy-Hymann</name>
</author>
<author>
<name sortKey="Boesch, C" uniqKey="Boesch C">C Boesch</name>
</author>
<author>
<name sortKey="Meyer, F D" uniqKey="Meyer F">F-D Meyer</name>
</author>
<author>
<name sortKey="Leendertz, Fh" uniqKey="Leendertz F">FH Leendertz</name>
</author>
<author>
<name sortKey="Ellerbrok, H" uniqKey="Ellerbrok H">H Ellerbrok</name>
</author>
<author>
<name sortKey="Gottschalk, G" uniqKey="Gottschalk G">G Gottschalk</name>
</author>
<author>
<name sortKey="Grunow, R" uniqKey="Grunow R">R Grunow</name>
</author>
<author>
<name sortKey="Liesegang, H" uniqKey="Liesegang H">H Liesegang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Konstantinidis, Kt" uniqKey="Konstantinidis K">KT Konstantinidis</name>
</author>
<author>
<name sortKey="Tiedje, Jm" uniqKey="Tiedje J">JM Tiedje</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koslicki, D" uniqKey="Koslicki D">D Koslicki</name>
</author>
<author>
<name sortKey="Falush, D" uniqKey="Falush D">D Falush</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N Homer</name>
</author>
<author>
<name sortKey="Marth, G" uniqKey="Marth G">G Marth</name>
</author>
<author>
<name sortKey="Abecasis, G" uniqKey="Abecasis G">G Abecasis</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mason, C" uniqKey="Mason C">C Mason</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcintyre, Abr" uniqKey="Mcintyre A">ABR McIntyre</name>
</author>
<author>
<name sortKey="Ounit, R" uniqKey="Ounit R">R Ounit</name>
</author>
<author>
<name sortKey="Afshinnekoo, E" uniqKey="Afshinnekoo E">E Afshinnekoo</name>
</author>
<author>
<name sortKey="Prill, Rj" uniqKey="Prill R">RJ Prill</name>
</author>
<author>
<name sortKey="Henaff, E" uniqKey="Henaff E">E Hénaff</name>
</author>
<author>
<name sortKey="Alexander, N" uniqKey="Alexander N">N Alexander</name>
</author>
<author>
<name sortKey="Minot, Ss" uniqKey="Minot S">SS Minot</name>
</author>
<author>
<name sortKey="Danko, D" uniqKey="Danko D">D Danko</name>
</author>
<author>
<name sortKey="Foox, J" uniqKey="Foox J">J Foox</name>
</author>
<author>
<name sortKey="Ahsanuddin, S" uniqKey="Ahsanuddin S">S Ahsanuddin</name>
</author>
<author>
<name sortKey="Tighe, S" uniqKey="Tighe S">S Tighe</name>
</author>
<author>
<name sortKey="Hasan, Na" uniqKey="Hasan N">NA Hasan</name>
</author>
<author>
<name sortKey="Subramanian, P" uniqKey="Subramanian P">P Subramanian</name>
</author>
<author>
<name sortKey="Moffat, K" uniqKey="Moffat K">K Moffat</name>
</author>
<author>
<name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author>
<name sortKey="Lonardi, S" uniqKey="Lonardi S">S Lonardi</name>
</author>
<author>
<name sortKey="Greenfield, N" uniqKey="Greenfield N">N Greenfield</name>
</author>
<author>
<name sortKey="Colwell, Rr" uniqKey="Colwell R">RR Colwell</name>
</author>
<author>
<name sortKey="Rosen, Gl" uniqKey="Rosen G">GL Rosen</name>
</author>
<author>
<name sortKey="Mason, Ce" uniqKey="Mason C">CE Mason</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Minot, Ss" uniqKey="Minot S">SS Minot</name>
</author>
<author>
<name sortKey="Greenfield, N" uniqKey="Greenfield N">N Greenfield</name>
</author>
<author>
<name sortKey="Afshinnekoo, E" uniqKey="Afshinnekoo E">E Afshinnekoo</name>
</author>
<author>
<name sortKey="Mason, Ce" uniqKey="Mason C">CE Mason</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nasko, Dj" uniqKey="Nasko D">DJ Nasko</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
<author>
<name sortKey="Phillippy, Am" uniqKey="Phillippy A">AM Phillippy</name>
</author>
<author>
<name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ondov, Bd" uniqKey="Ondov B">BD Ondov</name>
</author>
<author>
<name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
<author>
<name sortKey="Melsted, P" uniqKey="Melsted P">P Melsted</name>
</author>
<author>
<name sortKey="Mallonee, Ab" uniqKey="Mallonee A">AB Mallonee</name>
</author>
<author>
<name sortKey="Bergman, Nh" uniqKey="Bergman N">NH Bergman</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
<author>
<name sortKey="Phillippy, Am" uniqKey="Phillippy A">AM Phillippy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pannucci, J" uniqKey="Pannucci J">J Pannucci</name>
</author>
<author>
<name sortKey="Okinaka, Rt" uniqKey="Okinaka R">RT Okinaka</name>
</author>
<author>
<name sortKey="Williams, E" uniqKey="Williams E">E Williams</name>
</author>
<author>
<name sortKey="Sabin, R" uniqKey="Sabin R">R Sabin</name>
</author>
<author>
<name sortKey="Ticknor, Lo" uniqKey="Ticknor L">LO Ticknor</name>
</author>
<author>
<name sortKey="Kuske, Cr" uniqKey="Kuske C">CR Kuske</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Petit Iii, Ra" uniqKey="Petit Iii R">RA Petit III</name>
</author>
<author>
<name sortKey="Ezewudo, M" uniqKey="Ezewudo M">M Ezewudo</name>
</author>
<author>
<name sortKey="Joseph, Sj" uniqKey="Joseph S">SJ Joseph</name>
</author>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD Read</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Quast, C" uniqKey="Quast C">C Quast</name>
</author>
<author>
<name sortKey="Pruesse, E" uniqKey="Pruesse E">E Pruesse</name>
</author>
<author>
<name sortKey="Yilmaz, P" uniqKey="Yilmaz P">P Yilmaz</name>
</author>
<author>
<name sortKey="Gerken, J" uniqKey="Gerken J">J Gerken</name>
</author>
<author>
<name sortKey="Schweer, T" uniqKey="Schweer T">T Schweer</name>
</author>
<author>
<name sortKey="Yarza, P" uniqKey="Yarza P">P Yarza</name>
</author>
<author>
<name sortKey="Peplies, J" uniqKey="Peplies J">J Peplies</name>
</author>
<author>
<name sortKey="Glockner, Fo" uniqKey="Glockner F">FO Glöckner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Quinlan, Ar" uniqKey="Quinlan A">AR Quinlan</name>
</author>
<author>
<name sortKey="Hall, Im" uniqKey="Hall I">IM Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rasko, Da" uniqKey="Rasko D">DA Rasko</name>
</author>
<author>
<name sortKey="Rosovitz, Mj" uniqKey="Rosovitz M">MJ Rosovitz</name>
</author>
<author>
<name sortKey=" Kstad, Oa" uniqKey=" Kstad O">OA Økstad</name>
</author>
<author>
<name sortKey="Fouts, De" uniqKey="Fouts D">DE Fouts</name>
</author>
<author>
<name sortKey="Jiang, L" uniqKey="Jiang L">L Jiang</name>
</author>
<author>
<name sortKey="Cer, Rz" uniqKey="Cer R">RZ Cer</name>
</author>
<author>
<name sortKey="Kolst, A B" uniqKey="Kolst A">A-B Kolstø</name>
</author>
<author>
<name sortKey="Gill, Sr" uniqKey="Gill S">SR Gill</name>
</author>
<author>
<name sortKey="Ravel, J" uniqKey="Ravel J">J Ravel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD Read</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author>
<name sortKey="Shumway, M" uniqKey="Shumway M">M Shumway</name>
</author>
<author>
<name sortKey="Umayam, L" uniqKey="Umayam L">L Umayam</name>
</author>
<author>
<name sortKey="Jiang, L" uniqKey="Jiang L">L Jiang</name>
</author>
<author>
<name sortKey="Holtzapple, E" uniqKey="Holtzapple E">E Holtzapple</name>
</author>
<author>
<name sortKey="Busch, Jd" uniqKey="Busch J">JD Busch</name>
</author>
<author>
<name sortKey="Smith, Kl" uniqKey="Smith K">KL Smith</name>
</author>
<author>
<name sortKey="Schupp, Jm" uniqKey="Schupp J">JM Schupp</name>
</author>
<author>
<name sortKey="Solomon, D" uniqKey="Solomon D">D Solomon</name>
</author>
<author>
<name sortKey="Keim, P" uniqKey="Keim P">P Keim</name>
</author>
<author>
<name sortKey="Fraser, Cm" uniqKey="Fraser C">CM Fraser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segata, N" uniqKey="Segata N">N Segata</name>
</author>
<author>
<name sortKey="Waldron, L" uniqKey="Waldron L">L Waldron</name>
</author>
<author>
<name sortKey="Ballarini, A" uniqKey="Ballarini A">A Ballarini</name>
</author>
<author>
<name sortKey="Narasimhan, V" uniqKey="Narasimhan V">V Narasimhan</name>
</author>
<author>
<name sortKey="Jousson, O" uniqKey="Jousson O">O Jousson</name>
</author>
<author>
<name sortKey="Huttenhower, C" uniqKey="Huttenhower C">C Huttenhower</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zwick, Me" uniqKey="Zwick M">ME Zwick</name>
</author>
<author>
<name sortKey="Joseph, Sj" uniqKey="Joseph S">SJ Joseph</name>
</author>
<author>
<name sortKey="Didelot, X" uniqKey="Didelot X">X Didelot</name>
</author>
<author>
<name sortKey="Chen, Pe" uniqKey="Chen P">PE Chen</name>
</author>
<author>
<name sortKey="Bishop Lilly, Ka" uniqKey="Bishop Lilly K">KA Bishop-Lilly</name>
</author>
<author>
<name sortKey="Stewart, Ac" uniqKey="Stewart A">AC Stewart</name>
</author>
<author>
<name sortKey="Willner, K" uniqKey="Willner K">K Willner</name>
</author>
<author>
<name sortKey="Nolan, N" uniqKey="Nolan N">N Nolan</name>
</author>
<author>
<name sortKey="Lentz, S" uniqKey="Lentz S">S Lentz</name>
</author>
<author>
<name sortKey="Thomason, Mk" uniqKey="Thomason M">MK Thomason</name>
</author>
<author>
<name sortKey="Sozhamannan, S" uniqKey="Sozhamannan S">S Sozhamannan</name>
</author>
<author>
<name sortKey="Mateczun, Aj" uniqKey="Mateczun A">AJ Mateczun</name>
</author>
<author>
<name sortKey="Du, L" uniqKey="Du L">L Du</name>
</author>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD Read</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PeerJ</journal-id>
<journal-id journal-id-type="iso-abbrev">PeerJ</journal-id>
<journal-id journal-id-type="publisher-id">peerj</journal-id>
<journal-id journal-id-type="pmc">peerj</journal-id>
<journal-title-group>
<journal-title>PeerJ</journal-title>
</journal-title-group>
<issn pub-type="epub">2167-8359</issn>
<publisher>
<publisher-name>PeerJ Inc.</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30155371</article-id>
<article-id pub-id-type="pmc">6109372</article-id>
<article-id pub-id-type="publisher-id">5515</article-id>
<article-id pub-id-type="doi">10.7717/peerj.5515</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioinformatics</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
</subj-group>
<subj-group subj-group-type="heading">
<subject>Infectious Diseases</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Fine-scale differentiation between
<italic>Bacillus anthracis</italic>
and
<italic>Bacillus cereus</italic>
group signatures in metagenome shotgun data</article-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name>
<surname>Petit III</surname>
<given-names>Robert A.</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-2" contrib-type="author">
<name>
<surname>Hogan</surname>
<given-names>James M.</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib id="author-3" contrib-type="author">
<name>
<surname>Ezewudo</surname>
<given-names>Matthew N.</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-4" contrib-type="author">
<name>
<surname>Joseph</surname>
<given-names>Sandeep J.</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-5" contrib-type="author" corresp="yes">
<name>
<surname>Read</surname>
<given-names>Timothy D.</given-names>
</name>
<email>tread@emory.edu</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<aff id="aff-1">
<label>1</label>
<institution>Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine</institution>
,
<city>Atlanta</city>
,
<state>GA</state>
,
<country>United States of America</country>
</aff>
<aff id="aff-2">
<label>2</label>
<institution>Queensland University of Technology</institution>
,
<city>Brisbane</city>
,
<country>Australia</country>
</aff>
</contrib-group>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Hoyles</surname>
<given-names>Lesley</given-names>
</name>
</contrib>
</contrib-group>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2018-08-22">
<day>22</day>
<month>8</month>
<year iso-8601-date="2018">2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>6</volume>
<elocation-id>e5515</elocation-id>
<history>
<date date-type="received" iso-8601-date="2018-06-07">
<day>7</day>
<month>6</month>
<year iso-8601-date="2018">2018</year>
</date>
<date date-type="accepted" iso-8601-date="2018-08-03">
<day>3</day>
<month>8</month>
<year iso-8601-date="2018">2018</year>
</date>
</history>
<permissions>
<copyright-statement>©2018 Petit III et al.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Petit III et al.</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://peerj.com/articles/5515"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>It is possible to detect bacterial species in shotgun metagenome datasets through the presence of only a few sequence reads. However, false positive results can arise, as was the case in the initial findings of a recent New York City subway metagenome project. False positives are especially likely when two closely related species are present in the same sample.
<italic>Bacillus anthracis</italic>
, the etiologic agent of anthrax, is a high-consequence pathogen that shares >99% average nucleotide identity with
<italic>Bacillus cereus</italic>
group (BCerG) genomes. Our goal was to create an analysis tool that used k-mers to detect
<italic>B. anthracis</italic>
, incorporating information about the coverage of BCerG in the metagenome sample.</p>
</sec>
<sec>
<title>Methods</title>
<p>Using public complete genome sequence datasets, we identified a set of 31-mer signatures that differentiated
<italic>B. anthracis</italic>
from other members of the
<italic>B. cereus</italic>
group (BCerG), and another set which differentiated BCerG genomes (including
<italic>B. anthracis</italic>
) from other
<italic>Bacillus</italic>
strains. We also created a set of 31-mers for detecting the lethal factor gene, the key genetic diagnostic of the presence of anthrax-causing bacteria. We created synthetic sequence datasets based on existing genomes to test the accuracy of a k-mer based detection model.</p>
</sec>
<sec>
<title>Results</title>
<p>We found 239,503
<italic>B. anthracis</italic>
-specific 31-mers (the
<italic>Ba31 set</italic>
), 10,183 BCerG 31-mers (the
<italic>BCerG31 set</italic>
), and 2,617 lethal factor k-mers (the
<italic>lef31</italic>
set). We showed that false positive
<italic>B. anthracis</italic>
k-mers—which arise from random sequencing errors—are observable at high genome coverages of
<italic>B. cereus</italic>
. We also showed that there is a “gray zone” below 0.184× coverage of the
<italic>B. anthracis</italic>
genome sequence, in which we cannot expect with high probability to identify lethal factor k-mers. We created a linear regression model to differentiate the presence of
<italic>B. anthracis</italic>
-like chromosomes from sequencing errors given the BCerG background coverage. We showed that while shotgun datasets from the New York City subway metagenome project had no matches to
<italic>lef31</italic>
k-mers and hence were negative for
<italic>B. anthracis</italic>
, some samples showed evidence of strains very closely related to the pathogen.</p>
</sec>
<sec>
<title>Discussion</title>
<p>This work shows how extensive libraries of complete genomes can be used to create organism-specific signatures to help interpret metagenomes. We contrast “specialist” approaches to metagenome analysis such as this work to “generalist” software that seeks to classify all organisms present in the sample and note the more general utility of a k-mer filter approach when taxonomic boundaries lack clarity or high levels of precision are required.</p>
</sec>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Anthrax</kwd>
<kwd>Metagenome</kwd>
<kwd>
<italic>Bacillus cereus</italic>
group</kwd>
<kwd>Typing</kwd>
<kwd>
<italic>Bacillus anthracis</italic>
</kwd>
<kwd>k-mer</kwd>
</kwd-group>
<funding-group>
<award-group id="fund-1">
<funding-source>Emory University School of Medicine and the Seven Bridges NCI Cancer Genomics Cloud Pilot</funding-source>
</award-group>
<award-group id="fund-2">
<funding-source>National Cancer Institute, National Institutes of Health</funding-source>
</award-group>
<award-group id="fund-3">
<funding-source>Department of Health and Human Services</funding-source>
<award-id>HHSN261201400008C</award-id>
</award-group>
<funding-statement>This study has been supported by development funds to Timothy D. Read from the Emory University School of Medicine and the Seven Bridges NCI Cancer Genomics Cloud Pilot, supported by funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>There is great interest in the use of shotgun metagenome data to detect pathogens in clinical and environmental samples. A large number of bioinformatic tools have been developed (
<xref rid="ref-24" ref-type="bibr">McIntyre et al., 2017</xref>
) that use different algorithmic approaches to rapidly parse and analyze sequence data files. Over the last 8–10 years, these data have been generated primarily by Illumina sequencing technology. Typically, sequences from metagenomic data files are matched against public reference databases, such as NCBI RefSeq. Consistency of matches across the tree of life is dependent therefore on the database entries being correctly labelled, having similar levels of representation across species, and having species defined in a consistent manner. However, we are beginning to understand how the skewed representation of taxa contained in the database sometimes affects sampling accuracy (
<xref rid="ref-26" ref-type="bibr">Nasko et al., 2018</xref>
). The classification of many bacterial species harks back to distinctions based on morphological, biochemical and virulence characteristics, made prior to the advent of DNA sequencing. Sometimes, unusually close species boundaries can confound metagenomic classifiers and result in false positive matches. In 2015,
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al. (2015a)</xref>
published initial findings from an extensive study of the New York subway metagenome, in which they claimed to have detected the bacteria responsible for anthrax (
<italic>Bacillus anthracis</italic>
) and plague (
<italic>Yersinia pestis</italic>
). While these misidentifications were swiftly corrected (
<xref rid="ref-23" ref-type="bibr">Mason, 2015</xref>
;
<xref rid="ref-3" ref-type="bibr">Afshinnekoo et al., 2015b</xref>
), indistinct or fuzzy boundaries between species may yield many errors of this nature.</p>
<p>
<italic>B. anthracis</italic>
, the pathogen that is the focus of this work, is a Gram-positive bacterium that forms tough endospores, allowing it to survive dormant in the environment for years. The 5.2 Mbp main chromosome shares an average nucleotide identity (ANI,
<xref rid="ref-18" ref-type="bibr">Konstantinidis & Tiedje, 2005</xref>
) in excess of 99% with other members of the collection of species known as the ‘
<italic>Bacillus cereus</italic>
group’ (
<italic>BCerG</italic>
) (
<xref rid="ref-12" ref-type="bibr">Helgason et al., 2000</xref>
). The most common species in this group are
<italic>B. cereus</italic>
,
<italic>B. thuringiensis</italic>
and
<italic>B. mycoides</italic>
(
<xref rid="ref-12" ref-type="bibr">Helgason et al., 2000</xref>
;
<xref rid="ref-35" ref-type="bibr">Zwick et al., 2012</xref>
). The recommended level of difference between bacterial species is an ANI of 95% (
<xref rid="ref-18" ref-type="bibr">Konstantinidis & Tiedje, 2005</xref>
). While BCerG strains are mostly opportunistic pathogens of invertebrates and are commonly found in soil,
<italic>B. anthracis</italic>
kills mammals (
<xref rid="ref-9" ref-type="bibr">Carlson et al., 2018</xref>
). Spores are generally found at high titers in soils where animals have recently died from anthrax. Phylogeographic analysis has shown that
<italic>B. anthracis</italic>
is probably native to Africa, with only recent transfer of a limited number of lineages to other continents (
<xref rid="ref-16" ref-type="bibr">Keim & Wagner, 2009</xref>
). For these reasons, it would be an unusual outcome to find spores in the New York subway (
<xref rid="ref-1" ref-type="bibr">Ackelsberg et al., 2015</xref>
).</p>
<p>What sets
<italic>B. anthracis</italic>
apart from other BCerG strains is the presence of two plasmids: pXO1 (181 kb), which carries the lethal toxin genes, and pXO2 (94 kb), which includes genes for a protective capsule. If it lacks either of these plasmids,
<italic>B. anthracis</italic>
is considered attenuated in virulence and unable to cause classic anthrax (
<xref rid="ref-10" ref-type="bibr">Dixon et al., 1999</xref>
). Plasmids from other BCerG genomes may be very similar to pXO1 and pXO2 but lack the important virulence genes (
<xref rid="ref-32" ref-type="bibr">Rasko et al., 2007</xref>
). Rarely, BCerG strains carry pXO1 and appear to cause anthrax-like disease (
<xref rid="ref-14" ref-type="bibr">Hoffmaster et al., 2004</xref>
;
<xref rid="ref-13" ref-type="bibr">Hoffmann et al., 2017</xref>
); pXO2-like plasmids are also quite common in BCerG and other
<italic>Bacillus</italic>
species (
<xref rid="ref-28" ref-type="bibr">Pannucci et al., 2002</xref>
;
<xref rid="ref-7" ref-type="bibr">Cachat et al., 2008</xref>
).</p>
<p>Shortly after the release of the NYC subway metagenome paper, we produced a blog post (
<xref rid="ref-29" ref-type="bibr">Petit III et al., 2015</xref>
) that critically re-analyzed these data in the light of what was known about
<italic>B. anthracis</italic>
genomics. This work, and other critiques, led to reassessment of the data and revisions to the original manuscript. In this paper, we incorporate some of the results introduced informally on our blog and extend them to create a k-mer based approach—using recent public
<italic>B. anthracis</italic>
and BCerG data—to analyze in greater detail how to search for traces of
<italic>B. anthracis</italic>
in shotgun metagenome data. While elements of this method are necessarily specific to
<italic>B. anthracis</italic>
and the context of the BCerG group, the general strategy has far broader utility and this work is a model for future “specialist” studies based on k-mer filtering.</p>
</sec>
<sec sec-type="intro">
<title>Methods</title>
<sec>
<title>Metagenome data and reference genome sequences</title>
<p>Shotgun metagenomic data from the “NYC” study
<ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra?term=SRP051511">SRP051511</ext-link>
(
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al., 2015a</xref>
) were downloaded from the Sequence Read Archive (SRA) with sra-tools (fastq-dump -I $SRA_ACCESSION, v2.8.2,
<ext-link ext-link-type="uri" xlink:href="https://github.com/ncbi/sra-tools">https://github.com/ncbi/sra-tools</ext-link>
). Reference genomes for different taxonomic groups were downloaded from the NCBI Nucleotide database in April 2018 with the following queries:</p>
<disp-quote>
<p>All BCerG genomes = ‘txid86661[Organism:exp] AND "complete genome"[Title] AND refseq[filter] AND 3000000:7000000[Sequence Length]’</p>
</disp-quote>
<disp-quote>
<p>All non-BCerG
<italic>Bacillus</italic>
genomes = ‘txid1386[Organism:exp] NOT txid86661[Organism:exp] "complete genome"[Title] AND 3000000:7000000[Sequence Length] AND refseq[filter]’</p>
</disp-quote>
<p>
<italic>B. anthracis</italic>
genomes were included in the BCerG genome query. The lethal factor gene was extracted from completed pXO1 plasmids downloaded with the following query:</p>
<disp-quote>
<p>pXO1 plasmid = ‘pXO1[Title] AND 140000:200000[Sequence Length]’</p>
</disp-quote>
<p>The results of these queries, as of April 2018, are available on our git repository.</p>
</sec>
<sec>
<title>Mapping metagenome data to
<italic>B. anthracis</italic>
plasmids and chromosomes</title>
<p>
<italic>B. anthracis</italic>
positive samples and control samples were mapped against reference pXO1 (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/CP009540">CP009540</ext-link>
) and pXO2 (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_007323">NC_007323</ext-link>
) plasmids and reference
<italic>B. anthracis</italic>
(
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/CP009541">CP009541</ext-link>
) and
<italic>B. cereus</italic>
(
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_003909">NC_003909</ext-link>
) completed genomes with BWA (bwa mem -t $NUM_CPU $REFERENCE $FASTQ_R1 $FASTQ_R2 >$SAM_FILE, v0.7.5a-r405,
<xref rid="ref-20" ref-type="bibr">Li & Durbin, 2009</xref>
). The aligned reads in SAM format were converted to sorted BAM and indexed with SAMtools (samtools view -@ 10 -bS $SAM_FILE | samtools sort -@ 10 - $SAMPLE, v1.1,
<xref rid="ref-21" ref-type="bibr">Li et al., 2009</xref>
). The per-base coverage was extracted with genomeCoverageBed from BEDTools (genomeCoverageBed -d -ibam $BAM_FILE | gzip --best - >$COVERAGE, v2.16.2,
<xref rid="ref-31" ref-type="bibr">Quinlan & Hall, 2010</xref>
). Coverage across the plasmids and chromosomes was plotted for multiple sliding windows with a custom R script. Mapped reads were extracted and saved in FASTQ format with bam2fastq (bam2fastq -o $FASTQ --no-unaligned $BAM_FILE, v1.1.0, https://gsl.hudsonalpha.org/information/software/bam2fastq) and in FASTA format with fastq_to_fasta from FASTX Toolkit (cat $FASTQ_FILE | fastq_to_fasta -Q33 -n | gzip --best - >$FASTA_OUTPUT, v0.0.13.2,
<xref rid="ref-11" ref-type="bibr">Gordon & Hannon, 2010</xref>
). Scripts, runtime parameters, and output are available at this site (
<xref rid="ref-29" ref-type="bibr">Petit III et al., 2015</xref>
):
<ext-link ext-link-type="uri" xlink:href="https://github.com/Read-Lab-Confederation/nyc-subway-anthrax-study">https://github.com/Read-Lab-Confederation/nyc-subway-anthrax-study</ext-link>
.</p>
</sec>
<sec>
<title>Custom 31-mer assay for
<italic>B. anthracis</italic>
and
<italic>Bacillus cereus</italic>
Group</title>
<p>In preliminary analysis we found four BCerG genomes misclassified in the NCBI Taxonomy database as not being part of the BCerG (see the Results section). To create a rational method to assign taxonomy to genomes for this study, we used mash (mash sketch -k 31 -s 100000 -p $NUM_CPU -o $OUTPUT_PREFIX *.fasta, v2.0,
<xref rid="ref-27" ref-type="bibr">Ondov et al., 2016</xref>
) to reclassify mislabeled
<italic>Bacillus</italic>
genomes as
<italic>B. anthracis</italic>
, non-
<italic>anthracis</italic>
BCerG, or non-BCerG. We identified
<italic>B. anthracis</italic>
strain 2002013094 (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP009902">NZ_CP009902</ext-link>
) as the most distant (Mash distance 0.000687)
<italic>B. anthracis</italic>
member from
<italic>B. anthracis</italic>
str. Ames (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_003997">NC_003997</ext-link>
). We also identified
<italic>B. cytotoxicus</italic>
NVH 391-98 (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_009674">NC_009674</ext-link>
) as the most distant (Mash distance 0.135333) BCerG member from
<italic>B. anthracis</italic>
str. Ames (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_003997">NC_003997</ext-link>
). We then determined the Mash distance of all
<italic>Bacillus</italic>
genomes from
<italic>B. anthracis</italic>
str. Ames (mash dist -p $NUM_CPU $MASH_SKETCH $FASTA_FILE). We used the Mash distance to reclassify each Bacillus genome as
<italic>B. anthracis</italic>
(Mash distance ≤ 0.000687), non-
<italic>anthracis</italic>
BCerG (Mash distance > 0.000687 and ≤ 0.135333), or non-BCerG (Mash distance > 0.135333). A phylogeny of all completed Bacillus genomes was created with mashtree (mashtree --numcpus 20 *.fasta >bacillus-mashtree.dnd, v0.32,
<ext-link ext-link-type="uri" xlink:href="https://github.com/lskatz/mashtree">https://github.com/lskatz/mashtree</ext-link>
).</p>
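<p>As an illustration only (this is not the code used in the study), the distance-based reclassification described above can be sketched in Python. The two thresholds are the Mash distances reported in the text; the function names are hypothetical, and the parser assumes the tab-delimited output of mash dist (reference ID, query ID, distance, p-value, matching hashes).</p>
<preformat>
```python
# Thresholds taken from the text: distances from B. anthracis str. Ames.
BA_MAX = 0.000687      # most distant B. anthracis genome
BCERG_MAX = 0.135333   # most distant BCerG genome (B. cytotoxicus NVH 391-98)


def parse_mash_dist_line(line):
    """Return (query_id, distance) from one tab-delimited `mash dist` line.

    Assumed column order: reference, query, distance, p-value, shared hashes.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[1], float(fields[2])


def classify_by_mash_distance(distance):
    """Assign a genome to one of the three groups used in this study."""
    if BA_MAX >= distance:
        return "B. anthracis"
    if BCERG_MAX >= distance:
        return "non-anthracis BCerG"
    return "non-BCerG"


if __name__ == "__main__":
    # Hypothetical example line, not real output from the study.
    example = "ames.fasta\tquery.fasta\t0.01\t0\t812/1000\n"
    query, dist = parse_mash_dist_line(example)
    print(query, classify_by_mash_distance(dist))
```
</preformat>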
<p>Sequence 31-mers were extracted and counted with Jellyfish (jellyfish count -C -m 31 -s 1M -o $JELLYFISH_DB $FASTA_FILE, v2.2.3,
<xref rid="ref-22" ref-type="bibr">Marçais & Kingsford, 2011</xref>
) and partitioned into two distinct sets characteristic of BCerG (BCerG31) and
<italic>B. anthracis</italic>
(Ba31) (
<xref ref-type="fig" rid="fig-1">Fig. 1</xref>
). The BCerG31 and Ba31 sets initially consisted of 31-mers conserved within
<italic>every</italic>
member of BCerG (including
<italic>B. anthracis</italic>
) and those restricted to only
<italic>B. anthracis</italic>
, respectively. Any Ba31-mers found in non-
<italic>anthracis</italic>
BCerG members or non-BCerG genomes were filtered out. Likewise, any BCerG31-mers found in non-BCerG Bacillus genomes were filtered out. 31-mers found in rRNA were filtered out with a Jellyfish database created from the SILVA rRNA database (
<xref rid="ref-30" ref-type="bibr">Quast et al., 2013</xref>
). We further filtered the Ba31 and BCerG31 sets using the non-redundant nucleotide sequence database (NT v5, downloaded April 2017). We used BLASTN (blastn -max_hsps 1 -max_target_seqs 1 -dust no -word_size 7 -outfmt 15 -query $FASTA_FILE -db $BLAST_DB -evalue 10000 -num_threads $NUM_CPU, v2.8.0,
<xref rid="ref-8" ref-type="bibr">Camacho et al., 2009</xref>
) to align Ba31 against non-
<italic>anthracis</italic>
BCerG sequences and BCerG31 against non-BCerG sequences. 31-mers with an exact match were filtered out.</p>
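<p>The core set logic of this selection (k-mers conserved in every ingroup genome, minus any k-mer seen in an outgroup genome) can be sketched as follows. This is a toy in-memory version under stated assumptions, not the Jellyfish and BLASTN pipeline used in the study: the function names are hypothetical, sequences are plain strings, and k-mers are stored in canonical form (the lexicographically smaller of a k-mer and its reverse complement), which is the behaviour requested by the -C option of jellyfish count.</p>
<preformat>
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(sequence, k=31):
    """Return the set of canonical k-mers in a sequence, skipping ambiguous bases."""
    sequence = sequence.upper()
    found = set()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer).issubset("ACGT"):
            found.add(canonical(kmer))
    return found


def signature_set(ingroup, outgroup, k=31):
    """k-mers present in every ingroup sequence and in no outgroup sequence.

    `ingroup` must contain at least one sequence.
    """
    conserved = set.intersection(*(kmers(seq, k) for seq in ingroup))
    for seq in outgroup:
        conserved -= kmers(seq, k)
    return conserved


if __name__ == "__main__":
    # Tiny made-up sequences with k=3, purely to show the mechanics.
    print(signature_set(["ACGTAC", "TACGTA"], ["GGGACG"], k=3))
```
</preformat>
<p>For Ba31 the ingroup would be the <italic>B. anthracis</italic> genomes and the outgroup all other genomes; for BCerG31 the ingroup would be all BCerG genomes and the outgroup the non-BCerG genomes. The rRNA and NT filtering steps are not represented in this sketch.</p>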
<fig id="fig-1" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/fig-1</object-id>
<label>Figure 1</label>
<caption>
<title>Flowchart of strategy for primer design.</title>
<p>We developed a strategy for selecting the Ba31 and BCerG31 (A) and lef31 (B) k-mer sets. In (A) the outgroup is determined by the k-mer set. For Ba31, the outgroup was comprised of all the non-
<italic>B. anthracis</italic>
genomes; for BCerG31, it consisted of all non-
<italic>B. cereus</italic>
group genomes.</p>
</caption>
<graphic xlink:href="peerj-06-5515-g001"></graphic>
</fig>
</sec>
<sec>
<title>Finding the limits for lethal factor-based detection of
<italic>B. anthracis</italic>
</title>
<p>We used
<italic>B. anthracis</italic>
whole genome shotgun sequencing projects to determine the limit of detection of lethal factor k-mers (lef31). We defined lef31 as the unique set of 31-mers identified in
<italic>lef</italic>
genes downloaded from the NCBI Nucleotide database (previously described) (
<xref ref-type="fig" rid="fig-1">Fig. 1</xref>
).
<italic>B. anthracis</italic>
projects were identified from the SRA with the following query:</p>
<disp-quote>
<p>
<italic>B. anthracis</italic>
projects = ‘genomic[Source] AND random[Selection] AND txid86661[Organism:exp] AND paired[Layout] AND wgs[Strategy] AND "Illumina HiSeq"’</p>
</disp-quote>
<p>In this work we have assumed a 95% ‘confidence limit’ for detection of the lethal factor k-mers, so that detection is held to fail if fewer than 95% of a set of random subsamples are found to contain
<italic>at least one</italic>
lethal factor k-mer. The threshold is then obtained through computational experiment. For each project, we started at 0.2× 
<italic>B. anthracis</italic>
genome coverage and extracted 100 random subsamples of sequences, using Jellyfish as before to determine if at least one lethal factor k-mer was present. We then continued this process, reducing the coverage until fewer than 95% of the subsamples contained at least one lethal factor k-mer. The previous coverage was then recorded as the limit of detection of the lethal toxin for a given sample.</p>
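<p>The threshold search described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it holds reads and lethal factor k-mers in memory rather than counting with Jellyfish, and the function names, the fixed random seed and the way read numbers are derived from coverage are all assumptions made for the example.</p>

```python
import random


def detection_rate(reads, lef_kmers, n_reads, k=31, n_subsamples=100, seed=0):
    """Fraction of random subsamples containing at least one lef k-mer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_subsamples):
        sample = rng.sample(reads, n_reads)
        # A subsample is a hit if any k-mer of any read is in the lef set.
        if any(read[i:i + k] in lef_kmers
               for read in sample
               for i in range(len(read) - k + 1)):
            hits += 1
    return hits / n_subsamples


def limit_of_detection(reads, lef_kmers, genome_len, read_len, coverages,
                       k=31, threshold=0.95, n_subsamples=100, seed=0):
    """Walk coverages downward; return the last one that met the threshold."""
    last_ok = None
    for cov in sorted(coverages, reverse=True):
        n_reads = min(len(reads), max(1, round(cov * genome_len / read_len)))
        rate = detection_rate(reads, lef_kmers, n_reads, k, n_subsamples, seed)
        if threshold > rate:
            break  # detection failed here; keep the previous coverage
        last_ok = cov
    return last_ok
```

<p>A return value of None in this sketch means that detection already failed at the highest coverage tested.</p>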
</sec>
<sec>
<title>Assessing Quality of
<italic>B. anthracis</italic>
and
<italic>B. cereus</italic>
Group specific 31-mers</title>
<p>We used ART (art_illumina -l 100 -f $COVERAGE -na -ss HS20 -rs $RANDOM_SEED -i $FASTA_FILE -o $OUTPUT_PREFIX, vMountRainier-2016-06-05,
<xref rid="ref-15" ref-type="bibr">Huang et al., 2012</xref>
) to simulate 100 bp reads with the built-in Illumina HiSeq 2000 error model for each non-
<italic>anthracis Bacillus</italic>
genome. The Illumina HiSeq 2000 error model was selected to match the predominant sequencing technology of the NYC dataset. We simulated coverages ranging from 0.01× to 15× to determine if false positive Ba31 matches were uniform across non-
<italic>anthracis</italic>
BCerG members. We counted 31-mers for each simulated read set with Jellyfish as previously described. We then used the k-mer counts to determine the false positive Ba31 counts in non-<italic>anthracis</italic> genomes. We found the false positive Ba31 counts to be higher in non-
<italic>B. anthracis</italic>
genomes that were most closely related to
<italic>B. anthracis</italic>
(see Results). A subset of non-
<italic>B. anthracis</italic>
BCerG genomes with a Mash distance less than 0.01 from
<italic>B. anthracis</italic>
, previously described, were selected as our model set. We further simulated coverages from 15× to 100× to match levels of coverage observed in the NYC dataset. We then applied linear regression, implemented in the R base stats package (R v3.4.3), on this subset to develop a predictive model with the observed Ba31 count as our dependent variable and the observed BCerG k-mer coverage as our independent variable.</p>
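<p>The regression step can be illustrated with a short sketch. The study fitted the model in R; the Python function below is only an equivalent least-squares calculation under the assumption, taken from the Fig. 4 caption, that the intercept is fixed at 0. The function name and the toy data in the usage note are hypothetical.</p>

```python
def fit_through_origin(x, y):
    """Least-squares slope for the model y = slope * x (intercept fixed at 0)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx
```

<p>For example, with BCerG31 coverages of 1, 2 and 3 and Ba31 counts of 2, 4 and 6, the fitted slope is 2 counts per unit of coverage.</p>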
</sec>
<sec>
<title>Prediction of low coverage
<italic>B. anthracis</italic>
chromosome in simulated metagenomic sequencing datasets</title>
<p>We used ART to simulate metagenomic mixtures of
<italic>B. anthracis</italic>
str. Ames (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_003997">NC_003997</ext-link>
) and
<italic>B. cereus</italic>
strain JEM-2 (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP018935">NZ_CP018935</ext-link>
).
<italic>B. cereus</italic>
strain JEM-2 was selected because it was the closest non-
<italic>anthracis</italic>
BCerG member to
<italic>B. anthracis</italic>
str. Ames (Mash distance 0.00873073). We used coverages from 0× to 100× for
<italic>B. cereus</italic>
and coverages from 0× to 0.2× for
<italic>B. anthracis</italic>
. A Python script (subsample-ba-lod.py NC_003997.fasta NZ_CP018935.fasta coverages-ba.txt coverages-bcg.txt temp_folder/ fasta/ ba-specific-kmers.fasta bcg-specific-kmers.fasta) was created to simulate mixtures for each pairwise combination of
<italic>B. cereus</italic>
and
<italic>B. anthracis</italic>
coverages. For each mixture, we determined the Ba31 and BCerG31 counts with Jellyfish as previously described. This process was repeated 20 times per pairwise combination of coverages. The model was applied to determine what level of
<italic>B. anthracis</italic>
coverage was required to differentiate observed Ba31-mers from sequencing errors.</p>
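<p>The design of the mixtures (every pairwise combination of coverages, each repeated 20 times) can be sketched as below. This does not reproduce subsample-ba-lod.py; the function name and the coverage lists in the usage note are hypothetical and serve only to show how the combinations are enumerated.</p>

```python
from itertools import product


def mixture_plan(ba_coverages, bcg_coverages, replicates=20):
    """List (B. cereus coverage, B. anthracis coverage, replicate) triples."""
    return [
        (bcg, ba, rep)
        for bcg, ba in product(bcg_coverages, ba_coverages)
        for rep in range(1, replicates + 1)
    ]
```

<p>With three hypothetical coverages per organism, the plan contains 3 × 3 × 20 = 180 simulated mixtures.</p>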
<p>We determined Ba31, BCerG31 and lef31 counts for each sample in the NYC study. The model was applied to these counts to determine if observed
<italic>B. anthracis</italic>
k-mers exceeded the level expected due to sequencing errors.</p>
<p>We processed each of the subsampled mixtures and samples from the NYC study with KrakenHLL (krakenhll --report-file $REPORT_FILE --db $DATABASE >$SEQUENCES, v0.4.7,
<xref rid="ref-5" ref-type="bibr">Breitwieser & Salzberg, 2018</xref>
). We used dustmasker (dustmasker -outfmt fasta, v2.8.0,
<xref rid="ref-8" ref-type="bibr">Camacho et al., 2009</xref>
) to create a DUST-masked version of the standard Kraken database (kraken-build --standard --db $DATABASE, database built in April 2017) for this analysis. From the final Kraken report, the number of reads and unique k-mers identified for
<italic>B. anthracis</italic>
were extracted. We compared these results to our method.</p>
<p>Output, figures, runtime parameters and scripts from this study are available in a Git repository hosted at:
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.1323741">https://doi.org/10.5281/zenodo.1323741</ext-link>
.</p>
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>NY subway metagenome sequences map to core regions of
<italic>B. anthracis</italic>
and
<italic>B. cereus</italic>
chromosome and plasmids but not to lethal factor gene</title>
<p>In the original analysis of the subway metagenome (
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al., 2015a</xref>
), two samples (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00134">P00134</ext-link>
(
<ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra?term=SRR1748707">SRR1748707</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra?term=SRR1748708">SRR1748708</ext-link>
), and
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00497">P00497</ext-link>
(
<ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra?term=SRR1749083">SRR1749083</ext-link>
)) were reported to contain reads that mapped to
<italic>Bacillus anthracis</italic>
based on results obtained using the Metaphlan software (
<xref rid="ref-34" ref-type="bibr">Segata et al., 2012</xref>
). We found that 792,282 reads from
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00134">P00134</ext-link>
and 270,964 reads from
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00497">P00497</ext-link>
mapped to the
<italic>B. anthracis</italic>
strain Sterne chromosome. The reads aligned along the entire length of the chromosome, forming a characteristic peak at the replication origin, a pattern often seen when other bacterial chromosomes have been recovered from metagenome samples (
<xref rid="ref-6" ref-type="bibr">Brown et al., 2016</xref>
). However, a similar number of reads from
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00134">P00134</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00497">P00497</ext-link>
(765,466 reads and 265,776 reads, respectively) mapped to the
<italic>B. cereus</italic>
10987 chromosome. We also found that
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00134">P00134</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00497">P00497</ext-link>
reads mapped to both the pXO1 and pXO2 plasmids in conserved “backbone” regions (
<xref rid="ref-32" ref-type="bibr">Rasko et al., 2007</xref>
) but that no read mapped to the mobile element containing the
<italic>lef</italic>
lethal factor gene. These results showed that the close taxonomic relationship of
<italic>B. anthracis</italic>
and BCerG made identification of the biothreat agent by mapping reads alone unreliable. In addition, the pXO1 and pXO2 plasmids were not reliable as positive markers for
<italic>B. anthracis</italic>
at low genome coverages (when the
<italic>lef</italic>
gene may not be sampled, see next section) because backbone sequences cross-matched against plasmids found in BCerG strains.</p>
</sec>
<sec>
<title>
<italic>B. anthracis</italic>
genome coverage below 0.184× is a “gray area” for detection, where lethal toxin genes may not be sampled</title>
<p>The best test for presence of virulent
<italic>B. anthracis</italic>
(or virulent
<italic>B. cereus</italic>
strains containing pXO1) is detection of the lethal factor gene (2,346 bp) (
<xref rid="ref-4" ref-type="bibr">Bragg & Robertson, 1989</xref>
). However, at low sequence coverage of the pathogen, it is not certain that reads from this gene will be present (given the 3:1 copy number ratio of pXO1 to
<italic>B. anthracis</italic>
chromosome (
<xref rid="ref-33" ref-type="bibr">Read et al., 2002</xref>
) the ratio of chromosome to
<italic>lef</italic>
is ∼620:1). We identified 2,617 31-mers present in 36
<italic>lef</italic>
gene sequences and called this set “lef31”. To estimate the coverage sufficient that we would expect (with probability above some threshold value, here 0.95) to observe lethal factor sequences, we randomly subsampled reads from 164
<italic>B. anthracis</italic>
genome projects and tested for the presence of
<italic>at least one</italic>
lef31 match (
<xref ref-type="fig" rid="fig-2">Fig. 2</xref>
). With Ba31 and BCerG31 k-mer coverages below 0.103× and 0.112×, respectively, this analysis showed we would have less than a 95% chance of sampling a single lef31 k-mer
<italic>even if the lef gene were present</italic>
. These k-mer coverages correspond to approximately 0.184×
<italic>B. anthracis</italic>
genome coverage, or 9,360 reads of 100 bp (
<xref ref-type="supplementary-material" rid="supp-1">Fig. S1</xref>
).</p>
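<p>The correspondence between the k-mer coverage limits and the 0.184× genome coverage can be checked with a short calculation. The sketch assumes that k-mer coverage is proportional to genome coverage with no intercept, using the coefficients reported in this article for Fig. S1 (0.56 for Ba31 and 0.61 for BCerG31); the function name is hypothetical.</p>

```python
def genome_coverage(kmer_coverage, coefficient):
    """Convert k-mer coverage to genome coverage, assuming a linear
    relationship through the origin (an assumption of this sketch)."""
    return kmer_coverage / coefficient


ba31_limit = genome_coverage(0.103, 0.56)     # Ba31 k-mer coverage limit
bcerg31_limit = genome_coverage(0.112, 0.61)  # BCerG31 k-mer coverage limit
```

<p>Both limits round to 0.184× genome coverage under these assumptions.</p>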
<fig id="fig-2" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/fig-2</object-id>
<label>Figure 2</label>
<caption>
<title>Limit of detection for lethal factor gene k-mers (lef31).</title>
<p>A total of 164
<italic>B. anthracis</italic>
sequencing projects were subsampled to different levels of genome coverage, with 100 random subsamples obtained for each coverage level. Our ability to detect the lethal factor gene is assessed by considering the number of these subsamples for which we find at least one lef31 k-mer hit. Two thresholds (95% and 100%) were employed and are shown as colored series below. The figure thus shows the percentage of the
<italic>B. anthracis</italic>
sequencing projects for which 95% (or 100%) of the random subsamples contain at least one lef31 k-mer. (A) shows results with respect to Ba31 k-mer coverage while (B) shows the corresponding results for BCerG coverage. The vertical dashed lines show the coverage limits for detection at the respective threshold levels.</p>
</caption>
<graphic xlink:href="peerj-06-5515-g002"></graphic>
</fig>
</sec>
<sec>
<title>Conserved and specific 31-mer sets for
<italic>B. anthracis</italic>
and BCerG chromosomes</title>
<p>The results of the previous section showed that at low
<italic>B. anthracis</italic>
genome coverage, detection of the lethal factor is not guaranteed. In metagenomic samples, in which sequencing coverage is expected to be low for rare organisms, the most reliable way to detect
<italic>B. anthracis</italic>
was to use chromosomal genetic signatures that distinguished the species from close relatives. We identified 239,503 31-mers conserved in 48
<italic>B. anthracis</italic>
reference genomes that were not also detected in the remainder of the
<italic>Bacillus</italic>
genus (331 genomes), rRNA sequences, or the BLAST non-redundant nucleotide database. We called this set “Ba31”.</p>
<p>We created a second set of 31-mers specific to and conserved in all BCerG genomes (including
<italic>B. anthracis</italic>
). Surprisingly, our initial analysis produced zero 31-mers specific to all 139 BCerG strains and not other
<italic>Bacillus</italic>
. Inspection of the whole genome phylogeny (
<xref ref-type="fig" rid="fig-3">Fig. 3</xref>
) showed that four genomes (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP007512">NZ_CP007512</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP017016">NZ_CP017016</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP020437">NZ_CP020437</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NZ_CP025122">NZ_CP025122</ext-link>
) that fell within the BCerG clade based on phylogeny had not been classified as BCerG in the NCBI Taxonomy hierarchy. After reclassifying these strains as BCerG, we identified 10,183 BCerG specific 31-mers, which we called “BCerG31”.</p>
<fig id="fig-3" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/fig-3</object-id>
<label>Figure 3</label>
<caption>
<title>Unrooted phylogeny of BCerG genome assemblies used in the study after reclassifying BCerG strains.</title>
<p>An unrooted phylogenetic representation of 140 BCerG genomes using Mashtree (v0.32,
<ext-link ext-link-type="uri" xlink:href="https://github.com/lskatz/mashtree">https://github.com/lskatz/mashtree</ext-link>
). Genomes reclassified as BCerG members with mash (v2.0,
<xref rid="ref-27" ref-type="bibr">Ondov et al., 2016</xref>
) are indicated with stars. The clade colored blue are
<italic>B. cereus</italic>
genomes closely related to
<italic>B. anthracis</italic>
that were used to model false positive results (
<xref ref-type="fig" rid="fig-4">Fig. 4</xref>
).</p>
</caption>
<graphic xlink:href="peerj-06-5515-g003"></graphic>
</fig>
</sec>
<sec>
<title>High background levels of
<italic>B. cereus</italic>
strains produce false positive
<italic>B. anthracis</italic>
specific k-mers due to random sequence errors</title>
<p>We simulated synthetic data of
<italic>Bacillus</italic>
reference genomes at different genome coverages using ART software with an error model based on Illumina short read data (
<xref rid="ref-15" ref-type="bibr">Huang et al., 2012</xref>
) (
<xref ref-type="fig" rid="fig-4">Fig. 4</xref>
). We defined ‘k-mer coverage’ as the sum of counts for k-mers detected divided by the number of k-mers in the k-mer set. Ba31 and BCerG k-mer coverage had a linear relationship with genome coverage (
<xref ref-type="supplementary-material" rid="supp-1">Fig. S1</xref>
). The coefficient was less than 1 (0.56 and 0.61 for Ba31 and BCerG31 respectively), because some portions of the chromosomes were not well sampled by the k-mers. We found a strong linear relationship between Ba31 coverage and BCerG31 coverage within
<italic>B. anthracis</italic>
genome subsamples (Pearson’s Correlation
<italic>r</italic>
 = 0.99,
<italic>p</italic>
 < 0.001,
<xref ref-type="supplementary-material" rid="supp-2">Fig. S2</xref>
). As expected, the same relationship did not appear when we subsampled non-
<italic>B. anthracis</italic>
BCerG members (Pearson’s Correlation
<italic>r</italic>
 = 0.74,
<italic>p</italic>
 < 0.001,
<xref ref-type="supplementary-material" rid="supp-3">Fig. S3</xref>
). However, we did see a small number of Ba31 k-mers detected, which we suspected were due to random errors introduced by Illumina sequencing (
<xref ref-type="fig" rid="fig-4">Fig. 4</xref>
). The counts of false positive Ba31 k-mers scaled with the approximate genetic distance to
<italic>B. anthracis</italic>
(as measured by mash
<xref rid="ref-27" ref-type="bibr">Ondov et al., 2016</xref>
) (
<xref ref-type="supplementary-material" rid="supp-4">Fig. S4</xref>
). We simulated synthetic data for a group of BCerG strains most closely related to
<italic>B. anthracis</italic>
(
<xref ref-type="fig" rid="fig-3">Fig. 3</xref>
). We developed a linear regression model to relate BCerG k-mer coverage and sequencing errors based on this group (
<xref ref-type="fig" rid="fig-4">Fig. 4</xref>
). For every unit of BCerG31 k-mer coverage, we predicted 172 Ba31 false positive k-mer counts.</p>
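<p>The two quantities used in this section can be written out as a brief sketch. The k-mer coverage function follows the definition given above, and the expected false positive count uses the rounded slope of 172 with a zero intercept (per the Fig. 4 caption). The function names and the dictionary-based input are assumptions of this example, not the study's code.</p>

```python
def kmer_coverage(counts, kmer_set_size):
    """Sum of observed counts for k-mers in the set, divided by set size.

    counts maps each detected k-mer of the set to its observed count.
    """
    return sum(counts.values()) / kmer_set_size


def expected_false_positives(bcerg31_coverage, slope=172.0):
    """Expected Ba31 false positive count for a given BCerG31 coverage."""
    return slope * bcerg31_coverage
```

<p>For instance, a BCerG31 k-mer coverage of 2 would lead us to expect roughly 344 false positive Ba31 counts from sequencing error alone.</p>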
<fig id="fig-4" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/fig-4</object-id>
<label>Figure 4</label>
<caption>
<title>Linear regression model fit of BCerG coverage and false positive Ba31 counts.</title>
<p>We created random synthetic FASTQ files based on BCerG chromosomes from the clade closest to
<italic>B. anthracis</italic>
(blue in
<xref ref-type="fig" rid="fig-3">Fig. 3</xref>
) at different genome coverages and counted the false positive Ba31 k-mers. Shown is the fit of a linear regression model with an intercept of 0, with BCerG31 coverage as the independent variable and the Ba31 false positive count as the dependent variable. The solid line shows the predicted values from the model, and the dashed line reflects the upper 99% prediction interval for the parameters, which we use in the analyses above.</p>
</caption>
<graphic xlink:href="peerj-06-5515-g004"></graphic>
</fig>
</sec>
<sec>
<title>A “specialist” model to interpret patterns of
<italic>B. anthracis</italic>
genetic signatures in metagenome samples</title>
<p>In real metagenome samples
<italic>B. anthracis</italic>
, if present, may only account for a low proportion of the total reads and may also be mixed with higher proportions of closely related BCerG strains. We sought to use the k-mer sets developed in the previous sections and knowledge of the
<italic>lef</italic>
gray zone coverage and BCerG false positive rate to interpret both synthetic and real metagenome datasets. The logic for assignment is shown in
<xref rid="table-1" ref-type="table">Table 1</xref>
and
<xref ref-type="supplementary-material" rid="supp-5">Fig. S5</xref>
.</p>
<table-wrap id="table-1" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/table-1</object-id>
<label>Table 1</label>
<caption>
<title>Potential outcomes of
<italic>B. anthracis</italic>
detection, given matches to the Ba31 set in a shotgun metagenome dataset.</title>
<p>This table gives the interpretation of four cases in which Ba31 k-mer matches are found in the dataset. The three columns after the case number are: whether a lef31 match is found; whether Ba31 coverage is in the gray zone; and whether Ba31 matches exceed the 99% prediction interval (P.I.) of the error model based on BCerG coverage.</p>
</caption>
<alternatives>
<graphic xlink:href="peerj-06-5515-g005"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1">
<bold>Case</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Lef31</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Gray Zone</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Exceeds 99% P.I.</bold>
<xref ref-type="fn" rid="table-1fn1">
<sup>a</sup>
</xref>
</th>
<th rowspan="1" colspan="1">
<bold>Interpretation</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">yes</td>
<td rowspan="1" colspan="1">yes or no</td>
<td rowspan="1" colspan="1">yes or no</td>
<td rowspan="1" colspan="1">Evidence of lethal factor gene, could be
<italic>B. anthracis</italic>
or a
<italic>B. cereus</italic>
strain carrying the pXO1 plasmid.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">2</td>
<td rowspan="1" colspan="1">no</td>
<td rowspan="1" colspan="1">yes</td>
<td rowspan="1" colspan="1">yes</td>
<td rowspan="1" colspan="1">Possible
<italic>B. anthracis</italic>
or closely related strain based on high Ba31 counts but genome coverage too low to guarantee seeing the
<italic>lef</italic>
gene. Requires more sequence coverage and/or validation by PCR or other methods.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">3</td>
<td rowspan="1" colspan="1">no</td>
<td rowspan="1" colspan="1">no</td>
<td rowspan="1" colspan="1">yes</td>
<td rowspan="1" colspan="1">Ba31 matches exceed what is expected by the BCerG error model, but are at a level of genome coverage at which lethal factor should have been detected. Most likely explanation is
<italic>B. anthracis</italic>
strain cured of pXO1 or unsequenced lineage closely related to
<italic>B. anthracis</italic>
.</td>
</tr>
<tr>
<td rowspan="1" colspan="1">4</td>
<td rowspan="1" colspan="1">no</td>
<td rowspan="1" colspan="1">yes or no</td>
<td rowspan="1" colspan="1">no</td>
<td rowspan="1" colspan="1">The most likely scenario is that the BCerG background produced Ba31 k-mers through random errors, but the presence of low-coverage
<italic>B. anthracis</italic>
cannot be ruled out.</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="table-1fn">
<p>
<bold>Notes.</bold>
</p>
</fn>
<fn id="table-1fn1">
<label>a</label>
<p>Prediction interval.</p>
</fn>
</table-wrap-foot>
</table-wrap>
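<p>The assignment logic of Table 1 can be expressed as a small decision function. This is a sketch that assumes Ba31 matches have already been found in the dataset; the function name and the boolean arguments are hypothetical labels for the three columns of the table.</p>

```python
def classify_sample(lef31_match, in_gray_zone, exceeds_pi):
    """Return the Table 1 case number (1 to 4) for a sample with Ba31 matches.

    lef31_match:  at least one lef31 k-mer was observed
    in_gray_zone: Ba31 coverage is within the lef31 gray zone
    exceeds_pi:   Ba31 matches exceed the 99% prediction interval
    """
    if lef31_match:
        return 1  # evidence of the lethal factor gene
    if not exceeds_pi:
        return 4  # consistent with BCerG background errors
    # No lef31 match, but more Ba31 matches than the error model predicts.
    return 2 if in_gray_zone else 3
```

<p>The order of the checks mirrors the table: a lef31 match decides case 1 regardless of the other columns, and the gray zone is consulted only when the prediction interval is exceeded.</p>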
<p>For our synthetic dataset we mixed low coverage
<italic>B. anthracis</italic>
with higher coverages of BCerG sequence data (see Methods). We calculated the BCerG31 and Ba31 coverages for each mixture. Based on the BCerG sequence error model, we calculated the upper 99% prediction interval for the count of Ba31 signatures expected from sequencing error, under the assumption that there was no
<italic>B. anthracis</italic>
present and that all BCerG were drawn from the most closely related clade (
<xref ref-type="fig" rid="fig-3">Fig. 3</xref>
). We also reported whether the Ba31 coverage lay in or above the gray zone (
<xref rid="table-2" ref-type="table">Table 2</xref>
,
<xref ref-type="supplementary-material" rid="supp-8">File S2</xref>
,
<xref ref-type="supplementary-material" rid="supp-6">Fig. S6</xref>
). When
<italic>B. anthracis</italic>
was below 0.003× genome coverage (approximately 16,000 bp), we could not distinguish its presence from errors produced in the absence of
<italic>B. cereus</italic>
. As expected, we found that the level of BCerG coverage determined the lower limit to differentiate genuine Ba31 hits from sequencing errors. At 75× BCerG coverage the required
<italic>B. anthracis</italic>
coverage to differentiate Ba31 matches from sequencing errors doubled to 0.006×. The threshold for accurate detection was further raised to 0.01× 
<italic>B. anthracis</italic>
genome coverage at 100× BCerG coverage.</p>
<table-wrap id="table-2" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/table-2</object-id>
<label>Table 2</label>
<caption>
<title>Artificial mixtures of low coverage
<italic>B. anthracis</italic>
and high coverage
<italic>B. cereus</italic>
.</title>
<p>This table shows some key results from more than 300 artificial mixtures of
<italic>B. anthracis</italic>
and
<italic>B. cereus</italic>
sequences created to test our specialized model (
<xref ref-type="supplementary-material" rid="supp-7">File S1</xref>
). The table includes three
<italic>B. anthracis</italic>
coverages for each
<italic>B. cereus</italic>
coverage. The
<italic>B. anthracis</italic>
simulated coverages represent the minimum
<italic>B. anthracis</italic>
coverage, the coverage at which
<italic>B. anthracis</italic>
was detectable, and the maximum
<italic>B. anthracis</italic>
coverage. The first two columns are the coverage in the artificial mixtures of
<italic>B. cereus</italic>
and
<italic>B. anthracis</italic>
genomes, respectively. The third column is the observed BCerG31 k-mer coverage. Columns 4-6 are the observed number of Ba31 k-mers, the expected number of Ba31 k-mers based on the BCerG31 coverage (see
<xref ref-type="fig" rid="fig-4">Fig. 4</xref>
) and the 99% prediction interval of the model, which we take as an indicative worst-case threshold. The seventh column summarizes whether the observed Ba31 count is greater than the 99% P.I. The eighth column is whether the Ba31 coverage is in the “gray zone” (<0.18× coverage). “No” means the Ba31 exceeds the threshold (note it is possible for the Ba31 coverage to be at gray zone level but still have a positive match to a lef31 k-mer). The final column shows whether KrakenHLL (
<xref rid="ref-5" ref-type="bibr">Breitwieser & Salzberg, 2018</xref>
) run on the sample predicted the presence of
<italic>B. anthracis</italic>
. This table shows that false positive k-mers resulting from high BCerG coverage limit the detection of
<italic>B. anthracis</italic>
k-mers (Ba31) in mixed cultures. Below 0.006× (at 75×
<italic>B. cereus</italic>
) and 0.01× (at 100×
<italic>B. cereus</italic>
)
<italic>B. anthracis</italic>
genome coverages, true positive Ba31 matches cannot be differentiated from false positive matches. KrakenHLL predicted
<italic>B. anthracis</italic>
to be present even when it was not, because of the background BCerG genome coverage.</p>
</caption>
<alternatives>
<graphic xlink:href="peerj-06-5515-g006"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
</colgroup>
<thead>
<tr>
<th align="center" colspan="2" rowspan="1">
<bold>Artificial genome coverage</bold>
</th>
<th rowspan="1" colspan="1"></th>
<th align="center" colspan="3" rowspan="1">
<bold>Ba31 count</bold>
</th>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
</tr>
<tr>
<th rowspan="1" colspan="1">
<bold>
<italic>B. cereus</italic>
</bold>
</th>
<th rowspan="1" colspan="1">
<bold>
<italic>B. anthracis</italic>
</bold>
</th>
<th rowspan="1" colspan="1">
<bold>BCerG31 coverage</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Observed</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Model fit</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Model upper</bold>
<bold>99% P.I.</bold>
<xref ref-type="fn" rid="table-2fn1">
<sup>a</sup>
</xref>
</th>
<th rowspan="1" colspan="1">
<bold>Exceeds 99% P.I.</bold>
<xref ref-type="fn" rid="table-2fn1">
<sup>a</sup>
</xref>
</th>
<th rowspan="1" colspan="1">
<bold>lef31 gray zone</bold>
</th>
<th rowspan="1" colspan="1">
<bold>KrakenHLL</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">0.001× </td>
<td rowspan="1" colspan="1">0.00002</td>
<td rowspan="1" colspan="1">10</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">331</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">0.003× </td>
<td rowspan="1" colspan="1">0.00245</td>
<td rowspan="1" colspan="1">346</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">332</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">0.123</td>
<td rowspan="1" colspan="1">25,396</td>
<td rowspan="1" colspan="1">21</td>
<td rowspan="1" colspan="1">352</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">0.593</td>
<td rowspan="1" colspan="1">99</td>
<td rowspan="1" colspan="1">102</td>
<td rowspan="1" colspan="1">433</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1× </td>
<td rowspan="1" colspan="1">0.003× </td>
<td rowspan="1" colspan="1">0.610</td>
<td rowspan="1" colspan="1">444</td>
<td rowspan="1" colspan="1">104</td>
<td rowspan="1" colspan="1">437</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">1× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">0.727</td>
<td rowspan="1" colspan="1">25,627</td>
<td rowspan="1" colspan="1">125</td>
<td rowspan="1" colspan="1">456</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">5× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">3.048</td>
<td rowspan="1" colspan="1">487</td>
<td rowspan="1" colspan="1">524</td>
<td rowspan="1" colspan="1">855</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">5× </td>
<td rowspan="1" colspan="1">0.003× </td>
<td rowspan="1" colspan="1">3.060</td>
<td rowspan="1" colspan="1">919</td>
<td rowspan="1" colspan="1">526</td>
<td rowspan="1" colspan="1">857</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">5× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">3.155</td>
<td rowspan="1" colspan="1">25,502</td>
<td rowspan="1" colspan="1">542</td>
<td rowspan="1" colspan="1">874</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">10× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">6.115</td>
<td rowspan="1" colspan="1">1,050</td>
<td rowspan="1" colspan="1">1,051</td>
<td rowspan="1" colspan="1">1,382</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">10× </td>
<td rowspan="1" colspan="1">0.004× </td>
<td rowspan="1" colspan="1">6.100</td>
<td rowspan="1" colspan="1">1,531</td>
<td rowspan="1" colspan="1">1,048</td>
<td rowspan="1" colspan="1">1,379</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">10× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">6.450</td>
<td rowspan="1" colspan="1">26,346</td>
<td rowspan="1" colspan="1">1,074</td>
<td rowspan="1" colspan="1">1,405</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">25× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">15.277</td>
<td rowspan="1" colspan="1">2,516</td>
<td rowspan="1" colspan="1">2,625</td>
<td rowspan="1" colspan="1">2,957</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">25× </td>
<td rowspan="1" colspan="1">0.004× </td>
<td rowspan="1" colspan="1">15.174</td>
<td rowspan="1" colspan="1">3,075</td>
<td rowspan="1" colspan="1">2,608</td>
<td rowspan="1" colspan="1">2,939</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">25× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">15.339</td>
<td rowspan="1" colspan="1">27,536</td>
<td rowspan="1" colspan="1">2,636</td>
<td rowspan="1" colspan="1">2,967</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">50× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">30.381</td>
<td rowspan="1" colspan="1">5,058</td>
<td rowspan="1" colspan="1">5,221</td>
<td rowspan="1" colspan="1">5,552</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">50× </td>
<td rowspan="1" colspan="1">0.005× </td>
<td rowspan="1" colspan="1">30.438</td>
<td rowspan="1" colspan="1">5,726</td>
<td rowspan="1" colspan="1">5,231</td>
<td rowspan="1" colspan="1">5,562</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">50× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">30.595</td>
<td rowspan="1" colspan="1">29,766</td>
<td rowspan="1" colspan="1">5,257</td>
<td rowspan="1" colspan="1">5,589</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">75× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">45.753</td>
<td rowspan="1" colspan="1">7,323</td>
<td rowspan="1" colspan="1">4,530</td>
<td rowspan="1" colspan="1">8,194</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">75× </td>
<td rowspan="1" colspan="1">0.006× </td>
<td rowspan="1" colspan="1">45.699</td>
<td rowspan="1" colspan="1">8,351</td>
<td rowspan="1" colspan="1">7,853</td>
<td rowspan="1" colspan="1">8,184</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">75× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">45.859</td>
<td rowspan="1" colspan="1">31,971</td>
<td rowspan="1" colspan="1">7,881</td>
<td rowspan="1" colspan="1">8,212</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">100× </td>
<td rowspan="1" colspan="1">0× </td>
<td rowspan="1" colspan="1">60.926</td>
<td rowspan="1" colspan="1">9,633</td>
<td rowspan="1" colspan="1">10,470</td>
<td rowspan="1" colspan="1">10,801</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">100× </td>
<td rowspan="1" colspan="1">0.01× </td>
<td rowspan="1" colspan="1">60.958</td>
<td rowspan="1" colspan="1">11,020</td>
<td rowspan="1" colspan="1">10,475</td>
<td rowspan="1" colspan="1">10,807</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
<tr>
<td rowspan="1" colspan="1">100× </td>
<td rowspan="1" colspan="1">0.2× </td>
<td rowspan="1" colspan="1">61.093</td>
<td rowspan="1" colspan="1">33,761</td>
<td rowspan="1" colspan="1">10,498</td>
<td rowspan="1" colspan="1">10,830</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="table-2fn">
<p>
<bold>Notes.</bold>
</p>
</fn>
<fn id="table-2fn1">
<label>a</label>
<p>Prediction interval.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In contrast, when the samples were classified using KrakenHLL (
<xref rid="ref-5" ref-type="bibr">Breitwieser & Salzberg, 2018</xref>
), an accurate generalist program based on 31-mers, we found that all were predicted to contain
<italic>B. anthracis</italic>
, including negative controls (
<xref rid="table-2" ref-type="table">Table 2</xref>
). The
<italic>B. anthracis</italic>
calls arose from sequencing errors in reads from the high-coverage BCerG genomes.</p>
<p>Finally, we tested our model against the NYC dataset (
<xref rid="table-3" ref-type="table">Table 3</xref>
,
<xref ref-type="supplementary-material" rid="supp-8">File S2</xref>
). All 1,458 samples were negative for lef31, in line with the conclusion reached from re-analysis of the dataset that
<italic>B. anthracis</italic>
was absent from the NYC subway (
<xref rid="ref-23" ref-type="bibr">Mason, 2015</xref>
). We found that 1,367 of the 1,458 samples had at least one BCerG31 k-mer match and, of these, 1,085 contained at least one Ba31 match. We identified 34 samples with Ba31 counts above the 99% threshold predicted by the BCerG coverage. These samples did not include the two (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00134">P00134</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00497">P00497</ext-link>
), previously flagged as
<italic>B. anthracis</italic>
positive (
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al., 2015a</xref>
) (
<xref rid="table-3" ref-type="table">Table 3</xref>
). KrakenHLL also classified each of these 34 samples as positive for
<italic>B. anthracis</italic>
.</p>
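<p>The threshold test described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the slope and the interval half-width are back-calculated from the "Model fit" and "Model upper 99% P.I." values printed in Tables 2 and 3, assuming a zero-intercept line and a roughly constant offset between fit and upper bound. The names zero_intercept_slope and exceeds_upper_bound are hypothetical.</p>

```python
# (BCerG31 coverage, model fit, model upper 99% P.I.) copied from Tables 2 and 3.
# The 75x/0x row of Table 2 is left out because its printed model fit (4,530)
# does not lie on the same line as the other rows.
ROWS = [
    (19.71, 3387, 3718),     # Table 3, P00134
    (4.05, 696, 1027),       # Table 3, P00497
    (1.32, 226, 558),        # Table 3, P00981
    (15.277, 2625, 2957),    # Table 2, 25x BCerG, 0x B. anthracis
    (30.381, 5221, 5552),    # Table 2, 50x BCerG, 0x B. anthracis
    (60.926, 10470, 10801),  # Table 2, 100x BCerG, 0x B. anthracis
]


def zero_intercept_slope(pairs):
    """Least-squares slope for y = slope * x (no intercept term)."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx


# Assumption: both constants are recovered from the printed tables,
# they are not the published model coefficients.
SLOPE = zero_intercept_slope([(cov, fit) for cov, fit, _ in ROWS])
HALF_WIDTH = sum(upper - fit for _, fit, upper in ROWS) / len(ROWS)


def exceeds_upper_bound(bcerg31_coverage, observed_ba31):
    """True if the observed Ba31 count is above the approximate upper bound."""
    upper = SLOPE * bcerg31_coverage + HALF_WIDTH
    return observed_ba31 > upper


if __name__ == "__main__":
    print(round(SLOPE, 2), round(HALF_WIDTH, 1))
    print(exceeds_upper_bound(1.32, 20079))  # P00981
    print(exceeds_upper_bound(19.71, 2755))  # P00134
```

<p>Under these assumptions the sketch gives the same "Exceeds 99% P.I." calls as the tables for the rows tried here, but the true prediction interval need not have a constant width, so the published model should be preferred for any real analysis.</p>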
<table-wrap id="table-3" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.7717/peerj.5515/table-3</object-id>
<label>Table 3</label>
<caption>
<title>Reanalysis of NYC subway metagenome sequencing.</title>
<p>We counted Ba31, BCerG31 and lef31 k-mers in 1,458 NYC subway metagenomic samples (
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al., 2015a</xref>
;
<xref rid="ref-3" ref-type="bibr">Afshinnekoo et al., 2015b</xref>
). The table is a breakdown of samples that were within the gray zone and/or had Ba31 matches that exceed the 99% prediction interval. Columns 2–8 display the same data types as columns 3–9 in
<xref rid="table-2" ref-type="table">Table 2</xref>
. The additional lef column shows whether lef31 matches were identified or not. The final column provides the outcome case of the sample (
<xref rid="table-1" ref-type="table">Table 1</xref>
). This table presents four samples excerpted from the complete results for all samples (
<xref ref-type="supplementary-material" rid="supp-8">File S2</xref>
). The four samples are one within the gray zone (P00738), the two flagged in the original study (P00134 and P00497) and an outlier among the samples that exceed the 99% prediction interval (P00981).</p>
</caption>
<alternatives>
<graphic xlink:href="peerj-06-5515-g007"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
<col span="1"></col>
</colgroup>
<thead>
<tr>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
<th align="center" colspan="3" rowspan="1">
<bold>Ba31 Count</bold>
</th>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
<th rowspan="1" colspan="1"></th>
</tr>
<tr>
<th rowspan="1" colspan="1">
<bold>Sample</bold>
</th>
<th rowspan="1" colspan="1">
<bold>BCerG31 coverage</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Observed</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Model fit</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Model upper 99% P.I.</bold>
<xref ref-type="fn" rid="table-3fn2">
<sup>b</sup>
</xref>
</th>
<th rowspan="1" colspan="1">
<bold>Exceeds 99% P.I.</bold>
<xref ref-type="fn" rid="table-3fn2">
<sup>b</sup>
</xref>
</th>
<th rowspan="1" colspan="1">
<bold>Gray zone</bold>
</th>
<th rowspan="1" colspan="1">
<bold>KrakenHLL</bold>
</th>
<th rowspan="1" colspan="1">
<bold>lef</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Outcome case</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="1" colspan="1">P00134
<xref ref-type="fn" rid="table-3fn1">
<sup>a</sup>
</xref>
</td>
<td rowspan="1" colspan="1">19.71</td>
<td rowspan="1" colspan="1">2,755</td>
<td rowspan="1" colspan="1">3,387</td>
<td rowspan="1" colspan="1">3,718</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">P00497
<xref ref-type="fn" rid="table-3fn1">
<sup>a</sup>
</xref>
</td>
<td rowspan="1" colspan="1">4.05</td>
<td rowspan="1" colspan="1">953</td>
<td rowspan="1" colspan="1">696</td>
<td rowspan="1" colspan="1">1,027</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">P00981</td>
<td rowspan="1" colspan="1">1.32</td>
<td rowspan="1" colspan="1">20,079</td>
<td rowspan="1" colspan="1">226</td>
<td rowspan="1" colspan="1">558</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">P00738</td>
<td rowspan="1" colspan="1">0.002</td>
<td rowspan="1" colspan="1">396</td>
<td rowspan="1" colspan="1">1</td>
<td rowspan="1" colspan="1">331</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">Yes</td>
<td rowspan="1" colspan="1">No</td>
<td rowspan="1" colspan="1">2</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="table-3fn">
<p>
<bold>Notes.</bold>
</p>
</fn>
<fn id="table-3fn1">
<label>a</label>
<p>Samples previously identified as containing
<italic>B. anthracis</italic>
.</p>
</fn>
<fn id="table-3fn2">
<label>b</label>
<p>Prediction interval.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>In this work we have described a significant update to a
<italic>B. anthracis</italic>
specific 31-mer set that was introduced in earlier blog posts (
<xref rid="ref-29" ref-type="bibr">Petit III et al., 2015</xref>
;
<xref rid="ref-25" ref-type="bibr">Minot et al., 2015</xref>
) and we have shown how this set can be used to interpret
<italic>B. anthracis</italic>
specific signatures in Illumina metagenome samples. We chose k-mer-based signatures for their ease and speed of computation, and selected a length of 31 nt because it was identified as the shortest k-mer length likely to be unique across bacterial datasets (
<xref rid="ref-19" ref-type="bibr">Koslicki & Falush, 2016</xref>
).</p>
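<p>To make the idea of a k-mer signature concrete, the following minimal sketch counts how many 31-mers in a set of reads match a signature set such as Ba31, BCerG31 or lef31. It is not the pipeline used in this study, whose counting tool is not named in this section. The function names, and the choice to compare canonical k-mers so that both strands match, are assumptions made for illustration.</p>

```python
K = 31
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_signature_hits(reads, signature_kmers, k=K):
    """Count k-mer positions in the reads whose canonical form is in the signature set."""
    signatures = {canonical(s.upper()) for s in signature_kmers}
    hits = 0
    for read in reads:
        read = read.upper()
        # Reads shorter than k give an empty range and contribute nothing.
        for start in range(len(read) - k + 1):
            if canonical(read[start:start + k]) in signatures:
                hits += 1
    return hits


if __name__ == "__main__":
    # Toy example with k=5 so the strings stay readable.
    toy_signatures = ["ACGTT"]
    toy_reads = ["GGACGTTGG", "CCAACGTCC"]  # the second read is the reverse strand
    print(count_signature_hits(toy_reads, toy_signatures, k=5))
```

<p>Exact set membership is what makes this kind of test fast, and it is also why a single sequencing error in a read from a close relative can create a spurious match.</p>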
<p>Some species present unusual challenges for metagenome identification. There is no consistently applied definition for the boundary that divides bacterial species based on DNA sequence identity, and in some cases the presence or absence of mobile elements such as plasmids and phages is required for speciation.
<italic>B. anthracis</italic>
is closely related to non-biothreat species and acquires its enhanced virulence from genes on mobile plasmids. Such species can be hard to model using “generalist” programs (such as Kraken) that attempt to classify every read in the dataset into one of thousands of taxonomic groups. We use a “specialist” approach aiming to solve a narrow problem that can be used to augment the predictions of generalist software. Specialist analyses can take advantage of unique features of the system and can also afford more effort in the curation of training data. In this case, we designed 31-mer signature sets based on comparison of hundreds of complete
<italic>Bacillus</italic>
genomes and we incorporated knowledge of false positive k-mers likely to be produced by close relatives of
<italic>B. anthracis</italic>
to develop a ‘worst case’ linear regression model to differentiate
<italic>B. anthracis</italic>
from sequencing errors. We also used the fact that the presence of a specific gene (
<italic>lef</italic>
) was diagnostic for anthrax. In designing our k-mer sets we encountered some rare cases of taxonomic mis-assignment in public datasets and were able to take corrective action (
<xref ref-type="fig" rid="fig-3">Fig. 3</xref>
). Generalist programs rely on the same taxonomy and reference sequence databases, but small errors that lead to mis-assignments are harder to detect when classification is done on a large scale (
<xref rid="ref-26" ref-type="bibr">Nasko et al., 2018</xref>
). If we were to attempt approaches to specifically detect other known
<italic>B. cereus</italic>
strains that contain pXO1 (
<xref rid="ref-14" ref-type="bibr">Hoffmaster et al., 2004</xref>
;
<xref rid="ref-17" ref-type="bibr">Klee et al., 2010</xref>
), we would have to develop and test new k-mer sets based on their unique chromosomal SNPs. Although we concentrate here on
<italic>B. anthracis</italic>
and BCerG, specialist methods could also be developed for other bacterial pathogens (e.g.,
<italic>Yersinia pestis</italic>
and
<italic>Shigella sonnei</italic>
) using a similar strategy of accounting for possible non-pathogen close relatives in the sample and the diagnostic presence of high-consequence virulence genes acquired by horizontal transfer.</p>
<p>Even when a specialized algorithm has been developed, judgement is still required in interpreting results. In the case of the
<italic>Bacillus</italic>
genomes in particular, DNA extraction biases may affect results in ways we cannot assess without empirical experiments. We cannot tell what proportion of the DNA came from lysis-resistant spores and what proportion came from the more fragile vegetative state, or how this balance might vary between strains across environments. Similarly, using a different sequencing technology with a different error profile, such as the Pacific Biosciences SMRT system, would require recalibration of the model.</p>
<p>Our reanalysis of the NYC data (
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al., 2015a</xref>
) showed that there was no direct evidence for the lethal factor k-mers in the metagenome samples (
<xref ref-type="supplementary-material" rid="supp-8">File S2</xref>
). This confirms other work (
<xref rid="ref-23" ref-type="bibr">Mason, 2015</xref>
;
<xref rid="ref-25" ref-type="bibr">Minot et al., 2015</xref>
;
<xref rid="ref-24" ref-type="bibr">McIntyre et al., 2017</xref>
), and together with the low prior probability of encountering
<italic>B. anthracis</italic>
in New York City, suggests that the samples taken were all negative for anthrax. The two samples originally flagged as possibly positive (
<xref rid="table-3" ref-type="table">Table 3</xref>
) fell under case 4 (
<xref rid="table-1" ref-type="table">Table 1</xref>
), as did 1,049 out of the other 1,456 samples. There were 373 samples with no Ba31 k-mer matches. These are all most likely true negatives, although, as we showed in the synthetic dataset, high BCerG coverage can mask the signal of low coverage
<italic>B. anthracis</italic>
(
<xref rid="table-2" ref-type="table">Table 2</xref>
). Establishing a true negative would theoretically require sequencing every cell in the sample (assuming perfectly efficient DNA preparation), which is currently impossible for all but the simplest communities. Calculating the limit of detection is complex because it depends on both the amount of DNA sequence generated and the complexity of the microbial community. Negative (and positive) calls ultimately have to be supported by sensitive detection assays such as PCR and/or culture.</p>
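<p>The masking effect in the synthetic mixtures can be illustrated with a few rows of Table 2. The sketch below is hedged: it reads the first two columns of Table 2 as BCerG and 
<italic>B. anthracis</italic>
genome coverage, it uses the printed upper 99% prediction interval as the cut-off, and it only considers the excerpted rows shown in the table rather than the full series in File S1. The function name limit_of_detection is hypothetical.</p>

```python
# (B. anthracis coverage, observed Ba31 count, model upper 99% P.I.)
# copied from the Table 2 rows at 25x and 100x BCerG coverage.
MIXTURES = {
    25: [(0.0, 2516, 2957), (0.004, 3075, 2939), (0.2, 27536, 2967)],
    100: [(0.0, 9633, 10801), (0.01, 11020, 10807), (0.2, 33761, 10830)],
}


def limit_of_detection(rows):
    """Return the lowest B. anthracis coverage whose Ba31 count exceeds the upper bound."""
    for ba_coverage, observed, upper in sorted(rows):
        if observed > upper:
            return ba_coverage
    return None  # no row in the series rose above the error model


if __name__ == "__main__":
    for bcerg_coverage, rows in MIXTURES.items():
        print(bcerg_coverage, limit_of_detection(rows))
```

<p>For these rows the lowest detectable 
<italic>B. anthracis</italic>
coverage is higher at 100× BCerG than at 25× BCerG, which is the masking behaviour described above.</p>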
<p>We identified 34 samples above the BCerG thresholds for our model (
<xref rid="table-3" ref-type="table">Table 3</xref>
). All the samples fell under case 3 except a single sample which fell under case 2 (
<xref rid="table-1" ref-type="table">Table 1</xref>
). An outlier among the case 3 samples,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00981">P00981</ext-link>
, taken from a metal handrail on the A train route (
<xref rid="ref-2" ref-type="bibr">Afshinnekoo et al., 2015a</xref>
), had high Ba31 counts (
<italic>n</italic>
 = 20,079). As we collect more genomes of the
<italic>B. cereus</italic>
group, we may see more Ba31 k-mers in BCerG genomes. These samples may contain members of as-yet unencountered lineages more closely related to
<italic>B. anthracis</italic>
than previously seen, or possibly strains arising from recent recombination between
<italic>B. anthracis</italic>
and
<italic>B. cereus</italic>
genomes (although the latter has not been reported). It is important that these strains are isolated, sequenced and added to public databases to iteratively improve pathogen detection. The single case 2 sample,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00738">P00738</ext-link>
(
<xref rid="table-3" ref-type="table">Table 3</xref>
), was also taken from a metal handrail on the A train route, although it was sampled 3 days earlier than
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/protein/P00981">P00981</ext-link>
. This sample was possibly the most problematic because the Ba31 counts were in the gray zone, meaning there was not enough coverage to rule out
<italic>lef</italic>
being present. Most likely, this sample contained another near-
<italic>B. anthracis</italic>
strain, but case 2 samples should be a priority for retesting by culture and PCR methods.</p>
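<p>The way samples are sorted into outcome cases can be sketched from the rows of Table 3. The mapping below is inferred only from those four rows and from the counts quoted in this Discussion. It is not a transcription of Table 1, so any combination that does not appear in Table 3 (including lef31-positive samples and samples with no Ba31 matches) is deliberately left unassigned. The function name outcome_case is hypothetical.</p>

```python
from typing import Optional

# Keys are (exceeds 99% P.I., in gray zone); values are the outcome cases
# printed for the lef31-negative rows of Table 3.
CASES_SEEN_IN_TABLE_3 = {
    (False, False): 4,  # P00134 and P00497
    (True, False): 3,   # P00981
    (True, True): 2,    # P00738
}


def outcome_case(ba31_matches: int, exceeds_pi: bool, gray_zone: bool,
                 lef31_found: bool) -> Optional[int]:
    """Return 2, 3 or 4 for combinations shown in Table 3, otherwise None."""
    if lef31_found or ba31_matches == 0:
        # Not covered by any row of Table 3, so no case is guessed here.
        return None
    return CASES_SEEN_IN_TABLE_3.get((exceeds_pi, gray_zone))


if __name__ == "__main__":
    print(outcome_case(2755, False, False, False))   # P00134
    print(outcome_case(20079, True, False, False))   # P00981
    print(outcome_case(396, True, True, False))      # P00738
```

<p>Returning no case for unlisted combinations keeps the sketch from implying rules that are only defined in Table 1.</p>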
</sec>
<sec sec-type="discussion">
<title>Conclusions</title>
<p>If
<italic>B. anthracis,</italic>
or another BCerG strain containing pXO1, is present in a shotgun metagenome sample at high genome coverage, identification of
<italic>lef</italic>
k-mers is a strong signal for the likely presence of anthrax-causing bacteria. We showed that using a
<italic>B. anthracis</italic>
specific k-mer set alone to call the presence of
<italic>B. anthracis</italic>
produced many false positive calls because of sequencing errors in reads from common co-resident BCerG bacteria. We developed models to partition cases that contained evidence of possible low coverage
<italic>B. anthracis</italic>
, accounting for
<italic>B. cereus</italic>
coverage. However, in simulations, we showed that false negative results can arise when the BCerG coverage is high. Reanalysis of the NYC subway metagenome study confirmed the absence of
<italic>B. anthracis</italic>
containing
<italic>lef</italic>
but we found evidence in at least two samples of BCerG strains that contained what were considered
<italic>B. anthracis</italic>
specific sequences. Culturing strains such as these, sequencing their genomes and sharing the data in the public domain will help improve
<italic>B. anthracis</italic>
detection in metagenome shotgun samples.</p>
</sec>
<sec sec-type="supplementary-material" id="supplemental-information">
<title>Supplemental Information</title>
<supplementary-material content-type="local-data" id="supp-1">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-1</object-id>
<label>Figure S1</label>
<caption>
<title>Ba31 and BCerG31 coverages have a linear relationship with genome coverage</title>
<p>We created synthetic FASTQ files of
<italic>B. anthracis</italic>
(A) and BCerG (B) at different genome coverages and counted Ba31 and BCerG31 k-mers. A linear model with an intercept of 0 is displayed in each case.</p>
</caption>
<media xlink:href="peerj-06-5515-s001.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-2">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-2</object-id>
<label>Figure S2</label>
<caption>
<title>In
<italic>B. anthracis</italic>
genomes, Ba31 coverage is strongly correlated with BCerG31 coverage</title>
<p>We created synthetic
<italic>B. anthracis</italic>
FASTQ files at different genome coverages and counted BCerG31 and Ba31 k-mers. A linear model with an intercept of 0 is displayed.</p>
</caption>
<media xlink:href="peerj-06-5515-s002.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-3">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-3</object-id>
<label>Figure S3</label>
<caption>
<title>In non-
<italic>B. anthracis</italic>
genomes, sequencing errors create a weak linear relationship between Ba31 and BCerG31 coverage</title>
<p>We created synthetic non-
<italic>B. anthracis</italic>
FASTQ files at different genome coverages and counted BCerG31 and Ba31 k-mers. A linear model with an intercept of 0 is displayed.</p>
</caption>
<media xlink:href="peerj-06-5515-s003.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-4">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-4</object-id>
<label>Figure S4</label>
<caption>
<title>The genetic relatedness between
<italic>B. anthracis</italic>
and non-
<italic>B. anthracis</italic>
BCerG members affects Ba31 false positive matches</title>
<p>Synthetic FASTQ files for all BCerG genomes shown in
<xref ref-type="fig" rid="fig-2">Fig. 2</xref>
were created and the counts of Ba31 false positive k-mers were plotted against BCerG k-mer coverage. Dots are colored by the Mash distance (
<xref rid="ref-27" ref-type="bibr">Ondov et al., 2016</xref>
) from the
<italic>B. anthracis</italic>
str. Ames (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/nuccore/NC_003997">NC_003997</ext-link>
) genome.</p>
</caption>
<media xlink:href="peerj-06-5515-s004.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-5">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-5</object-id>
<label>Figure S5</label>
<caption>
<title>A flowchart of potential outcomes of
<italic>B. anthracis</italic>
detection, given matches to the Ba31 set in a shotgun metagenome dataset</title>
<p>This flowchart presents a visual representation of
<xref rid="table-1" ref-type="table">Table 1</xref>
.</p>
</caption>
<media xlink:href="peerj-06-5515-s005.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-6">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-6</object-id>
<label>Figure S6</label>
<caption>
<title>Limit of detection for
<italic>B. anthracis</italic>
k-mers (Ba31) in mixtures of low
<italic>B. anthracis</italic>
coverage and high
<italic>B. cereus</italic>
coverage</title>
<p>We created artificial mixtures of
<italic>B. anthracis</italic>
and
<italic>B. cereus</italic>
to determine the limit of detection for
<italic>B. anthracis</italic>
k-mers (Ba31). Each panel represents a different coverage of
<italic>B. cereus</italic>
and the points are the different
<italic>B. anthracis</italic>
coverages. The points are colored red if Ba31 matches could not be differentiated from sequencing errors. The error model is indicated by the solid line and the 99% prediction interval by the dashed line. The first
<italic>B. anthracis</italic>
coverage value that exceeded the error model was taken as the limit of detection of Ba31.</p>
</caption>
<media xlink:href="peerj-06-5515-s006.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-7">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-7</object-id>
<label>File S1</label>
<caption>
<title>Artificial mixtures of low coverage
<italic>B. anthracis</italic>
and high coverage
<italic>B. cereus</italic>
</title>
<p>The complete dataset for
<xref rid="table-2" ref-type="table">Table 2</xref>
.</p>
</caption>
<media xlink:href="peerj-06-5515-s007.txt">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="supp-8">
<object-id pub-id-type="doi">10.7717/peerj.5515/supp-8</object-id>
<label>File S2</label>
<caption>
<title>Reanalysis of NYC subway metagenome sequencing</title>
<p>The complete dataset for
<xref rid="table-3" ref-type="table">Table 3</xref>
.</p>
</caption>
<media xlink:href="peerj-06-5515-s008.txt">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>Thanks to Sam Minot, Chris Greenfield, Chris Mason and his group for discussion following our original blog post.</p>
</ack>
<sec sec-type="additional-information">
<title>Additional Information and Declarations</title>
<fn-group content-type="competing-interests">
<title>Competing Interests</title>
<fn id="conflict-1" fn-type="COI-statement">
<p>Timothy D. Read is an Academic Editor for PeerJ.</p>
</fn>
</fn-group>
<fn-group content-type="author-contributions">
<title>Author Contributions</title>
<fn id="contribution-1" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-1">Robert A. Petit III</xref>
conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.</p>
</fn>
<fn id="contribution-2" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-2">James M. Hogan</xref>
and
<xref ref-type="contrib" rid="author-5">Timothy D. Read</xref>
conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.</p>
</fn>
<fn id="contribution-3" fn-type="con">
<p>
<xref ref-type="contrib" rid="author-3">Matthew N. Ezewudo</xref>
and
<xref ref-type="contrib" rid="author-4">Sandeep J. Joseph</xref>
performed the experiments, contributed reagents/materials/analysis tools, approved the final draft.</p>
</fn>
</fn-group>
<fn-group content-type="other">
<title>Data Availability</title>
<fn id="addinfo-1">
<p>The following information was supplied regarding data availability:</p>
<p>Robert A. Petit III. rpetit3/anthrax-metagenome-study: Repo state as of July 30th, 2018 (Version final-revisions). Zenodo.
<ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.1323741">http://doi.org/10.5281/zenodo.1323741</ext-link>
.</p>
<p>Code repository of previous analysis:
<ext-link ext-link-type="uri" xlink:href="https://github.com/Read-Lab-Confederation/nyc-subway-anthrax-study">https://github.com/Read-Lab-Confederation/nyc-subway-anthrax-study</ext-link>
.</p>
</fn>
</fn-group>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1">
<label>Ackelsberg et al. (2015)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ackelsberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rakeman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Petersen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mead</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schriefer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kingry</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hoffmaster</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gee</surname>
<given-names>JE</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Lack of evidence for plague or anthrax on the New York City subway</article-title>
<source>Cell Systems</source>
<volume>1</volume>
<fpage>4</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1016/j.cels.2015.07.008</pub-id>
<pub-id pub-id-type="pmid">27135683</pub-id>
</element-citation>
</ref>
<ref id="ref-2">
<label>Afshinnekoo et al. (2015a)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afshinnekoo</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Meydan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chowdhury</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jaroudi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Boyer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bernstein</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Maritz</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Reeves</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Gandara</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chhangawala</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ahsanuddin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Simmons</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nessel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sundaresh</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Jorgensen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kolokotronis</surname>
<given-names>S-O</given-names>
</name>
<name>
<surname>Kirchberger</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Garcia</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Gandara</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Dhanraj</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nawrin</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Saletore</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Vijay</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hénaff</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Zumbo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Walsh</surname>
<given-names>M</given-names>
</name>
<name>
<surname>O’Mullan</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Tighe</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Dudley</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Dunaif</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ennis</surname>
<given-names>S</given-names>
</name>
<name>
<surname>O’Halloran</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Magalhaes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Boone</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Muth</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Paolantonio</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Alter</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Schadt</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Garbarino</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Prill</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Carlton</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>CE</given-names>
</name>
</person-group>
<year>2015a</year>
<article-title>Geospatial resolution of human and bacterial diversity with city-scale metagenomics</article-title>
<source>Cell Systems</source>
<volume>1</volume>
<fpage>72</fpage>
<lpage>87</lpage>
<pub-id pub-id-type="doi">10.1016/j.cels.2015.01.001</pub-id>
<pub-id pub-id-type="pmid">26594662</pub-id>
</element-citation>
</ref>
<ref id="ref-3">
<label>Afshinnekoo et al. (2015b)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afshinnekoo</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Meydan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chowdhury</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jaroudi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Boyer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bernstein</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Maritz</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Reeves</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Gandara</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chhangawala</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ahsanuddin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Simmons</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nessel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sundaresh</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Jorgensen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kolokotronis</surname>
<given-names>S-O</given-names>
</name>
<name>
<surname>Kirchberger</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Garcia</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Gandara</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Dhanraj</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nawrin</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Saletore</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Vijay</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hénaff</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Zumbo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Walsh</surname>
<given-names>M</given-names>
</name>
<name>
<surname>O’Mullan</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Tighe</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Dudley</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Dunaif</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ennis</surname>
<given-names>S</given-names>
</name>
<name>
<surname>O’Halloran</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Magalhaes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Boone</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Muth</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Paolantonio</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Alter</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Schadt</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Garbarino</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Prill</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Carlton</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>CE</given-names>
</name>
</person-group>
<year>2015b</year>
<article-title>Modern methods for delineating metagenomic complexity</article-title>
<source>Cell Systems</source>
<volume>1</volume>
<fpage>6</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1016/j.cels.2015.07.007</pub-id>
<pub-id pub-id-type="pmid">27135684</pub-id>
</element-citation>
</ref>
<ref id="ref-4">
<label>Bragg &amp; Robertson (1989)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bragg</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>DL</given-names>
</name>
</person-group>
<year>1989</year>
<article-title>Nucleotide sequence and analysis of the lethal factor gene (lef) from
<italic>Bacillus anthracis</italic>
</article-title>
<source>Gene</source>
<volume>81</volume>
<fpage>45</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="doi">10.1016/0378-1119(89)90335-1</pub-id>
<pub-id pub-id-type="pmid">2509294</pub-id>
</element-citation>
</ref>
<ref id="ref-5">
<label>Breitwieser &amp; Salzberg (2018)</label>
<element-citation publication-type="working-paper">
<person-group person-group-type="author">
<name>
<surname>Breitwieser</surname>
<given-names>FP</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<year>2018</year>
<article-title>KrakenHLL: confident and fast metagenomics classification using unique k-mer counts</article-title>
<pub-id pub-id-type="arxiv">262956</pub-id>
<pub-id pub-id-type="doi">10.1101/262956</pub-id>
</element-citation>
</ref>
<ref id="ref-6">
<label>Brown et al. (2016)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Olm</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>JF</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Measurement of bacterial replication rates in microbial communities</article-title>
<source>Nature Biotechnology</source>
<volume>34</volume>
<fpage>1256</fpage>
<lpage>1263</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.3704</pub-id>
</element-citation>
</ref>
<ref id="ref-7">
<label>Cachat et al. (2008)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cachat</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Barker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Read</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Priest</surname>
<given-names>FG</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>A
<italic>Bacillus thuringiensis</italic>
strain producing a polyglutamate capsule resembling that of
<italic>Bacillus anthracis</italic>
</article-title>
<source>FEMS Microbiology Letters</source>
<volume>285</volume>
<fpage>220</fpage>
<lpage>226</lpage>
<pub-id pub-id-type="doi">10.1111/j.1574-6968.2008.01231.x</pub-id>
<pub-id pub-id-type="pmid">18549401</pub-id>
</element-citation>
</ref>
<ref id="ref-8">
<label>Camacho et al. (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camacho</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Coulouris</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Avagyan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Papadopoulos</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bealer</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>BLAST+: architecture and applications</article-title>
<source>BMC Bioinformatics</source>
<volume>10</volume>
<fpage>421</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-421</pub-id>
<pub-id pub-id-type="pmid">20003500</pub-id>
</element-citation>
</ref>
<ref id="ref-9">
<label>Carlson et al. (2018)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carlson</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Getz</surname>
<given-names>WM</given-names>
</name>
<name>
<surname>Kausrud</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Cizauskas</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Blackburn</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Bustos Carrillo</surname>
<given-names>FA</given-names>
</name>
<name>
<surname>Colwell</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Easterday</surname>
<given-names>WR</given-names>
</name>
<name>
<surname>Ganz</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Kamath</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Økstad</surname>
<given-names>OA</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>WC</given-names>
</name>
<name>
<surname>Kolstø</surname>
<given-names>A-B</given-names>
</name>
<name>
<surname>Stenseth</surname>
<given-names>NC</given-names>
</name>
</person-group>
<year>2018</year>
<article-title>Spores and soil from six sides: interdisciplinarity and the environmental biology of anthrax (
<italic>Bacillus anthracis</italic>
)</article-title>
<source>Biological Reviews</source>
<comment>Epub ahead of print May 6 2018</comment>
<pub-id pub-id-type="doi">10.1111/brv.12420</pub-id>
</element-citation>
</ref>
<ref id="ref-10">
<label>Dixon et al. (1999)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dixon</surname>
<given-names>TC</given-names>
</name>
<name>
<surname>Meselson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Guillemin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hanna</surname>
<given-names>PC</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Anthrax</article-title>
<source>The New England Journal of Medicine</source>
<volume>341</volume>
<fpage>815</fpage>
<lpage>826</lpage>
<pub-id pub-id-type="doi">10.1056/NEJM199909093411107</pub-id>
<pub-id pub-id-type="pmid">10477781</pub-id>
</element-citation>
</ref>
<ref id="ref-11">
<label>Gordon &amp; Hannon (2010)</label>
<element-citation publication-type="software">
<person-group person-group-type="author">
<name>
<surname>Gordon</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hannon</surname>
<given-names>GJ</given-names>
</name>
</person-group>
<year>2010</year>
<data-title>Fastx-toolkit</data-title>
<comment>Computer program distributed by the author.
<uri xlink:href="http://hannonlab.cshl.edu/fastx_toolkit/index.html">http://hannonlab.cshl.edu/fastx_toolkit/index.html</uri>
</comment>
<date-in-citation content-type="access-date" iso-8601-date="2014--2015-00-00">2014–2015</date-in-citation>
</element-citation>
</ref>
<ref id="ref-12">
<label>Helgason et al. (2000)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Helgason</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Okstad</surname>
<given-names>OA</given-names>
</name>
<name>
<surname>Caugant</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Johansen</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Fouet</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mock</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hegna</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Kolstø</surname>
<given-names>AB</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>
<italic>Bacillus anthracis</italic>
,
<italic>Bacillus cereus</italic>
, and
<italic>Bacillus thuringiensis</italic>
—one species on the basis of genetic evidence</article-title>
<source>Applied and Environmental Microbiology</source>
<volume>66</volume>
<fpage>2627</fpage>
<lpage>2630</lpage>
<pub-id pub-id-type="doi">10.1128/AEM.66.6.2627-2630.2000</pub-id>
<pub-id pub-id-type="pmid">10831447</pub-id>
</element-citation>
</ref>
<ref id="ref-13">
<label>Hoffmann et al. (2017)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Zimmermann</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Biek</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kuehl</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nowak</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mundry</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Agbor</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Angedakin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Arandjelovic</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Blankenburg</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Brazolla</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Corogenes</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Couacy-Hymann</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Deschner</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Dieguez</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dierks</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Düx</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Dupke</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Eshuis</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Formenty</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Yuh</surname>
<given-names>YG</given-names>
</name>
<name>
<surname>Goedmakers</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gogarten</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Granjon</surname>
<given-names>A-C</given-names>
</name>
<name>
<surname>McGraw</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Grunow</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hart</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Junker</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kiang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Langergraber</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lapuente</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Leendertz</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Léguillon</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Leinert</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Löhrich</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Marrocoli</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mätz-Rensing</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Meier</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Merkel</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Metzger</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Murai</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Niedorf</surname>
<given-names>S</given-names>
</name>
<name>
<surname>De Nys</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Sachse</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Van Schijndel</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Thiesen</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Ton</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wieler</surname>
<given-names>LH</given-names>
</name>
<name>
<surname>Boesch</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Klee</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Wittig</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Calvignac-Spencer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Leendertz</surname>
<given-names>FH</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>Persistent anthrax as a major driver of wildlife mortality in a tropical rainforest</article-title>
<source>Nature</source>
<volume>548</volume>
<fpage>82</fpage>
<lpage>86</lpage>
<pub-id pub-id-type="doi">10.1038/nature23309</pub-id>
<pub-id pub-id-type="pmid">28770842</pub-id>
</element-citation>
</ref>
<ref id="ref-14">
<label>Hoffmaster et al. (2004)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmaster</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Ravel</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rasko</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Chute</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Marston</surname>
<given-names>CK</given-names>
</name>
<name>
<surname>De</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Sacchi</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Fitzgerald</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Mayer</surname>
<given-names>LW</given-names>
</name>
<name>
<surname>Maiden</surname>
<given-names>MCJ</given-names>
</name>
<name>
<surname>Priest</surname>
<given-names>FG</given-names>
</name>
<name>
<surname>Barker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Cer</surname>
<given-names>RZ</given-names>
</name>
<name>
<surname>Rilstone</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>SN</given-names>
</name>
<name>
<surname>Weyant</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Galloway</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Read</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Popovic</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Fraser-Liggett</surname>
<given-names>CM</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Identification of anthrax toxin genes in a
<italic>Bacillus cereus</italic>
associated with an illness resembling inhalation anthrax</article-title>
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>101</volume>
<fpage>8449</fpage>
<lpage>8454</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0402414101</pub-id>
<pub-id pub-id-type="pmid">15155910</pub-id>
</element-citation>
</ref>
<ref id="ref-15">
<label>Huang et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>GT</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>ART: a next-generation sequencing read simulator</article-title>
<source>Bioinformatics</source>
<volume>28</volume>
<fpage>593</fpage>
<lpage>594</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr708</pub-id>
<pub-id pub-id-type="pmid">22199392</pub-id>
</element-citation>
</ref>
<ref id="ref-16">
<label>Keim &amp; Wagner (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keim</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Wagner</surname>
<given-names>DM</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Humans and evolutionary and ecological forces shaped the phylogeography of recently emerged diseases</article-title>
<source>Nature Reviews. Microbiology</source>
<volume>7</volume>
<fpage>813</fpage>
<lpage>821</lpage>
<pub-id pub-id-type="doi">10.1038/nrmicro2219</pub-id>
<pub-id pub-id-type="pmid">19820723</pub-id>
</element-citation>
</ref>
<ref id="ref-17">
<label>Klee et al. (2010)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klee</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Brzuszkiewicz</surname>
<given-names>EB</given-names>
</name>
<name>
<surname>Nattermann</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Brüggemann</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Dupke</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wollherr</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Franz</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pauli</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Appel</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Liebl</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Couacy-Hymann</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Boesch</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>F-D</given-names>
</name>
<name>
<surname>Leendertz</surname>
<given-names>FH</given-names>
</name>
<name>
<surname>Ellerbrok</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Gottschalk</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Grunow</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Liesegang</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>The genome of a
<italic>Bacillus</italic>
isolate causing anthrax in chimpanzees combines chromosomal properties of
<italic>B. cereus</italic>
with
<italic>B. anthracis</italic>
virulence plasmids</article-title>
<source>PLOS ONE</source>
<volume>5</volume>
<elocation-id>e10986</elocation-id>
<pub-id pub-id-type="doi">10.1371/journal.pone.0010986</pub-id>
<pub-id pub-id-type="pmid">20634886</pub-id>
</element-citation>
</ref>
<ref id="ref-18">
<label>Konstantinidis &amp; Tiedje (2005)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Konstantinidis</surname>
<given-names>KT</given-names>
</name>
<name>
<surname>Tiedje</surname>
<given-names>JM</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Genomic insights that advance the species definition for prokaryotes</article-title>
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<volume>102</volume>
<fpage>2567</fpage>
<lpage>2572</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0409727102</pub-id>
<pub-id pub-id-type="pmid">15701695</pub-id>
</element-citation>
</ref>
<ref id="ref-19">
<label>Koslicki &amp; Falush (2016)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koslicki</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Falush</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>MetaPalette: a k-mer painting approach for metagenomic taxonomic profiling and quantification of novel strain variation</article-title>
<source>mSystems</source>
<volume>1</volume>
<issue>3</issue>
<elocation-id>e00020-16</elocation-id>
<pub-id pub-id-type="doi">10.1128/mSystems.00020-16</pub-id>
<pub-id pub-id-type="pmid">27822531</pub-id>
</element-citation>
</ref>
<ref id="ref-20">
<label>Li &amp; Durbin (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Fast and accurate short read alignment with Burrows-Wheeler transform</article-title>
<source>Bioinformatics</source>
<volume>25</volume>
<fpage>1754</fpage>
<lpage>1760</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp324</pub-id>
<pub-id pub-id-type="pmid">19451168</pub-id>
</element-citation>
</ref>
<ref id="ref-21">
<label>Li et al. (2009)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wysoker</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Homer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Abecasis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<collab>1000 Genome Project Data Processing Subgroup</collab>
</person-group>
<year>2009</year>
<article-title>The sequence alignment/map format and SAMtools</article-title>
<source>Bioinformatics</source>
<volume>25</volume>
<fpage>2078</fpage>
<lpage>2079</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id>
<pub-id pub-id-type="pmid">19505943</pub-id>
</element-citation>
</ref>
<ref id="ref-22">
<label>Marçais &amp; Kingsford (2011)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marçais</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers</article-title>
<source>Bioinformatics</source>
<volume>27</volume>
<fpage>764</fpage>
<lpage>770</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr011</pub-id>
<pub-id pub-id-type="pmid">21217122</pub-id>
</element-citation>
</ref>
<ref id="ref-23">
<label>Mason (2015)</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Mason</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>The long road from data to wisdom, and from DNA to pathogen</article-title>
<uri xlink:href="https://www.microbe.net/2015/02/17/the-long-road-from-data-to-wisdom-and-from-dna-to-pathogen/">https://www.microbe.net/2015/02/17/the-long-road-from-data-to-wisdom-and-from-dna-to-pathogen/</uri>
<date-in-citation content-type="access-date" iso-8601-date="2017-12-18">18 December 2017</date-in-citation>
</element-citation>
</ref>
<ref id="ref-24">
<label>McIntyre et al. (2017)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McIntyre</surname>
<given-names>ABR</given-names>
</name>
<name>
<surname>Ounit</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Afshinnekoo</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Prill</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Hénaff</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Minot</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Danko</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Foox</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ahsanuddin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tighe</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hasan</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Subramanian</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Moffat</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lonardi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Greenfield</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Colwell</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Rosen</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>CE</given-names>
</name>
</person-group>
<year>2017</year>
<article-title>Comprehensive benchmarking and ensemble approaches for metagenomic classifiers</article-title>
<source>Genome Biology</source>
<volume>18</volume>
<fpage>182</fpage>
<pub-id pub-id-type="doi">10.1186/s13059-017-1299-7</pub-id>
<pub-id pub-id-type="pmid">28934964</pub-id>
</element-citation>
</ref>
<ref id="ref-25">
<label>Minot et al. (2015)</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Minot</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Greenfield</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Afshinnekoo</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>CE</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Anthrax marker panel</article-title>
<uri xlink:href="https://science.onecodex.com/bacillus-anthracis-panel/">https://science.onecodex.com/bacillus-anthracis-panel/</uri>
<date-in-citation content-type="access-date" iso-8601-date="2017-12-19">19 December 2017</date-in-citation>
</element-citation>
</ref>
<ref id="ref-26">
<label>Nasko et al. (2018)</label>
<element-citation publication-type="working-paper">
<person-group person-group-type="author">
<name>
<surname>Nasko</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Koren</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Phillippy</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
</person-group>
<year>2018</year>
<article-title>RefSeq database growth influences the accuracy of k-mer-based species identification</article-title>
<pub-id pub-id-type="arxiv">304972</pub-id>
<pub-id pub-id-type="doi">10.1101/304972</pub-id>
</element-citation>
</ref>
<ref id="ref-27">
<label>Ondov et al. (2016)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ondov</surname>
<given-names>BD</given-names>
</name>
<name>
<surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Melsted</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mallonee</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Bergman</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Koren</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Phillippy</surname>
<given-names>AM</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Mash: fast genome and metagenome distance estimation using MinHash</article-title>
<source>Genome Biology</source>
<volume>17</volume>
<fpage>132</fpage>
<pub-id pub-id-type="doi">10.1186/s13059-016-0997-x</pub-id>
<pub-id pub-id-type="pmid">27323842</pub-id>
</element-citation>
</ref>
<ref id="ref-28">
<label>Pannucci et al. (2002)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pannucci</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Okinaka</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sabin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ticknor</surname>
<given-names>LO</given-names>
</name>
<name>
<surname>Kuske</surname>
<given-names>CR</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>DNA sequence conservation between the
<italic>Bacillus anthracis</italic>
pXO2 plasmid and genomic sequence from closely related bacteria</article-title>
<source>BMC Genomics</source>
<volume>3</volume>
<fpage>34</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-3-34</pub-id>
<pub-id pub-id-type="pmid">12473162</pub-id>
</element-citation>
</ref>
<ref id="ref-29">
<label>Petit III et al. (2015)</label>
<element-citation publication-type="data">
<person-group person-group-type="author">
<name>
<surname>Petit III</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Ezewudo</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Joseph</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Read</surname>
<given-names>TD</given-names>
</name>
</person-group>
<year>2015</year>
<data-title>Searching for anthrax in the New York City subway metagenome</data-title>
<source>Zenodo</source>
<date-in-citation content-type="access-date" iso-8601-date="2017-12-18">18 December 2017</date-in-citation>
<pub-id pub-id-type="doi">10.5281/zenodo.17158</pub-id>
</element-citation>
</ref>
<ref id="ref-30">
<label>Quast et al. (2013)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quast</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pruesse</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Yilmaz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Gerken</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schweer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Yarza</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Peplies</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Glöckner</surname>
<given-names>FO</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>The SILVA ribosomal RNA gene database project: improved data processing and web-based tools</article-title>
<source>Nucleic Acids Research</source>
<volume>41</volume>
<fpage>D590</fpage>
<lpage>D596</lpage>
<pub-id pub-id-type="pmid">23193283</pub-id>
</element-citation>
</ref>
<ref id="ref-31">
<label>Quinlan &amp; Hall (2010)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quinlan</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>IM</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>BEDTools: a flexible suite of utilities for comparing genomic features</article-title>
<source>Bioinformatics</source>
<volume>26</volume>
<fpage>841</fpage>
<lpage>842</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id>
<pub-id pub-id-type="pmid">20110278</pub-id>
</element-citation>
</ref>
<ref id="ref-32">
<label>Rasko et al. (2007)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rasko</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Rosovitz</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Økstad</surname>
<given-names>OA</given-names>
</name>
<name>
<surname>Fouts</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Cer</surname>
<given-names>RZ</given-names>
</name>
<name>
<surname>Kolstø</surname>
<given-names>A-B</given-names>
</name>
<name>
<surname>Gill</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Ravel</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Complete sequence analysis of novel plasmids from emetic and periodontal
<italic>Bacillus cereus</italic>
isolates reveals a common evolutionary history among the
<italic>B. cereus</italic>
-group plasmids, including
<italic>Bacillus anthracis</italic>
pXO1</article-title>
<source>Journal of Bacteriology</source>
<volume>189</volume>
<fpage>52</fpage>
<lpage>64</lpage>
<pub-id pub-id-type="doi">10.1128/JB.01313-06</pub-id>
<pub-id pub-id-type="pmid">17041058</pub-id>
</element-citation>
</ref>
<ref id="ref-33">
<label>Read et al. (2002)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Read</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shumway</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Umayam</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Holtzapple</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Busch</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Schupp</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Solomon</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Keim</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fraser</surname>
<given-names>CM</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Comparative genome sequencing for discovery of novel polymorphisms in
<italic>Bacillus anthracis</italic>
</article-title>
<source>Science</source>
<volume>296</volume>
<fpage>2028</fpage>
<lpage>2033</lpage>
<pub-id pub-id-type="doi">10.1126/science.1071837</pub-id>
<pub-id pub-id-type="pmid">12004073</pub-id>
</element-citation>
</ref>
<ref id="ref-34">
<label>Segata et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Segata</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Waldron</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ballarini</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Narasimhan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Jousson</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Huttenhower</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Metagenomic microbial community profiling using unique clade-specific marker genes</article-title>
<source>Nature Methods</source>
<volume>9</volume>
<fpage>811</fpage>
<lpage>814</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.2066</pub-id>
<pub-id pub-id-type="pmid">22688413</pub-id>
</element-citation>
</ref>
<ref id="ref-35">
<label>Zwick et al. (2012)</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zwick</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Joseph</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Didelot</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>PE</given-names>
</name>
<name>
<surname>Bishop-Lilly</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Willner</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Nolan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lentz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thomason</surname>
<given-names>MK</given-names>
</name>
<name>
<surname>Sozhamannan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mateczun</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Read</surname>
<given-names>TD</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Genomic characterization of the
<italic>Bacillus cereus</italic>
sensu lato species: backdrop to the evolution of
<italic>Bacillus anthracis</italic>
</article-title>
<source>Genome Research</source>
<volume>22</volume>
<fpage>1512</fpage>
<lpage>1524</lpage>
<pub-id pub-id-type="doi">10.1101/gr.134437.111</pub-id>
<pub-id pub-id-type="pmid">22645259</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0011229 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0011229 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}


This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021