MERS exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 0010010 (Pmc/Corpus); previous: 0010009; next: 0010011


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events</title>
<author>
<name sortKey="Jaillard, Magali" sort="Jaillard, Magali" uniqKey="Jaillard M" first="Magali" last="Jaillard">Magali Jaillard</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lima, Leandro" sort="Lima, Leandro" uniqKey="Lima L" first="Leandro" last="Lima">Leandro Lima</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>EPI ERABLE - Inria Grenoble, Rhône-Alpes, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tournoud, Maud" sort="Tournoud, Maud" uniqKey="Tournoud M" first="Maud" last="Tournoud">Maud Tournoud</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Belkum, Alex" sort="Van Belkum, Alex" uniqKey="Van Belkum A" first="Alex" last="Van Belkum">Alex Van Belkum</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lacroix, Vincent" sort="Lacroix, Vincent" uniqKey="Lacroix V" first="Vincent" last="Lacroix">Vincent Lacroix</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>EPI ERABLE - Inria Grenoble, Rhône-Alpes, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30419019</idno>
<idno type="pmc">6258240</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6258240</idno>
<idno type="RBID">PMC:6258240</idno>
<idno type="doi">10.1371/journal.pgen.1007758</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">001001</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001001</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events</title>
<author>
<name sortKey="Jaillard, Magali" sort="Jaillard, Magali" uniqKey="Jaillard M" first="Magali" last="Jaillard">Magali Jaillard</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lima, Leandro" sort="Lima, Leandro" uniqKey="Lima L" first="Leandro" last="Lima">Leandro Lima</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>EPI ERABLE - Inria Grenoble, Rhône-Alpes, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tournoud, Maud" sort="Tournoud, Maud" uniqKey="Tournoud M" first="Maud" last="Tournoud">Maud Tournoud</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Van Belkum, Alex" sort="Van Belkum, Alex" uniqKey="Van Belkum A" first="Alex" last="Van Belkum">Alex Van Belkum</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lacroix, Vincent" sort="Lacroix, Vincent" uniqKey="Lacroix V" first="Vincent" last="Lacroix">Vincent Lacroix</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>EPI ERABLE - Inria Grenoble, Rhône-Alpes, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS Genetics</title>
<idno type="ISSN">1553-7390</idno>
<idno type="eISSN">1553-7404</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results that are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took an hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in
<italic>Mycobacterium tuberculosis</italic>
, and genes acquired by horizontal transfer in
<italic>Staphylococcus aureus</italic>
and
<italic>Pseudomonas aeruginosa</italic>
, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas">https://gitlab.com/leoisl/dbgwas</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Farhat, Mr" uniqKey="Farhat M">MR Farhat</name>
</author>
<author>
<name sortKey="Shapiro, Bj" uniqKey="Shapiro B">BJ Shapiro</name>
</author>
<author>
<name sortKey="Kieser, Kj" uniqKey="Kieser K">KJ Kieser</name>
</author>
<author>
<name sortKey="Sultana, R" uniqKey="Sultana R">R Sultana</name>
</author>
<author>
<name sortKey="Jacobson, Kr" uniqKey="Jacobson K">KR Jacobson</name>
</author>
<author>
<name sortKey="Victor, Tc" uniqKey="Victor T">TC Victor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sheppard, Sk" uniqKey="Sheppard S">SK Sheppard</name>
</author>
<author>
<name sortKey="Didelot, X" uniqKey="Didelot X">X Didelot</name>
</author>
<author>
<name sortKey="Meric, G" uniqKey="Meric G">G Meric</name>
</author>
<author>
<name sortKey="Torralbo, A" uniqKey="Torralbo A">A Torralbo</name>
</author>
<author>
<name sortKey="Jolley, Ka" uniqKey="Jolley K">KA Jolley</name>
</author>
<author>
<name sortKey="Kelly, Dj" uniqKey="Kelly D">DJ Kelly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alam, Mt" uniqKey="Alam M">MT Alam</name>
</author>
<author>
<name sortKey="Petit, Ra" uniqKey="Petit R">RA Petit</name>
</author>
<author>
<name sortKey="Crispell, Ek" uniqKey="Crispell E">EK Crispell</name>
</author>
<author>
<name sortKey="Thornton, Ta" uniqKey="Thornton T">TA Thornton</name>
</author>
<author>
<name sortKey="Conneely, Kn" uniqKey="Conneely K">KN Conneely</name>
</author>
<author>
<name sortKey="Jiang, Y" uniqKey="Jiang Y">Y Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chewapreecha, C" uniqKey="Chewapreecha C">C Chewapreecha</name>
</author>
<author>
<name sortKey="Marttinen, P" uniqKey="Marttinen P">P Marttinen</name>
</author>
<author>
<name sortKey="Croucher, Nj" uniqKey="Croucher N">NJ Croucher</name>
</author>
<author>
<name sortKey="Salter, Sj" uniqKey="Salter S">SJ Salter</name>
</author>
<author>
<name sortKey="Harris, Sr" uniqKey="Harris S">SR Harris</name>
</author>
<author>
<name sortKey="Mather, Ae" uniqKey="Mather A">AE Mather</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Earle, Sg" uniqKey="Earle S">SG Earle</name>
</author>
<author>
<name sortKey="Wu, Ch" uniqKey="Wu C">CH Wu</name>
</author>
<author>
<name sortKey="Charlesworth, J" uniqKey="Charlesworth J">J Charlesworth</name>
</author>
<author>
<name sortKey="Stoesser, N" uniqKey="Stoesser N">N Stoesser</name>
</author>
<author>
<name sortKey="Gordon, Nc" uniqKey="Gordon N">NC Gordon</name>
</author>
<author>
<name sortKey="Walker, Tm" uniqKey="Walker T">TM Walker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lees, Ja" uniqKey="Lees J">JA Lees</name>
</author>
<author>
<name sortKey="Vehkala, M" uniqKey="Vehkala M">M Vehkala</name>
</author>
<author>
<name sortKey="Valimaki, N" uniqKey="Valimaki N">N Välimäki</name>
</author>
<author>
<name sortKey="Harris, Sr" uniqKey="Harris S">SR Harris</name>
</author>
<author>
<name sortKey="Chewapreecha, C" uniqKey="Chewapreecha C">C Chewapreecha</name>
</author>
<author>
<name sortKey="Croucher, Nj" uniqKey="Croucher N">NJ Croucher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jaillard, M" uniqKey="Jaillard M">M Jaillard</name>
</author>
<author>
<name sortKey="Van Belkum, A" uniqKey="Van Belkum A">A van Belkum</name>
</author>
<author>
<name sortKey="Cady, Kc" uniqKey="Cady K">KC Cady</name>
</author>
<author>
<name sortKey="Creely, D" uniqKey="Creely D">D Creely</name>
</author>
<author>
<name sortKey="Shortridge, D" uniqKey="Shortridge D">D Shortridge</name>
</author>
<author>
<name sortKey="Blanc, B" uniqKey="Blanc B">B Blanc</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Page, Aj" uniqKey="Page A">AJ Page</name>
</author>
<author>
<name sortKey="Cummins, Ca" uniqKey="Cummins C">CA Cummins</name>
</author>
<author>
<name sortKey="Hunt, M" uniqKey="Hunt M">M Hunt</name>
</author>
<author>
<name sortKey="Wong, Vk" uniqKey="Wong V">VK Wong</name>
</author>
<author>
<name sortKey="Reuter, S" uniqKey="Reuter S">S Reuter</name>
</author>
<author>
<name sortKey="Holden, Mt" uniqKey="Holden M">MT Holden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Li, D" uniqKey="Li D">D Li</name>
</author>
<author>
<name sortKey="Zhao, L" uniqKey="Zhao L">L Zhao</name>
</author>
<author>
<name sortKey="Fleming, J" uniqKey="Fleming J">J Fleming</name>
</author>
<author>
<name sortKey="Lin, N" uniqKey="Lin N">N Lin</name>
</author>
<author>
<name sortKey="Wang, T" uniqKey="Wang T">T Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blair, Jm" uniqKey="Blair J">JM Blair</name>
</author>
<author>
<name sortKey="Webber, Ma" uniqKey="Webber M">MA Webber</name>
</author>
<author>
<name sortKey="Baylay, Aj" uniqKey="Baylay A">AJ Baylay</name>
</author>
<author>
<name sortKey="Ogbolu, Do" uniqKey="Ogbolu D">DO Ogbolu</name>
</author>
<author>
<name sortKey="Piddock, Lj" uniqKey="Piddock L">LJ Piddock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haft, Dh" uniqKey="Haft D">DH Haft</name>
</author>
<author>
<name sortKey="Dicuccio, M" uniqKey="Dicuccio M">M DiCuccio</name>
</author>
<author>
<name sortKey="Badretdin, A" uniqKey="Badretdin A">A Badretdin</name>
</author>
<author>
<name sortKey="Brover, V" uniqKey="Brover V">V Brover</name>
</author>
<author>
<name sortKey="Chetvernin, V" uniqKey="Chetvernin V">V Chetvernin</name>
</author>
<author>
<name sortKey="O Neill, K" uniqKey="O Neill K">K O’Neill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le Bras, Y" uniqKey="Le Bras Y">Y Le Bras</name>
</author>
<author>
<name sortKey="Collin, O" uniqKey="Collin O">O Collin</name>
</author>
<author>
<name sortKey="Monjeaud, C" uniqKey="Monjeaud C">C Monjeaud</name>
</author>
<author>
<name sortKey="Lacroix, V" uniqKey="Lacroix V">V Lacroix</name>
</author>
<author>
<name sortKey="Rivals, E" uniqKey="Rivals E">É Rivals</name>
</author>
<author>
<name sortKey="Lemaitre, C" uniqKey="Lemaitre C">C Lemaitre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rahman, A" uniqKey="Rahman A">A Rahman</name>
</author>
<author>
<name sortKey="Hallgrimsdottir, I" uniqKey="Hallgrimsdottir I">I Hallgrímsdóttir</name>
</author>
<author>
<name sortKey="Eisen, M" uniqKey="Eisen M">M Eisen</name>
</author>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L Pachter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Read, Td" uniqKey="Read T">TD Read</name>
</author>
<author>
<name sortKey="Massey, Rc" uniqKey="Massey R">RC Massey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Power, Ra" uniqKey="Power R">RA Power</name>
</author>
<author>
<name sortKey="Parkhill, J" uniqKey="Parkhill J">J Parkhill</name>
</author>
<author>
<name sortKey="De Oliveira, T" uniqKey="De Oliveira T">T de Oliveira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Bruijn, N" uniqKey="De Bruijn N">N de Bruijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author>
<name sortKey="Yang, Y" uniqKey="Yang Y">Y Yang</name>
</author>
<author>
<name sortKey="Tang, Y" uniqKey="Tang Y">Y Tang</name>
</author>
<author>
<name sortKey="Shang, J" uniqKey="Shang J">J Shang</name>
</author>
<author>
<name sortKey="Shen, B" uniqKey="Shen B">B Shen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Iqbal, Z" uniqKey="Iqbal Z">Z Iqbal</name>
</author>
<author>
<name sortKey="Caccamo, M" uniqKey="Caccamo M">M Caccamo</name>
</author>
<author>
<name sortKey="Turner, I" uniqKey="Turner I">I Turner</name>
</author>
<author>
<name sortKey="Flicek, P" uniqKey="Flicek P">P Flicek</name>
</author>
<author>
<name sortKey="Mcvean, G" uniqKey="Mcvean G">G McVean</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hooper, Dc" uniqKey="Hooper D">DC Hooper</name>
</author>
<author>
<name sortKey="Jacoby, Ga" uniqKey="Jacoby G">GA Jacoby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lowy, Fd" uniqKey="Lowy F">FD Lowy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piton, J" uniqKey="Piton J">J Piton</name>
</author>
<author>
<name sortKey="Petrella, S" uniqKey="Petrella S">S Petrella</name>
</author>
<author>
<name sortKey="Delarue, M" uniqKey="Delarue M">M Delarue</name>
</author>
<author>
<name sortKey="Andre Leroux, G" uniqKey="Andre Leroux G">G André-Leroux</name>
</author>
<author>
<name sortKey="Jarlier, V" uniqKey="Jarlier V">V Jarlier</name>
</author>
<author>
<name sortKey="Aubry, A" uniqKey="Aubry A">A Aubry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lambert, P" uniqKey="Lambert P">P Lambert</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lambert, T" uniqKey="Lambert T">T Lambert</name>
</author>
<author>
<name sortKey="Ploy, M" uniqKey="Ploy M">M Ploy</name>
</author>
<author>
<name sortKey="Courvalin, P" uniqKey="Courvalin P">P Courvalin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, H" uniqKey="Lee H">H Lee</name>
</author>
<author>
<name sortKey="Cho, S" uniqKey="Cho S">S Cho</name>
</author>
<author>
<name sortKey="Bang, H" uniqKey="Bang H">H Bang</name>
</author>
<author>
<name sortKey="Lee, J" uniqKey="Lee J">J Lee</name>
</author>
<author>
<name sortKey="Bai, G" uniqKey="Bai G">G Bai</name>
</author>
<author>
<name sortKey="Kim, S" uniqKey="Kim S">S Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Farhat, Mr" uniqKey="Farhat M">MR Farhat</name>
</author>
<author>
<name sortKey="Sultana, R" uniqKey="Sultana R">R Sultana</name>
</author>
<author>
<name sortKey="Iartchouk, O" uniqKey="Iartchouk O">O Iartchouk</name>
</author>
<author>
<name sortKey="Bozeman, S" uniqKey="Bozeman S">S Bozeman</name>
</author>
<author>
<name sortKey="Galagan, J" uniqKey="Galagan J">J Galagan</name>
</author>
<author>
<name sortKey="Sisk, P" uniqKey="Sisk P">P Sisk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flandrois, Jp" uniqKey="Flandrois J">JP Flandrois</name>
</author>
<author>
<name sortKey="Lina, G" uniqKey="Lina G">G Lina</name>
</author>
<author>
<name sortKey="Dumitrescu, O" uniqKey="Dumitrescu O">O Dumitrescu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gordon, N" uniqKey="Gordon N">N Gordon</name>
</author>
<author>
<name sortKey="Price, J" uniqKey="Price J">J Price</name>
</author>
<author>
<name sortKey="Cole, K" uniqKey="Cole K">K Cole</name>
</author>
<author>
<name sortKey="Everitt, R" uniqKey="Everitt R">R Everitt</name>
</author>
<author>
<name sortKey="Morgan, M" uniqKey="Morgan M">M Morgan</name>
</author>
<author>
<name sortKey="Finney, J" uniqKey="Finney J">J Finney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Westh, H" uniqKey="Westh H">H Westh</name>
</author>
<author>
<name sortKey="Hougaard, D" uniqKey="Hougaard D">D Hougaard</name>
</author>
<author>
<name sortKey="Vuust, J" uniqKey="Vuust J">J Vuust</name>
</author>
<author>
<name sortKey="Rosdahl, V" uniqKey="Rosdahl V">V Rosdahl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benson, Da" uniqKey="Benson D">DA Benson</name>
</author>
<author>
<name sortKey="Cavanaugh, M" uniqKey="Cavanaugh M">M Cavanaugh</name>
</author>
<author>
<name sortKey="Clark, K" uniqKey="Clark K">K Clark</name>
</author>
<author>
<name sortKey="Karsch Mizrachi, I" uniqKey="Karsch Mizrachi I">I Karsch-Mizrachi</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
<author>
<name sortKey="Ostell, J" uniqKey="Ostell J">J Ostell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bi, D" uniqKey="Bi D">D Bi</name>
</author>
<author>
<name sortKey="Xie, Y" uniqKey="Xie Y">Y Xie</name>
</author>
<author>
<name sortKey="Tai, C" uniqKey="Tai C">C Tai</name>
</author>
<author>
<name sortKey="Jiang, X" uniqKey="Jiang X">X Jiang</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Harrison, Em" uniqKey="Harrison E">EM Harrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Palomino, Jc" uniqKey="Palomino J">JC Palomino</name>
</author>
<author>
<name sortKey="Martin, A" uniqKey="Martin A">A Martin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davis, Jj" uniqKey="Davis J">JJ Davis</name>
</author>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Brettin, T" uniqKey="Brettin T">T Brettin</name>
</author>
<author>
<name sortKey="Kenyon, Rw" uniqKey="Kenyon R">RW Kenyon</name>
</author>
<author>
<name sortKey="Mao, C" uniqKey="Mao C">C Mao</name>
</author>
<author>
<name sortKey="Olson, R" uniqKey="Olson R">R Olson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lees, J" uniqKey="Lees J">J Lees</name>
</author>
<author>
<name sortKey="Galardini, M" uniqKey="Galardini M">M Galardini</name>
</author>
<author>
<name sortKey="Bentley, Sd" uniqKey="Bentley S">SD Bentley</name>
</author>
<author>
<name sortKey="Weiser, Jn" uniqKey="Weiser J">JN Weiser</name>
</author>
<author>
<name sortKey="Corander, J" uniqKey="Corander J">J Corander</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Traore, H" uniqKey="Traore H">H Traore</name>
</author>
<author>
<name sortKey="Fissette, K" uniqKey="Fissette K">K Fissette</name>
</author>
<author>
<name sortKey="Bastian, I" uniqKey="Bastian I">I Bastian</name>
</author>
<author>
<name sortKey="Devleeschouwer, M" uniqKey="Devleeschouwer M">M Devleeschouwer</name>
</author>
<author>
<name sortKey="Portaels, F" uniqKey="Portaels F">F Portaels</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Illakkiam, D" uniqKey="Illakkiam D">D Illakkiam</name>
</author>
<author>
<name sortKey="Shankar, M" uniqKey="Shankar M">M Shankar</name>
</author>
<author>
<name sortKey="Ponraj, P" uniqKey="Ponraj P">P Ponraj</name>
</author>
<author>
<name sortKey="Rajendhran, J" uniqKey="Rajendhran J">J Rajendhran</name>
</author>
<author>
<name sortKey="Gunasekaran, P" uniqKey="Gunasekaran P">P Gunasekaran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ali Ahmad, A" uniqKey="Ali Ahmad A">A Ali-Ahmad</name>
</author>
<author>
<name sortKey="Fadel, F" uniqKey="Fadel F">F Fadel</name>
</author>
<author>
<name sortKey="Sebban Kreuzer, C" uniqKey="Sebban Kreuzer C">C Sebban-Kreuzer</name>
</author>
<author>
<name sortKey="Ba, M" uniqKey="Ba M">M Ba</name>
</author>
<author>
<name sortKey="Pelissier, Gd" uniqKey="Pelissier G">GD Pélissier</name>
</author>
<author>
<name sortKey="Bornet, O" uniqKey="Bornet O">O Bornet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marschall, T" uniqKey="Marschall T">T Marschall</name>
</author>
<author>
<name sortKey="Marz, M" uniqKey="Marz M">M Marz</name>
</author>
<author>
<name sortKey="Abeel, T" uniqKey="Abeel T">T Abeel</name>
</author>
<author>
<name sortKey="Dijkstra, L" uniqKey="Dijkstra L">L Dijkstra</name>
</author>
<author>
<name sortKey="Dutilh, Be" uniqKey="Dutilh B">BE Dutilh</name>
</author>
<author>
<name sortKey="Ghaffaari, A" uniqKey="Ghaffaari A">A Ghaffaari</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paten, B" uniqKey="Paten B">B Paten</name>
</author>
<author>
<name sortKey="Novak, Am" uniqKey="Novak A">AM Novak</name>
</author>
<author>
<name sortKey="Eizenga, Jm" uniqKey="Eizenga J">JM Eizenga</name>
</author>
<author>
<name sortKey="Garrison, E" uniqKey="Garrison E">E Garrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baaijens, Ja" uniqKey="Baaijens J">JA Baaijens</name>
</author>
<author>
<name sortKey="El Aabidine, Az" uniqKey="El Aabidine A">AZ El Aabidine</name>
</author>
<author>
<name sortKey="Rivals, E" uniqKey="Rivals E">E Rivals</name>
</author>
<author>
<name sortKey="Schonhuth, A" uniqKey="Schonhuth A">A Schönhuth</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dunne, Wm" uniqKey="Dunne W">WM Dunne</name>
</author>
<author>
<name sortKey="Jaillard, M" uniqKey="Jaillard M">M Jaillard</name>
</author>
<author>
<name sortKey="Rochas, O" uniqKey="Rochas O">O Rochas</name>
</author>
<author>
<name sortKey="Van Belkum, A" uniqKey="Van Belkum A">A Van Belkum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kos, Vn" uniqKey="Kos V">VN Kos</name>
</author>
<author>
<name sortKey="Deraspe, M" uniqKey="Deraspe M">M Déraspe</name>
</author>
<author>
<name sortKey="Mclaughlin, Re" uniqKey="Mclaughlin R">RE McLaughlin</name>
</author>
<author>
<name sortKey="Whiteaker, Jd" uniqKey="Whiteaker J">JD Whiteaker</name>
</author>
<author>
<name sortKey="Roy, Ph" uniqKey="Roy P">PH Roy</name>
</author>
<author>
<name sortKey="Alm, Ra" uniqKey="Alm R">RA Alm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bradley, P" uniqKey="Bradley P">P Bradley</name>
</author>
<author>
<name sortKey="Gordon, Nc" uniqKey="Gordon N">NC Gordon</name>
</author>
<author>
<name sortKey="Walker, Tm" uniqKey="Walker T">TM Walker</name>
</author>
<author>
<name sortKey="Dunn, L" uniqKey="Dunn L">L Dunn</name>
</author>
<author>
<name sortKey="Heys, S" uniqKey="Heys S">S Heys</name>
</author>
<author>
<name sortKey="Huang, B" uniqKey="Huang B">B Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moradigaravand, D" uniqKey="Moradigaravand D">D Moradigaravand</name>
</author>
<author>
<name sortKey="Palm, M" uniqKey="Palm M">M Palm</name>
</author>
<author>
<name sortKey="Farewell, A" uniqKey="Farewell A">A Farewell</name>
</author>
<author>
<name sortKey="Mustonen, V" uniqKey="Mustonen V">V Mustonen</name>
</author>
<author>
<name sortKey="Warringer, J" uniqKey="Warringer J">J Warringer</name>
</author>
<author>
<name sortKey="Parts, L" uniqKey="Parts L">L Parts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Butler, J" uniqKey="Butler J">J Butler</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author>
<name sortKey="Kleber, M" uniqKey="Kleber M">M Kleber</name>
</author>
<author>
<name sortKey="Shlyakhter, Ia" uniqKey="Shlyakhter I">IA Shlyakhter</name>
</author>
<author>
<name sortKey="Belmonte, Mk" uniqKey="Belmonte M">MK Belmonte</name>
</author>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, D" uniqKey="Zerbino D">D Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
<author>
<name sortKey="Limasset, A" uniqKey="Limasset A">A Limasset</name>
</author>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drezen, E" uniqKey="Drezen E">E Drezen</name>
</author>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
<author>
<name sortKey="Deltel, C" uniqKey="Deltel C">C Deltel</name>
</author>
<author>
<name sortKey="Lemaitre, C" uniqKey="Lemaitre C">C Lemaitre</name>
</author>
<author>
<name sortKey="Peterlongo, P" uniqKey="Peterlongo P">P Peterlongo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Limasset, A" uniqKey="Limasset A">A Limasset</name>
</author>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
<author>
<name sortKey="Peterlongo, P" uniqKey="Peterlongo P">P Peterlongo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Balding, Dj" uniqKey="Balding D">DJ Balding</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, X" uniqKey="Zhou X">X Zhou</name>
</author>
<author>
<name sortKey="Stephens, M" uniqKey="Stephens M">M Stephens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Widmer, C" uniqKey="Widmer C">C Widmer</name>
</author>
<author>
<name sortKey="Lippert, C" uniqKey="Lippert C">C Lippert</name>
</author>
<author>
<name sortKey="Weissbrod, O" uniqKey="Weissbrod O">O Weissbrod</name>
</author>
<author>
<name sortKey="Fusi, N" uniqKey="Fusi N">N Fusi</name>
</author>
<author>
<name sortKey="Kadie, C" uniqKey="Kadie C">C Kadie</name>
</author>
<author>
<name sortKey="Davidson, R" uniqKey="Davidson R">R Davidson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Falush, D" uniqKey="Falush D">D Falush</name>
</author>
<author>
<name sortKey="Bowden, R" uniqKey="Bowden R">R Bowden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collins, C" uniqKey="Collins C">C Collins</name>
</author>
<author>
<name sortKey="Didelot, X" uniqKey="Didelot X">X Didelot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, X" uniqKey="Zhou X">X Zhou</name>
</author>
<author>
<name sortKey="Stephens, M" uniqKey="Stephens M">M Stephens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benjamini, Y" uniqKey="Benjamini Y">Y Benjamini</name>
</author>
<author>
<name sortKey="Hochberg, Y" uniqKey="Hochberg Y">Y Hochberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camacho, C" uniqKey="Camacho C">C Camacho</name>
</author>
<author>
<name sortKey="Coulouris, G" uniqKey="Coulouris G">G Coulouris</name>
</author>
<author>
<name sortKey="Avagyan, V" uniqKey="Avagyan V">V Avagyan</name>
</author>
<author>
<name sortKey="Ma, N" uniqKey="Ma N">N Ma</name>
</author>
<author>
<name sortKey="Papadopoulos, J" uniqKey="Papadopoulos J">J Papadopoulos</name>
</author>
<author>
<name sortKey="Bealer, K" uniqKey="Bealer K">K Bealer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zankari, E" uniqKey="Zankari E">E Zankari</name>
</author>
<author>
<name sortKey="Hasman, H" uniqKey="Hasman H">H Hasman</name>
</author>
<author>
<name sortKey="Cosentino, S" uniqKey="Cosentino S">S Cosentino</name>
</author>
<author>
<name sortKey="Vestergaard, M" uniqKey="Vestergaard M">M Vestergaard</name>
</author>
<author>
<name sortKey="Rasmussen, S" uniqKey="Rasmussen S">S Rasmussen</name>
</author>
<author>
<name sortKey="Lund, O" uniqKey="Lund O">O Lund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lakin, Sm" uniqKey="Lakin S">SM Lakin</name>
</author>
<author>
<name sortKey="Dean, C" uniqKey="Dean C">C Dean</name>
</author>
<author>
<name sortKey="Noyes, Nr" uniqKey="Noyes N">NR Noyes</name>
</author>
<author>
<name sortKey="Dettenwanger, A" uniqKey="Dettenwanger A">A Dettenwanger</name>
</author>
<author>
<name sortKey="Ross, As" uniqKey="Ross A">AS Ross</name>
</author>
<author>
<name sortKey="Doster, E" uniqKey="Doster E">E Doster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gupta, Sk" uniqKey="Gupta S">SK Gupta</name>
</author>
<author>
<name sortKey="Padmanabhan, Br" uniqKey="Padmanabhan B">BR Padmanabhan</name>
</author>
<author>
<name sortKey="Diene, Sm" uniqKey="Diene S">SM Diene</name>
</author>
<author>
<name sortKey="Lopez Rojas, R" uniqKey="Lopez Rojas R">R Lopez-Rojas</name>
</author>
<author>
<name sortKey="Kempf, M" uniqKey="Kempf M">M Kempf</name>
</author>
<author>
<name sortKey="Landraud, L" uniqKey="Landraud L">L Landraud</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Franz, M" uniqKey="Franz M">M Franz</name>
</author>
<author>
<name sortKey="Lopes, Ct" uniqKey="Lopes C">CT Lopes</name>
</author>
<author>
<name sortKey="Huck, G" uniqKey="Huck G">G Huck</name>
</author>
<author>
<name sortKey="Dong, Y" uniqKey="Dong Y">Y Dong</name>
</author>
<author>
<name sortKey="Sumer, O" uniqKey="Sumer O">O Sumer</name>
</author>
<author>
<name sortKey="Bader, Gd" uniqKey="Bader G">GD Bader</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Belkum, A" uniqKey="Van Belkum A">A van Belkum</name>
</author>
<author>
<name sortKey="Soriaga, Lb" uniqKey="Soriaga L">LB Soriaga</name>
</author>
<author>
<name sortKey="Lafave, Mc" uniqKey="Lafave M">MC LaFave</name>
</author>
<author>
<name sortKey="Akella, S" uniqKey="Akella S">S Akella</name>
</author>
<author>
<name sortKey="Veyrieras, Jb" uniqKey="Veyrieras J">JB Veyrieras</name>
</author>
<author>
<name sortKey="Barbu, Em" uniqKey="Barbu E">EM Barbu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gygli, Sm" uniqKey="Gygli S">SM Gygli</name>
</author>
<author>
<name sortKey="Borrell, S" uniqKey="Borrell S">S Borrell</name>
</author>
<author>
<name sortKey="Trauner, A" uniqKey="Trauner A">A Trauner</name>
</author>
<author>
<name sortKey="Gagneux, S" uniqKey="Gagneux S">S Gagneux</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wattam, Ar" uniqKey="Wattam A">AR Wattam</name>
</author>
<author>
<name sortKey="Davis, Jj" uniqKey="Davis J">JJ Davis</name>
</author>
<author>
<name sortKey="Assaf, R" uniqKey="Assaf R">R Assaf</name>
</author>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Brettin, T" uniqKey="Brettin T">T Brettin</name>
</author>
<author>
<name sortKey="Bun, C" uniqKey="Bun C">C Bun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mlynarczyk, A" uniqKey="Mlynarczyk A">A Mlynarczyk</name>
</author>
<author>
<name sortKey="Mlynarczyk, G" uniqKey="Mlynarczyk G">G Mlynarczyk</name>
</author>
<author>
<name sortKey="Jeljaszewicz, J" uniqKey="Jeljaszewicz J">J Jeljaszewicz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Yy" uniqKey="Liu Y">YY Liu</name>
</author>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
<author>
<name sortKey="Walsh, Tr" uniqKey="Walsh T">TR Walsh</name>
</author>
<author>
<name sortKey="Yi, Lx" uniqKey="Yi L">LX Yi</name>
</author>
<author>
<name sortKey="Zhang, R" uniqKey="Zhang R">R Zhang</name>
</author>
<author>
<name sortKey="Spencer, J" uniqKey="Spencer J">J Spencer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kung, Vl" uniqKey="Kung V">VL Kung</name>
</author>
<author>
<name sortKey="Ozer, Ea" uniqKey="Ozer E">EA Ozer</name>
</author>
<author>
<name sortKey="Hauser, Ar" uniqKey="Hauser A">AR Hauser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pirnay, Jp" uniqKey="Pirnay J">JP Pirnay</name>
</author>
<author>
<name sortKey="Bilocq, F" uniqKey="Bilocq F">F Bilocq</name>
</author>
<author>
<name sortKey="Pot, B" uniqKey="Pot B">B Pot</name>
</author>
<author>
<name sortKey="Cornelis, P" uniqKey="Cornelis P">P Cornelis</name>
</author>
<author>
<name sortKey="Zizi, M" uniqKey="Zizi M">M Zizi</name>
</author>
<author>
<name sortKey="Van Eldere, J" uniqKey="Van Eldere J">J Van Eldere</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Coll, F" uniqKey="Coll F">F Coll</name>
</author>
<author>
<name sortKey="Mcnerney, R" uniqKey="Mcnerney R">R McNerney</name>
</author>
<author>
<name sortKey="Preston, Md" uniqKey="Preston M">MD Preston</name>
</author>
<author>
<name sortKey="Guerra Assuncao, Ja" uniqKey="Guerra Assuncao J">JA Guerra-Assunção</name>
</author>
<author>
<name sortKey="Warry, A" uniqKey="Warry A">A Warry</name>
</author>
<author>
<name sortKey="Hill Cawthorne, G" uniqKey="Hill Cawthorne G">G Hill-Cawthorne</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ondov, Bd" uniqKey="Ondov B">BD Ondov</name>
</author>
<author>
<name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
<author>
<name sortKey="Melsted, P" uniqKey="Melsted P">P Melsted</name>
</author>
<author>
<name sortKey="Mallonee, Ab" uniqKey="Mallonee A">AB Mallonee</name>
</author>
<author>
<name sortKey="Bergman, Nh" uniqKey="Bergman N">NH Bergman</name>
</author>
<author>
<name sortKey="Koren, S" uniqKey="Koren S">S Koren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author>
<name sortKey="Vandervalk, Bp" uniqKey="Vandervalk B">BP Vandervalk</name>
</author>
<author>
<name sortKey="Mohamadi, H" uniqKey="Mohamadi H">H Mohamadi</name>
</author>
<author>
<name sortKey="Chu, J" uniqKey="Chu J">J Chu</name>
</author>
<author>
<name sortKey="Yeo, S" uniqKey="Yeo S">S Yeo</name>
</author>
<author>
<name sortKey="Hammond, Sa" uniqKey="Hammond S">SA Hammond</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Genet</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS Genet</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosgen</journal-id>
<journal-title-group>
<journal-title>PLoS Genetics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-7390</issn>
<issn pub-type="epub">1553-7404</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30419019</article-id>
<article-id pub-id-type="pmc">6258240</article-id>
<article-id pub-id-type="publisher-id">PGENETICS-D-18-01145</article-id>
<article-id pub-id-type="doi">10.1371/journal.pgen.1007758</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Microbiology</subject>
<subj-group>
<subject>Microbial Control</subject>
<subj-group>
<subject>Antimicrobial Resistance</subject>
<subj-group>
<subject>Antibiotic Resistance</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Medicine and Health Sciences</subject>
<subj-group>
<subject>Pharmacology</subject>
<subj-group>
<subject>Antimicrobial Resistance</subject>
<subj-group>
<subject>Antibiotic Resistance</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Microbiology</subject>
<subj-group>
<subject>Medical Microbiology</subject>
<subj-group>
<subject>Microbial Pathogens</subject>
<subj-group>
<subject>Bacterial Pathogens</subject>
<subj-group>
<subject>Pseudomonas Aeruginosa</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Medicine and Health Sciences</subject>
<subj-group>
<subject>Pathology and Laboratory Medicine</subject>
<subj-group>
<subject>Pathogens</subject>
<subj-group>
<subject>Microbial Pathogens</subject>
<subj-group>
<subject>Bacterial Pathogens</subject>
<subj-group>
<subject>Pseudomonas Aeruginosa</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Organisms</subject>
<subj-group>
<subject>Bacteria</subject>
<subj-group>
<subject>Pseudomonas</subject>
<subj-group>
<subject>Pseudomonas Aeruginosa</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Organisms</subject>
<subj-group>
<subject>Bacteria</subject>
<subj-group>
<subject>Actinobacteria</subject>
<subj-group>
<subject>Mycobacterium Tuberculosis</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Genome Analysis</subject>
<subj-group>
<subject>Genome-Wide Association Studies</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Genome Analysis</subject>
<subj-group>
<subject>Genome-Wide Association Studies</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Human Genetics</subject>
<subj-group>
<subject>Genome-Wide Association Studies</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Microbiology</subject>
<subj-group>
<subject>Bacteriology</subject>
<subj-group>
<subject>Bacterial Genetics</subject>
<subj-group>
<subject>Bacterial Genomics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Microbial Genetics</subject>
<subj-group>
<subject>Bacterial Genetics</subject>
<subj-group>
<subject>Bacterial Genomics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Microbial Genomics</subject>
<subj-group>
<subject>Bacterial Genomics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Microbiology</subject>
<subj-group>
<subject>Microbial Genomics</subject>
<subj-group>
<subject>Bacterial Genomics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Medicine and Health Sciences</subject>
<subj-group>
<subject>Pharmacology</subject>
<subj-group>
<subject>Drugs</subject>
<subj-group>
<subject>Antimicrobials</subject>
<subj-group>
<subject>Antibiotics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Microbiology</subject>
<subj-group>
<subject>Microbial Control</subject>
<subj-group>
<subject>Antimicrobials</subject>
<subj-group>
<subject>Antibiotics</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Computer and Information Sciences</subject>
<subj-group>
<subject>Data Visualization</subject>
<subj-group>
<subject>Infographics</subject>
<subj-group>
<subject>Graphs</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Genome Analysis</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Genetics</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Genome Analysis</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events</article-title>
<alt-title alt-title-type="running-head">Fast agnostic bacterial GWAS with De Bruijn graphs</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0001-7010-1921</contrib-id>
<name>
<surname>Jaillard</surname>
<given-names>Magali</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Validation</role>
<role content-type="http://credit.casrai.org/">Visualization</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0001-8976-2762</contrib-id>
<name>
<surname>Lima</surname>
<given-names>Leandro</given-names>
</name>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Validation</role>
<role content-type="http://credit.casrai.org/">Visualization</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0003-2459-9831</contrib-id>
<name>
<surname>Tournoud</surname>
<given-names>Maud</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Supervision</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mahé</surname>
<given-names>Pierre</given-names>
</name>
<role content-type="http://credit.casrai.org/">Data curation</role>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Project administration</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>van Belkum</surname>
<given-names>Alex</given-names>
</name>
<role content-type="http://credit.casrai.org/">Validation</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lacroix</surname>
<given-names>Vincent</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-7826-2719</contrib-id>
<name>
<surname>Jacob</surname>
<given-names>Laurent</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Formal analysis</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Project administration</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Supervision</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<role content-type="http://credit.casrai.org/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>bioMérieux, Marcy l’Étoile, France</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>EPI ERABLE - Inria Grenoble, Rhône-Alpes, France</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Didelot</surname>
<given-names>Xavier</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Imperial College London, UNITED KINGDOM</addr-line>
</aff>
<author-notes>
<fn fn-type="COI-statement" id="coi001">
<p>I have read the journal’s policy and the authors of this manuscript have the following competing interests: MJ, MT, PM and AvB are employees of bioMérieux, a company that develops and sells diagnostic tests in the field of infectious diseases. However, the study was designed and executed in an open manner and the presented method as well as all data generated have been deposited in the public domain, also resulting in the current publication.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>magali.dancette@biomerieux.com</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<month>11</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="epub">
<day>12</day>
<month>11</month>
<year>2018</year>
</pub-date>
<volume>14</volume>
<issue>11</issue>
<elocation-id>e1007758</elocation-id>
<history>
<date date-type="received">
<day>4</day>
<month>6</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>10</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>© 2018 Jaillard et al</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Jaillard et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="pgen.1007758.pdf"></self-uri>
<abstract>
<p>Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results that are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took an hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in
<italic>Mycobacterium tuberculosis</italic>
, and genes acquired by horizontal transfer in
<italic>Staphylococcus aureus</italic>
and
<italic>Pseudomonas aeruginosa</italic>
—along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas">https://gitlab.com/leoisl/dbgwas</ext-link>
.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author summary</title>
<p>Genome-wide association studies (GWAS) help explore the genetic bases of phenotype variation in a population. Our objective is to make GWAS amenable to bacterial genomes. These genomes can be too different to be aligned against a reference, even within a single species, making the description of their genetic variation challenging. We test the association between the phenotype and the presence in the genomes of DNA subsequences of length
<italic>k</italic>
– the so-called k-mers. These k-mers provide a versatile descriptor, capturing genetic variants ranging from local polymorphisms to insertions of large mobile genetic elements. Unfortunately, they are also redundant and difficult to interpret. To address this, we rely on the compacted De Bruijn graph (cDBG), which represents the overlaps between k-mers. A single cDBG is built across all genomes, automatically removing the redundancy among consecutive k-mers and allowing the genomic context of the significant ones to be visualised. We provide a computationally efficient and user-friendly implementation, enabling non-bioinformaticians to carry out GWAS on thousands of isolates in a few hours. This approach was effective at capturing the dynamics of mobile genetic elements in
<italic>Staphylococcus aureus</italic>
and
<italic>Pseudomonas aeruginosa</italic>
genomes, and retrieved known local polymorphisms in
<italic>Mycobacterium tuberculosis</italic>
genomes.</p>
</abstract>
<funding-group>
<award-group id="award001">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001665</institution-id>
<institution>Agence Nationale de la Recherche</institution>
</institution-wrap>
</funding-source>
<award-id>ANR-12-BS02-0008</award-id>
<principal-award-recipient>
<name>
<surname>Lacroix</surname>
<given-names>Vincent</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award002">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001665</institution-id>
<institution>Agence Nationale de la Recherche</institution>
</institution-wrap>
</funding-source>
<award-id>ANR-16-CE23-0001</award-id>
<principal-award-recipient>
<name>
<surname>Lacroix</surname>
<given-names>Vincent</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award003">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001665</institution-id>
<institution>Agence Nationale de la Recherche</institution>
</institution-wrap>
</funding-source>
<award-id>ANR-14-CE23-0003-01</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-7826-2719</contrib-id>
<name>
<surname>Jacob</surname>
<given-names>Laurent</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award004">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001665</institution-id>
<institution>Agence Nationale de la Recherche</institution>
</institution-wrap>
</funding-source>
<award-id>ANR-17-CE23-0011-01</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-7826-2719</contrib-id>
<name>
<surname>Jacob</surname>
<given-names>Laurent</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award005">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100003593</institution-id>
<institution>Conselho Nacional de Desenvolvimento Científico e Tecnológico</institution>
</institution-wrap>
</funding-source>
<award-id>203362/2014-4</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0001-8976-2762</contrib-id>
<name>
<surname>Lima</surname>
<given-names>Leandro</given-names>
</name>
</principal-award-recipient>
</award-group>
<funding-statement>MJ, MT, PM and AvB are employees of bioMérieux. LL is funded by the Conselho Nacional de Desenvolvimento Científico e Tecnológico – CNPq, Brazil, under the Science Without Borders scholarship grant process number 203362/2014-4. VL is funded by the Agence Nationale de la Recherche ANR-12-BS02-0008 (Colib’read) and ANR-16-CE23-0001 (ASTER). LJ is funded by the Agence Nationale de la Recherche ANR-14-CE23-0003-01 (MACARON) and ANR-17-CE23-0011-01 (FAST-BIG). This work was performed using the computing facilities of the CC LBBE/PRABI. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="6"></fig-count>
<table-count count="4"></table-count>
<page-count count="28"></page-count>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>PLOS Publication Stage</meta-name>
<meta-value>vor-update-to-uncorrected-proof</meta-value>
</custom-meta>
<custom-meta>
<meta-name>Publication Update</meta-name>
<meta-value>2018-11-26</meta-value>
</custom-meta>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All GWAS results generated by our method and discussed in the manuscript are available online (
<ext-link ext-link-type="uri" xlink:href="http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/">http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/</ext-link>
). The proposed method is available on GitLab:
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas/">https://gitlab.com/leoisl/dbgwas/</ext-link>
.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All GWAS results generated by our method and discussed in the manuscript are available online (
<ext-link ext-link-type="uri" xlink:href="http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/">http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/</ext-link>
). The proposed method is available on GitLab:
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas/">https://gitlab.com/leoisl/dbgwas/</ext-link>
.</p>
</notes>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>The aim of Genome-Wide Association Studies (GWAS) is to identify associations between genetic variants and a phenotype observed in a population. They have recently emerged as an important tool in the study of bacteria, given the availability of large panels of bacterial genomes combined with phenotypic data [
<xref rid="pgen.1007758.ref001" ref-type="bibr">1</xref>
–
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
].</p>
<p>GWAS rely on a representation of the genomic variation as numerical factors. The most common approaches are based on single nucleotide polymorphisms (SNPs), defined by aligning all genomes of the studied panel against a reference genome [
<xref rid="pgen.1007758.ref001" ref-type="bibr">1</xref>
,
<xref rid="pgen.1007758.ref003" ref-type="bibr">3</xref>
,
<xref rid="pgen.1007758.ref004" ref-type="bibr">4</xref>
] or against a pangenome built from all the genes identified by annotating the genomes [
<xref rid="pgen.1007758.ref008" ref-type="bibr">8</xref>
], and on gene presence/absence, using a pre-defined collection of genes [
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
,
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
]. The use of a reference genome becomes unsuitable when working on bacterial species with a large accessory genome—the part of the genome which is not present in all strains. On the other hand, methods focusing on genes are unable to cover variants in noncoding regions, including those related to transcriptional and translational regulation [
<xref rid="pgen.1007758.ref009" ref-type="bibr">9</xref>
,
<xref rid="pgen.1007758.ref010" ref-type="bibr">10</xref>
]. Moreover, some poorly studied species still lack a representative annotation [
<xref rid="pgen.1007758.ref011" ref-type="bibr">11</xref>
].</p>
<p>To circumvent these issues and make bacterial genomes amenable to GWAS, recent studies have relied on k-mers: all nucleotide substrings of length
<italic>k</italic>
found in the genomes [
<xref rid="pgen.1007758.ref002" ref-type="bibr">2</xref>
,
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
,
<xref rid="pgen.1007758.ref006" ref-type="bibr">6</xref>
]. The presence of k-mers in genomes can account for diverse genetic events such as the acquisition of SNPs, (long) insertions/deletions and recombinations. Unlike SNP- or gene-based approaches, k-mer analyses do not require a reference genome or any assumption on the nature of the causal variants and can even be performed without assembling the genome sequences [
<xref rid="pgen.1007758.ref012" ref-type="bibr">12</xref>
].</p>
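<p>As an illustration of how k-mer presence can capture a point mutation, the following minimal Python sketch lists the k-mers of two toy sequences and records which sequence contains each of them. The sequences, the function names and the choice of <italic>k</italic> = 4 are assumptions made for this example only; reverse complements and unassembled reads are ignored.</p>
<preformat><![CDATA[
```python
def kmers(sequence, k):
    """Return the set of all substrings of length k found in sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_absence(genomes, k):
    """Map each k-mer to the sorted list of genome names that contain it."""
    table = {}
    for name, sequence in genomes.items():
        for kmer in kmers(sequence, k):
            table.setdefault(kmer, []).append(name)
    return {kmer: sorted(names) for kmer, names in table.items()}


# Hypothetical toy panel: two sequences differing by one nucleotide (C vs A).
genomes = {"s1": "ACGTCGA", "s2": "ACGTAGA"}
table = presence_absence(genomes, k=4)

# k-mers seen in both sequences versus k-mers specific to one of them.
shared = sorted(kmer for kmer, names in table.items() if len(names) == 2)
specific = sorted(kmer for kmer, names in table.items() if len(names) == 1)
print("shared:", shared)
print("specific:", specific)
```
]]></preformat>
<p>In this toy case the single substitution makes every k-mer overlapping the variable position specific to one sequence, which is the signal a k-mer-based association test would pick up.</p>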
<p>While k-mers can reflect any genomic variation in a panel, they do not themselves represent biological entities. Translating the result of a k-mer-based GWAS into meaningful genetic variants typically requires mapping a large and redundant set of short sequences [
<xref rid="pgen.1007758.ref002" ref-type="bibr">2</xref>
,
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
,
<xref rid="pgen.1007758.ref006" ref-type="bibr">6</xref>
,
<xref rid="pgen.1007758.ref013" ref-type="bibr">13</xref>
]. Recent studies have suggested reassembling the significantly associated k-mers to reduce redundancy and retrieve longer marker sequences [
<xref rid="pgen.1007758.ref006" ref-type="bibr">6</xref>
,
<xref rid="pgen.1007758.ref013" ref-type="bibr">13</xref>
]. Nonetheless, k-mer representation often loses in interpretability what it gains in flexibility, and the best way to encode the genomic variation in bacterial GWAS is not yet clearly defined [
<xref rid="pgen.1007758.ref014" ref-type="bibr">14</xref>
,
<xref rid="pgen.1007758.ref015" ref-type="bibr">15</xref>
].</p>
<p>Our approach, coined DBGWAS, for
<italic>De Bruijn Graph GWAS</italic>
, bridges the gap between, on the one hand, SNP- and gene-based representations lacking the right level of flexibility to cover complete genomic variation, and, on the other hand, k-mer-based representations which are flexible but not readily interpretable. We rely on De Bruijn graphs [
<xref rid="pgen.1007758.ref016" ref-type="bibr">16</xref>
] (DBGs), which are widely used for
<italic>de novo</italic>
genome assembly [
<xref rid="pgen.1007758.ref017" ref-type="bibr">17</xref>
,
<xref rid="pgen.1007758.ref018" ref-type="bibr">18</xref>
] and variant calling [
<xref rid="pgen.1007758.ref012" ref-type="bibr">12</xref>
,
<xref rid="pgen.1007758.ref019" ref-type="bibr">19</xref>
]. These graphs connect overlapping k-mers (here DNA fragments), yielding a compact summary of all variations across a set of genomes.
<xref ref-type="fig" rid="pgen.1007758.g001">Fig 1</xref>
illustrates the construction of such a graph for a simple example, where the only variation among the aligned genomes is a point mutation. DBGs also accommodate more complex disparities including rearrangements and insertions/deletions (
<xref ref-type="supplementary-material" rid="pgen.1007758.s001">S1 Fig</xref>
).</p>
<fig id="pgen.1007758.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Compacted DBG construction over a set of sequences differing by a single point mutation.</title>
<p>In this example two sequences
<italic>s</italic>
<sub>1</sub>
and
<italic>s</italic>
<sub>2</sub>
of length 12 differ by a single letter. (A) All k-mers (
<italic>k</italic>
= 4) present in these sequences are listed. A link is drawn between two k-mers when the
<italic>k</italic>
− 1 = 3 last nucleotides of the first k-mer equal the 3 first nucleotides of the second k-mer. (B) The bubble pattern represents the SNP C to A; each branch of the bubble represents an allele. (C) Linear paths of the graph are compacted; the compacted DBG of the example only contains four nodes (unitigs) and represents the same variation as the original DBG, which contained 13 nodes (k-mers).</p>
</caption>
<graphic xlink:href="pgen.1007758.g001"></graphic>
</fig>
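<p>The construction summarised in Fig 1 can be sketched in a few lines of Python. The two sequences below are hypothetical stand-ins chosen for this example (length 12, one C to A substitution), not the sequences drawn in the figure, and the function names are ours. The sketch ignores reverse-complement strands and isolated cycles, both of which an implementation working on real genomes would have to handle.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def build_dbg(sequences, k):
    """Return (nodes, successors, predecessors) of the De Bruijn graph."""
    nodes = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            nodes.add(seq[i:i + k])
    # Index k-mers by their first k-1 letters to find overlaps quickly.
    by_prefix = defaultdict(list)
    for node in nodes:
        by_prefix[node[:-1]].append(node)
    # An edge a -> b exists when the last k-1 letters of a start b.
    successors = {node: sorted(by_prefix.get(node[1:], [])) for node in nodes}
    predecessors = {node: [] for node in nodes}
    for node, nexts in successors.items():
        for nxt in nexts:
            predecessors[nxt].append(node)
    return nodes, successors, predecessors


def compact(nodes, successors, predecessors):
    """Merge each maximal linear path of k-mers into one unitig string."""

    def extends_previous(node):
        # True when node merely continues a linear path from its sole predecessor.
        preds = predecessors[node]
        return len(preds) == 1 and len(successors[preds[0]]) == 1

    unitigs = []
    for start in sorted(nodes):
        if extends_previous(start):
            continue
        unitig, current = start, start
        # Extend while the path neither branches out nor merges with another path.
        while len(successors[current]) == 1:
            nxt = successors[current][0]
            if len(predecessors[nxt]) != 1:
                break
            unitig += nxt[-1]
            current = nxt
        unitigs.append(unitig)
    return sorted(unitigs)


# Hypothetical sequences of length 12 differing at one position (C vs A).
s1 = "ACGTTGCATAGG"
s2 = "ACGTTGAATAGG"
nodes, successors, predecessors = build_dbg([s1, s2], k=4)
unitigs = compact(nodes, successors, predecessors)
print(len(nodes), "k-mers compacted into", len(unitigs), "unitigs:", unitigs)
```
]]></preformat>
<p>On this toy input the graph has 13 k-mers and compacts into four unitigs: two flanking sequences shared by both inputs and one branch per allele, forming the bubble pattern described in the caption.</p>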
<p>DBGWAS relies on the ability of compacted DBGs (cDBGs) to eliminate local redundancy, reflect genomic variations, and characterise the genomic environment of a k-mer at the population level. More precisely, we build a single cDBG from all the genomes included in the association study (in practice, up to thousands). The graph nodes, called unitigs, represent by construction sequences of variable length; their resolution adapts to the genomic variation present in the set of genomes considered. The unitigs are individually tested for association with the phenotype, while controlling for population structure. The unitigs found to be phenotype-associated are then localised in the cDBG, and the subgraphs induced by their genomic environment are extracted. These subgraphs often provide a direct interpretation in terms of genetic events, which results from the integration of three types of information: 1) the
<italic>topology</italic>
of the subgraph, reflecting the nature of the genetic variant, 2) the
<italic>metadata</italic>
represented by node size and colour, allowing us to identify which unitigs in the subgraph are associated with a particular phenotype status, and 3) an optional sequence
<italic>annotation</italic>
helping to detect unitigs mapping to, or near, a known gene.</p>
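<p>One plausible way to extract the genomic environment of significant unitigs is a breadth-first search around them in the cDBG. The sketch below is only an illustration of that idea: the <italic>radius</italic> parameter, the toy adjacency list and the unitig labels are assumptions of this example and are not taken from the DBGWAS implementation.</p>
<preformat><![CDATA[
```python
from collections import deque


def neighbourhood(adjacency, seeds, radius):
    """Return all nodes within `radius` edges of any seed node (BFS)."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return set(distance)


# Hypothetical undirected unitig graph: a bubble (u1..u4) followed by a tail.
adjacency = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u4"],
    "u3": ["u1", "u4"],
    "u4": ["u2", "u3", "u5"],
    "u5": ["u4", "u6"],
    "u6": ["u5"],
}

# Suppose u2 is the only significantly associated unitig (an assumption).
subgraph = neighbourhood(adjacency, ["u2"], radius=2)
print(sorted(subgraph))
```
]]></preformat>
<p>With a radius of 2 the extracted subgraph contains the whole bubble around the seed, so both alleles of the variant are visible, while the distant node u6 is left out.</p>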
<p>We benchmarked our novel method using several antibiotic resistance phenotypes within three bacterial species with varying degrees of genome plasticity:
<italic>Mycobacterium tuberculosis</italic>
,
<italic>Staphylococcus aureus</italic>
and
<italic>Pseudomonas aeruginosa</italic>
. The subgraphs built from significant unitigs described SNPs or insertions/deletions in both core and accessory regions, and were consistent with results obtained with a resistome-based association study. Novel genotype-to-phenotype associations were also suggested.</p>
</sec>
<sec sec-type="results" id="sec002">
<title>Results</title>
<p>We developed DBGWAS, available at
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas">https://gitlab.com/leoisl/dbgwas</ext-link>
, and validated it on panels for several bacterial species for which genome sequences and antibiotic resistance phenotypes were available. DBGWAS comprises three main steps: it first builds a variant matrix, where each variant is a pattern of presence/absence of unitigs in each genome. Each variant is then tested for association with the phenotype using a linear mixed model, adjusting for the population structure. Finally, it uses the cDBG neighbourhood of significantly associated unitigs as a proxy for their genomic environment. DBGWAS outputs a set of such subgraphs ordered by min
<sub>
<italic>q</italic>
</sub>
, which is the smallest q-value observed over unitigs in each subgraph. The top subgraphs therefore represent the genomic environment of the unitigs most significantly associated with the tested phenotype.
<xref ref-type="fig" rid="pgen.1007758.g002">Fig 2</xref>
summarises the main steps of the process. A detailed description of the pipeline is presented in the
<xref ref-type="sec" rid="sec010">Methods</xref>
section.</p>
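<p>The ordering of subgraphs by their smallest q-value can be illustrated with a short Python sketch. The subgraph names, unitig labels and q-values below are invented for the example; only the ranking rule (smallest q-value over the unitigs of each subgraph, in ascending order) comes from the description above.</p>
<preformat><![CDATA[
```python
def rank_subgraphs(subgraphs):
    """Order subgraphs by their smallest unitig q-value (min_q), ascending."""
    scored = [(min(qvalues.values()), name) for name, qvalues in subgraphs.items()]
    return [(name, min_q) for min_q, name in sorted(scored)]


# Hypothetical q-values for the unitigs of three subgraphs.
subgraphs = {
    "comp_C": {"u9": 0.04, "u10": 1e-3},
    "comp_A": {"u1": 1e-5, "u2": 3e-20},
    "comp_B": {"u7": 2e-8},
}

ranking = rank_subgraphs(subgraphs)
for rank, (name, min_q) in enumerate(ranking, start=1):
    print(rank, name, min_q)
```
]]></preformat>
<p>A subgraph is therefore ranked by its single most significant unitig, regardless of how many other unitigs it contains.</p>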
<fig id="pgen.1007758.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.g002</object-id>
<label>Fig 2</label>
<caption>
<title>DBGWAS pipeline.</title>
<p>DBGWAS takes as input draft assemblies and phenotype data for a panel of bacterial strains. A variant matrix
<italic>X</italic>
is built in
<italic>step</italic>
1 using cDBG nodes (called unitigs). Variants are tested in
<italic>step</italic>
2 using a linear mixed model taking into account the population structure. Significant variants are post-processed in
<italic>step</italic>
3 to provide an interactive interface assisting their interpretation.</p>
</caption>
<graphic xlink:href="pgen.1007758.g002"></graphic>
</fig>
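<p>Step 1 can be pictured with the following minimal Python sketch, which groups unitigs sharing the same presence/absence pattern across genomes so that each distinct pattern is tested once. The genomes, unitigs and function name are hypothetical, and presence is detected by plain substring search, which is a simplification made for this example.</p>
<preformat><![CDATA[
```python
def variant_patterns(unitigs, genomes):
    """Group unitigs by their presence/absence pattern across genomes."""
    names = sorted(genomes)
    patterns = {}
    for unitig in unitigs:
        # 1 if the unitig occurs in the genome, 0 otherwise (forward strand only).
        pattern = tuple(int(unitig in genomes[name]) for name in names)
        patterns.setdefault(pattern, []).append(unitig)
    return names, patterns


# Hypothetical panel of three toy genomes and four unitigs.
genomes = {
    "s1": "ACGTTGCATAGG",
    "s2": "ACGTTGAATAGG",
    "s3": "ACGTTGCATAGG",
}
unitigs = ["ACGTTG", "ATAGG", "TTGAATA", "TTGCATA"]

names, patterns = variant_patterns(unitigs, genomes)
print(names)
for pattern, members in patterns.items():
    print(pattern, members)
```
]]></preformat>
<p>Here four unitigs collapse into three patterns; the pattern present in every genome carries no variation and could not be associated with any phenotype.</p>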
<p>Here we rely on a few experiments to illustrate how the subgraphs output by DBGWAS can be read as genetic events. We then benchmark DBGWAS against two other k-mer-based approaches and one resistome-based approach. DBGWAS recovers known variants, while suggesting novel candidates beyond the reach of the resistome-based approach. We also find it to be more computationally efficient and to provide more interpretable outputs than the other k-mer-based methods.</p>
<p>A summary of the discussed subgraphs is provided in
<xref rid="pgen.1007758.t001" ref-type="table">Table 1</xref>
, while a description of the top subgraphs obtained for all tested antibiotics is provided in
<xref ref-type="supplementary-material" rid="pgen.1007758.s012">S3</xref>
,
<xref ref-type="supplementary-material" rid="pgen.1007758.s013">S4</xref>
, and
<xref ref-type="supplementary-material" rid="pgen.1007758.s014">S5</xref>
Tables. The subgraphs themselves are available at
<ext-link ext-link-type="uri" xlink:href="http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/experiments/#DBGWAS_all_results">http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/experiments/#DBGWAS_all_results</ext-link>
.</p>
<table-wrap id="pgen.1007758.t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.t001</object-id>
<label>Table 1</label>
<caption>
<title>Resistance determinants identified by DBGWAS for
<italic>S. aureus</italic>
(SA),
<italic>M. tuberculosis</italic>
(TB) and
<italic>P. aeruginosa</italic>
(PA) panels.</title>
</caption>
<alternatives>
<graphic id="pgen.1007758.t001g" xlink:href="pgen.1007758.t001"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="1" colspan="1">Panel</th>
<th align="center" rowspan="1" colspan="1">Phenotype</th>
<th align="center" rowspan="1" colspan="1">Rank</th>
<th align="center" rowspan="1" colspan="1">Sign.
<break></break>
unitigs</th>
<th align="center" rowspan="1" colspan="1">
<italic>min</italic>
<sub>
<italic>q</italic>
</sub>
</th>
<th align="center" rowspan="1" colspan="1">Est.
<break></break>
effect</th>
<th align="right" rowspan="1" colspan="1">Annotation</th>
<th align="center" rowspan="1" colspan="1">Type</th>
<th align="center" rowspan="1" colspan="1">Knowledge
<break></break>
on markers</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="21" style="border-bottom:thick" colspan="1">SA</td>
<td align="center" rowspan="4" colspan="1">Methicillin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">71/565</td>
<td align="center" rowspan="1" colspan="1">7.68 × 10
<sup>−188</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.949</td>
<td align="right" rowspan="1" colspan="1">
<italic>mecA</italic>
+ 7000 bp of SCC
<italic>mec</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">99/735</td>
<td align="center" rowspan="1" colspan="1">3.39 × 10
<sup>−72</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.865</td>
<td align="right" rowspan="1" colspan="1">6000 bp of SCC
<italic>mec</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.96</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">11/190</td>
<td align="center" rowspan="1" colspan="1">2.14 × 10
<sup>−61</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.813</td>
<td align="right" rowspan="1" colspan="1">2000 bp of SCC
<italic>mec</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.94</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">13/117</td>
<td align="center" rowspan="1" colspan="1">2.29 × 10
<sup>−37</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.957</td>
<td align="right" rowspan="1" colspan="1">1500 bp of SCC
<italic>mec</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.93</td>
</tr>
<tr>
<td align="center" rowspan="2" colspan="1">Ciprofloxacin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">7/57</td>
<td align="center" rowspan="1" colspan="1">8.67 × 10
<sup>−104</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.893</td>
<td align="right" rowspan="1" colspan="1">
<italic>parC</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">7/31</td>
<td align="center" rowspan="1" colspan="1">2.21 × 10
<sup>−76</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.955</td>
<td align="right" rowspan="1" colspan="1">
<italic>gyrA</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Erythromycin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">110/510</td>
<td align="center" rowspan="1" colspan="1">2.69 × 10
<sup>−100</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.823</td>
<td align="right" rowspan="1" colspan="1">
<italic>ermC</italic>
+ circular plasmid</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="5" colspan="1">Fusidic acid</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">7/50</td>
<td align="center" rowspan="1" colspan="1">2.75 × 10
<sup>−136</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.910</td>
<td align="right" rowspan="1" colspan="1">
<italic>fusA</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">214/882</td>
<td align="center" rowspan="1" colspan="1">7.94 × 10
<sup>−49</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.924</td>
<td align="right" rowspan="1" colspan="1">
<italic>fusC</italic>
+ SCC
<italic>fusC</italic>
cassette</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">22/260</td>
<td align="center" rowspan="1" colspan="1">5.35 × 10
<sup>−43</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.924</td>
<td align="right" rowspan="1" colspan="1">1500 bp of SCC
<italic>fusC</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.98</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">1/72</td>
<td align="center" rowspan="1" colspan="1">5.35 × 10
<sup>−43</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.924</td>
<td align="right" rowspan="1" colspan="1">200 bp of SCC
<italic>fusC</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.98</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">5</td>
<td align="center" rowspan="1" colspan="1">5/64</td>
<td align="center" rowspan="1" colspan="1">2.02 × 10
<sup>−22</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.888</td>
<td align="right" rowspan="1" colspan="1">
<italic>purN</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 2 × 10
<sup>−3</sup>
</td>
</tr>
<tr>
<td align="center" rowspan="4" colspan="1">Trimethoprim</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">7/54</td>
<td align="center" rowspan="1" colspan="1">8.38 × 10
<sup>−24</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.969</td>
<td align="right" rowspan="1" colspan="1">
<italic>folA</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">3/41</td>
<td align="center" rowspan="1" colspan="1">9.30 × 10
<sup>−18</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.966</td>
<td align="right" rowspan="1" colspan="1">btw. hyp. prot. &amp; VOC prot.</td>
<td align="center" style="background-color:#674CE4" rowspan="1" colspan="1">LPN</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.19</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">11/70</td>
<td align="center" rowspan="1" colspan="1">9.30 × 10
<sup>−18</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.966</td>
<td align="right" rowspan="1" colspan="1">
<italic>ybaK</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.44</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">2/30</td>
<td align="center" rowspan="1" colspan="1">6.82 × 10
<sup>−10</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.632</td>
<td align="right" rowspan="1" colspan="1">
<italic>mqo1</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.29</td>
</tr>
<tr>
<td align="center" rowspan="5" style="border-bottom:thick" colspan="1">Gentamicin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">173/1193</td>
<td align="center" rowspan="1" colspan="1">1.30 × 10
<sup>−205</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.873</td>
<td align="right" rowspan="1" colspan="1">
<italic>aac(6’)</italic>
gene within a plasmid</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">127/367</td>
<td align="center" rowspan="1" colspan="1">9.02 × 10
<sup>−75</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.751</td>
<td align="right" rowspan="1" colspan="1">seq. of plasmid carrying
<italic>aac(6’)</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.38</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">2/23</td>
<td align="center" rowspan="1" colspan="1">9.01 × 10
<sup>−53</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.634</td>
<td align="right" rowspan="1" colspan="1">seq. of plasmid carrying
<italic>aac(6’)</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.40</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">1/29</td>
<td align="center" rowspan="1" colspan="1">1.04 × 10
<sup>−40</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.579</td>
<td align="right" rowspan="1" colspan="1">seq. of plasmid carrying
<italic>aac(6’)</italic>
</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.48</td>
</tr>
<tr>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">5</td>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">2/56</td>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">1.49 × 10
<sup>−33</sup>
</td>
<td align="char" char="." style="border-bottom:thick" rowspan="1" colspan="1">-0.831</td>
<td align="right" style="border-bottom:thick" rowspan="1" colspan="1">
<italic>odhB</italic>
</td>
<td align="center" style="border-bottom:thick;background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 8 × 10
<sup>−5</sup>
</td>
</tr>
<tr>
<td align="center" rowspan="19" style="border-bottom:thick" colspan="1">TB</td>
<td align="center" rowspan="3" colspan="1">Rifampicin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">36/115</td>
<td align="center" rowspan="1" colspan="1">4.84 × 10
<sup>−70</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.577</td>
<td align="right" rowspan="1" colspan="1">
<italic>rpoB</italic>
RRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">6/37</td>
<td align="center" rowspan="1" colspan="1">4.35 × 10
<sup>−20</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.355</td>
<td align="right" rowspan="1" colspan="1">
<italic>katG</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">5/41</td>
<td align="center" rowspan="1" colspan="1">4.02 × 10
<sup>−8</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.224</td>
<td align="right" rowspan="1" colspan="1">
<italic>embB</italic>
M306V</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="7" colspan="1">Streptomycin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">5/30</td>
<td align="center" rowspan="1" colspan="1">3.70 × 10
<sup>−31</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.544</td>
<td align="right" rowspan="1" colspan="1">
<italic>rpsL</italic>
(30S ribosomal protein S12)</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">6/37</td>
<td align="center" rowspan="1" colspan="1">1.06 × 10
<sup>−28</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.428</td>
<td align="right" rowspan="1" colspan="1">
<italic>katG</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">25/113</td>
<td align="center" rowspan="1" colspan="1">2.87 × 10
<sup>−16</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.339</td>
<td align="right" rowspan="1" colspan="1">
<italic>rpoB</italic>
RRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">4</td>
<td align="center" rowspan="1" colspan="1">6/45</td>
<td align="center" rowspan="1" colspan="1">1.40 × 10
<sup>−9</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.271</td>
<td align="right" rowspan="1" colspan="1">
<italic>embB</italic>
M306V</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">5</td>
<td align="center" rowspan="1" colspan="1">8/31</td>
<td align="center" rowspan="1" colspan="1">2.86 × 10
<sup>−9</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.535</td>
<td align="right" rowspan="1" colspan="1">
<italic>rrs</italic>
, 16S rRNA C517T</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">6</td>
<td align="center" rowspan="1" colspan="1">13/69</td>
<td align="center" rowspan="1" colspan="1">9.18 × 10
<sup>−5</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.216</td>
<td align="right" rowspan="1" colspan="1">
<italic>gyrA</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">7</td>
<td align="center" rowspan="1" colspan="1">2/20</td>
<td align="center" rowspan="1" colspan="1">1.20 × 10
<sup>−3</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.739</td>
<td align="right" rowspan="1" colspan="1">
<italic>espG1</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 3 × 10
<sup>−3</sup>
</td>
</tr>
<tr>
<td align="center" rowspan="3" colspan="1">Ofloxacin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">31/85</td>
<td align="center" rowspan="1" colspan="1">9.66 × 10
<sup>−144</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.888</td>
<td align="right" rowspan="1" colspan="1">
<italic>gyrA</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">9/68</td>
<td align="center" rowspan="1" colspan="1">1.59 × 10
<sup>−4</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.507</td>
<td align="right" rowspan="1" colspan="1">
<italic>ubiA</italic>
(Rv3806c)</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">3/32</td>
<td align="center" rowspan="1" colspan="1">3.86 × 10
<sup>−2</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.746</td>
<td align="right" rowspan="1" colspan="1">Rv3909</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 9 × 10
<sup>−3</sup>
</td>
</tr>
<tr>
<td align="center" rowspan="3" colspan="1">Ethionamide</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">9/39</td>
<td align="center" rowspan="1" colspan="1">7.86 × 10
<sup>−11</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.462</td>
<td align="right" rowspan="1" colspan="1">
<italic>fabG1</italic>
promoter</td>
<td align="center" style="background-color:#674CE4" rowspan="1" colspan="1">LPN</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">15/47</td>
<td align="center" rowspan="1" colspan="1">5.16 × 10
<sup>−10</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.406</td>
<td align="right" rowspan="1" colspan="1">
<italic>gyrA</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">4/26</td>
<td align="center" rowspan="1" colspan="1">5.55 × 10
<sup>−4</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.319</td>
<td align="right" rowspan="1" colspan="1">
<italic>rrs</italic>
, 16S rRNA A1401G</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#FF9F40" rowspan="1" colspan="1">CR</td>
</tr>
<tr>
<td align="center" rowspan="3" style="border-bottom:thick" colspan="1">XDR</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">6/68</td>
<td align="center" rowspan="1" colspan="1">3.66 × 10
<sup>−39</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.905</td>
<td align="right" rowspan="1" colspan="1">
<italic>rpoB</italic>
I1187T (outside RRDR)</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#E6E6E6" rowspan="1" colspan="1">Ukn</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">3/27</td>
<td align="center" rowspan="1" colspan="1">3.66 × 10
<sup>−39</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.905</td>
<td align="right" rowspan="1" colspan="1">Rv2000</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 1</td>
</tr>
<tr>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">3</td>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">3/24</td>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">9.58 × 10
<sup>−36</sup>
</td>
<td align="char" char="." style="border-bottom:thick" rowspan="1" colspan="1">0.883</td>
<td align="right" style="border-bottom:thick" rowspan="1" colspan="1">
<italic>espA</italic>
promoter</td>
<td align="center" style="border-bottom:thick;background-color:#674CE4" rowspan="1" colspan="1">LPN</td>
<td align="center" style="border-bottom:thick" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.98</td>
</tr>
<tr>
<td align="center" rowspan="6" colspan="1">PA</td>
<td align="center" rowspan="3" colspan="1">Amikacin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">4/83</td>
<td align="center" rowspan="1" colspan="1">5.86 × 10
<sup>−9</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.621</td>
<td align="right" rowspan="1" colspan="1">SNP in
<italic>aac(6’)</italic>
</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">3/82</td>
<td align="center" rowspan="1" colspan="1">1.37 × 10
<sup>−6</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.662</td>
<td align="right" rowspan="1" colspan="1">DEAD/DEAH box helicase</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.55</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">38/315</td>
<td align="center" rowspan="1" colspan="1">2.21 × 10
<sup>−6</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.523</td>
<td align="right" rowspan="1" colspan="1">plasmid mapping on pHS87b</td>
<td align="center" style="background-color:#FFEF6E" rowspan="1" colspan="1">MGE</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.17</td>
</tr>
<tr>
<td align="center" rowspan="3" colspan="1">Levofloxacin</td>
<td align="center" rowspan="1" colspan="1">1</td>
<td align="center" rowspan="1" colspan="1">5/27</td>
<td align="center" rowspan="1" colspan="1">7.21 × 10
<sup>−29</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.884</td>
<td align="right" rowspan="1" colspan="1">
<italic>gyrA</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">2</td>
<td align="center" rowspan="1" colspan="1">5/29</td>
<td align="center" rowspan="1" colspan="1">5.68 × 10
<sup>−6</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">-0.737</td>
<td align="right" rowspan="1" colspan="1">
<italic>parC</italic>
QRDR</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" style="background-color:#40BF00" rowspan="1" colspan="1">Pos</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">3</td>
<td align="center" rowspan="1" colspan="1">5/38</td>
<td align="center" rowspan="1" colspan="1">1.87 × 10
<sup>−2</sup>
</td>
<td align="char" char="." rowspan="1" colspan="1">0.688</td>
<td align="right" rowspan="1" colspan="1">Histidine kinase/response regulator</td>
<td align="center" style="background-color:#C1C1FF" rowspan="1" colspan="1">LPG</td>
<td align="center" rowspan="1" colspan="1">
<italic>r</italic>
<sup>2</sup>
= 0.17</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t001fn001">
<p>For each antibiotic, we report subgraphs with their rank, number of significant unitigs over all unitigs in the subgraph (Sign. unitigs), q-value of the unitig with the lowest q-value (min
<sub>
<italic>q</italic>
</sub>
), the corresponding estimated effect (
<inline-formula id="pgen.1007758.e001">
<alternatives>
<graphic id="pgen.1007758.e001g" xlink:href="pgen.1007758.e001"></graphic>
<mml:math id="M1">
<mml:mover accent="true">
<mml:mi>β</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</alternatives>
</inline-formula>
coefficient of the linear mixed model) and annotation of the subgraph. The type of event represented by the subgraph is colour-coded: yellow for mobile genetic element (MGE), light blue for local polymorphism in gene (LPG), and dark blue for local polymorphism in noncoding region (LPN). Known resistance markers are indicated in dark green (Pos), determinants whose association has been attributed to co-resistance in orange (CR), and unknown variants ranked first in grey (Ukn). For other subgraphs, an
<italic>r</italic>
<sup>2</sup>
value relative to the first subgraph is provided as an estimate of linkage disequilibrium with the first subgraph. It was computed between the most significant pattern of the first subgraph and that of the subgraph considered.</p>
</fn>
</table-wrap-foot>
</table-wrap>
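<p>The r² values reported in the table can be illustrated with a minimal sketch. It assumes that r² is the squared Pearson correlation between two binary presence/absence patterns across strains; this is a common convention and not necessarily the exact DBGWAS computation. The function name r_squared and the toy patterns are hypothetical.</p>
<preformat>
```python
def r_squared(pattern_a, pattern_b):
    """Squared Pearson correlation between two presence/absence patterns.

    Each pattern is a sequence of 0/1 values, one per strain (assumption:
    this is how linkage disequilibrium between two patterns is estimated).
    """
    n = len(pattern_a)
    if n == 0 or n != len(pattern_b):
        raise ValueError("patterns must be non-empty and of equal length")
    mean_a = sum(pattern_a) / n
    mean_b = sum(pattern_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(pattern_a, pattern_b))
    var_a = sum((a - mean_a) ** 2 for a in pattern_a)
    var_b = sum((b - mean_b) ** 2 for b in pattern_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov * cov / (var_a * var_b)


# Toy patterns over five strains (hypothetical data).
first_subgraph = [1, 0, 1, 1, 0]
same_pattern = [1, 0, 1, 1, 0]
other_pattern = [1, 1, 0, 1, 0]

r2_same = r_squared(first_subgraph, same_pattern)
r2_other = r_squared(first_subgraph, other_pattern)
print(r2_same, r2_other)
```
</preformat>
<p>Under this assumption, two patterns carried by exactly the same strains give r² = 1, as reported for Rv2000 in the XDR panel.</p>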
<sec id="sec003">
<title>Coloured bubbles highlight local polymorphism in core genes, accessory genes and noncoding regions</title>
<p>For
<italic>P. aeruginosa</italic>
levofloxacin resistance, the subgraph obtained with the lowest min
<sub>
<italic>q</italic>
</sub>
highlighted a polymorphic region in a core gene (
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3A</xref>
). Indeed, it showed a linear structure containing a complex bubble, with a fork separating susceptible (blue) and resistant (red) strains. The annotation revealed that all unitigs in this subgraph mapped to the quinolone resistance-determining region (QRDR) of the
<italic>gyrA</italic>
gene.
<italic>gyrA</italic>
encodes a subunit of the DNA gyrase targeted by quinolone antibiotics such as levofloxacin, and its alteration is therefore a prevalent and efficient mechanism of resistance [
<xref rid="pgen.1007758.ref020" ref-type="bibr">20</xref>
,
<xref rid="pgen.1007758.ref021" ref-type="bibr">21</xref>
]. In all our experiments related to quinolone resistance, DBGWAS identified QRDR mutations in either
<italic>gyrA</italic>
or
<italic>parC</italic>
, which codes for another well-known quinolone target:
<italic>P. aeruginosa</italic>
levofloxacin (first subgraph,
<italic>gyrA</italic>
: min
<sub>
<italic>q</italic>
</sub>
= 7.21 × 10
<sup>−29</sup>
and second,
<italic>parC</italic>
: 5.68 × 10
<sup>−6</sup>
),
<italic>S. aureus</italic>
ciprofloxacin (first,
<italic>parC</italic>
: min
<sub>
<italic>q</italic>
</sub>
= 8.67 × 10
<sup>−104</sup>
and second,
<italic>gyrA</italic>
: 2.21 × 10
<sup>−76</sup>
), and ofloxacin resistance in
<italic>M. tuberculosis</italic>
, whose genome does not contain the
<italic>parC</italic>
gene [
<xref rid="pgen.1007758.ref022" ref-type="bibr">22</xref>
] (first,
<italic>gyrA</italic>
: min
<sub>
<italic>q</italic>
</sub>
= 9.66 × 10
<sup>−144</sup>
).</p>
<fig id="pgen.1007758.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Different types of genetic events identified by DBGWAS.</title>
<p>Each subgraph represents a distinct genetic event. Colours are continuously interpolated between blue for susceptible unitigs and red for resistant ones. Untested unitigs, present in more than 99% or fewer than 1% of the strains, are shown in grey. Non-significant nodes are shown partially transparent. Node size reflects allele frequency: the larger the node, the higher the allele frequency. Nodes circled in black map to annotated genes. The two tables in each panel provide information on the subgraph nodes. As an example, the subgraph in panel (A) is composed of 27 unitigs, 5 of which were significantly associated with resistance. All unitigs of this subgraph mapped to the
<italic>gyrA</italic>
gene. The subgraphs presented in the four other panels correspond to the top subgraphs (with lowest min
<sub>
<italic>q</italic>
</sub>
) obtained for different panels/phenotypes. All subgraphs are snapshots taken from DBGWAS interactive visualisation and are available online.</p>
</caption>
<graphic xlink:href="pgen.1007758.g003"></graphic>
</fig>
<p>For
<italic>P. aeruginosa</italic>
amikacin resistance, the top subgraph (min
<sub>
<italic>q</italic>
</sub>
= 5.86 × 10
<sup>−9</sup>
) highlighted a SNP in an accessory gene (
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3B</xref>
). As in
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3A</xref>
, it contained a fork separating a blue and a red node. However, the remaining nodes were not grey: because they were not present in all strains, they represented an accessory sequence. Most of these nodes were pale red, showing that the accessory sequence was more frequent in resistant samples. The annotation revealed that this subgraph corresponded to
<italic>aac(6’)</italic>
, a gene coding for an aminoglycoside 6-acetyltransferase, an enzyme capable of inactivating aminoglycosides, such as amikacin, by acetylation [
<xref rid="pgen.1007758.ref023" ref-type="bibr">23</xref>
]. Most unitigs in this gene had a low association with resistance, except for the ones describing this particular SNP. Mapping the sequence of these unitigs on the UniProt database [
<xref rid="pgen.1007758.ref024" ref-type="bibr">24</xref>
] revealed an L83S amino-acid change, located in the enzyme binding site. This SNP was previously shown to be responsible for substrate specificity alteration in a strain of
<italic>Pseudomonas fluorescens</italic>
[
<xref rid="pgen.1007758.ref025" ref-type="bibr">25</xref>
]. It appears to increase the amikacin acetylation ability of
<italic>aac(6’)</italic>
, making its association with amikacin resistance more significant than that of the gene presence itself.</p>
<p>Finally, for
<italic>M. tuberculosis</italic>
ethionamide resistance, the top subgraph (min
<sub>
<italic>q</italic>
</sub>
= 7.86 × 10
<sup>−11</sup>
,
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3C</xref>
) represented a polymorphic region in a core gene promoter. The subgraph was mostly grey and linear with a localised blue and red fork. The most reliable annotation for this subgraph was
<italic>fabG1</italic>
(also known as
<italic>mabA</italic>
), a core gene previously shown to be involved in ethionamide and isoniazid resistance [
<xref rid="pgen.1007758.ref026" ref-type="bibr">26</xref>
,
<xref rid="pgen.1007758.ref027" ref-type="bibr">27</xref>
]. None of the significantly associated unitigs mapped to the
<italic>fabG1</italic>
gene, but their close neighbours did (highlighted in
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3C</xref>
by black circles), suggesting that the detected variant was located in the promoter region of the gene. This was confirmed by mapping the significant unitig sequences using the Tuberculosis Mutation database of the
<italic>mubii</italic>
resource [
<xref rid="pgen.1007758.ref028" ref-type="bibr">28</xref>
].</p>
</sec>
<sec id="sec004">
<title>Long single-coloured paths denote mobile genetic element insertions</title>
<p>For
<italic>S. aureus</italic>
resistance to methicillin, the top subgraph (min
<sub>
<italic>q</italic>
</sub>
= 7.68 × 10
<sup>−188</sup>
), shown in
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3D</xref>
, revealed a gene cassette insertion. It contained a long path of red nodes, and a branching region including another red node path. The first path mapped to the
<italic>mecA</italic>
gene, extensively described in this context and known to be carried by the Staphylococcal Cassette Chromosome
<italic>mec</italic>
(SCC
<italic>mec</italic>
) [
<xref rid="pgen.1007758.ref021" ref-type="bibr">21</xref>
,
<xref rid="pgen.1007758.ref029" ref-type="bibr">29</xref>
,
<xref rid="pgen.1007758.ref030" ref-type="bibr">30</xref>
]. The other part of the subgraph represented a >5,000 bp fragment of the cassette. It was less linear because it summarised several types of the cassette differing by their structure and gene content [
<xref rid="pgen.1007758.ref029" ref-type="bibr">29</xref>
]. The next subgraphs represented other regions of the same cassette. Interestingly, retaining a greater number of unitigs to build the subgraphs leads to merging these individual subgraphs, representing related genomic regions, into a single one. This can be done by increasing the Significant Features Filter (
<italic>SFF</italic>
) parameter value, which defines the unitigs used to build the subgraphs. By default, the unitigs corresponding to the 100 lowest q-values are retained (
<italic>SFF</italic>
= 100). Increasing the
<italic>SFF</italic>
value to 150 (150th q-value = 1.60 × 10
<sup>−27</sup>
) allowed us to reconstruct the entire SCC
<italic>mec</italic>
cassette, as shown in
<xref ref-type="supplementary-material" rid="pgen.1007758.s003">S3 Fig</xref>
.</p>
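<p>The effect of the SFF parameter can be sketched as follows. This is a minimal illustration, not DBGWAS code: it assumes that an integer SFF value retains every unitig whose q-value is among the SFF lowest distinct q-values. The function name select_significant_unitigs and the toy q-values are hypothetical.</p>
<preformat>
```python
def select_significant_unitigs(q_values, sff=100):
    """Return the ids of unitigs retained to build the subgraphs.

    q_values maps a unitig id to its q-value. Assumption: unitigs sharing
    one of the `sff` lowest distinct q-values are all retained.
    """
    lowest = set(sorted(set(q_values.values()))[:sff])
    return sorted(uid for uid, q in q_values.items() if q in lowest)


# Toy q-values (hypothetical data); u1 and u2 share the same q-value.
toy_q = {"u1": 1e-30, "u2": 1e-30, "u3": 5e-12, "u4": 2e-3, "u5": 0.4}

selected = select_significant_unitigs(toy_q, sff=2)
print(selected)
```
</preformat>
<p>Raising the threshold in this sketch retains more unitigs, which is how neighbouring subgraphs can come to share nodes and merge.</p>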
<p>For
<italic>S. aureus</italic>
erythromycin resistance, a unique subgraph was generated (min
<sub>
<italic>q</italic>
</sub>
= 2.69 × 10
<sup>−100</sup>
). As shown in
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3E</xref>
, the subgraph described the circular structure of a 2,500 bp-long plasmid known to carry the causal
<italic>ermC</italic>
gene together with a replication and maintenance protein in strong linkage disequilibrium with
<italic>ermC</italic>
[
<xref rid="pgen.1007758.ref030" ref-type="bibr">30</xref>
,
<xref rid="pgen.1007758.ref031" ref-type="bibr">31</xref>
].</p>
<p>For
<italic>P. aeruginosa</italic>
amikacin resistance, the third subgraph (min
<sub>
<italic>q</italic>
</sub>
= 2.21 × 10
<sup>−6</sup>
) represented a 10,000 bp plasmid acquisition. Using the NCBI nucleotide database [
<xref rid="pgen.1007758.ref032" ref-type="bibr">32</xref>
], most of the unitigs in this subgraph mapped to the predicted prophage regions of an integrative and conjugative plasmid, whose structure corresponds to that of pHS87b, a plasmid recently described in the amikacin-resistant
<italic>P. aeruginosa</italic>
HS87 strain [
<xref rid="pgen.1007758.ref033" ref-type="bibr">33</xref>
].
<xref ref-type="supplementary-material" rid="pgen.1007758.s004">S4</xref>
and
<xref ref-type="supplementary-material" rid="pgen.1007758.s005">S5</xref>
Figs provide more examples of MGEs recovered by DBGWAS, and the Interpretation of significant unitigs (step 3) subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section discusses
<italic>SFF</italic>
default value and tuning.</p>
</sec>
<sec id="sec005">
<title>DBGWAS reports expected variants without prior knowledge</title>
<p>Although resistance determinants are not perfectly or exhaustively known for all species, some resistance mechanisms are well described. This is the case for
<italic>gyrA</italic>
and
<italic>parC</italic>
alteration in fluoroquinolone resistance in
<italic>P. aeruginosa</italic>
[
<xref rid="pgen.1007758.ref020" ref-type="bibr">20</xref>
], and of the alteration of two streptomycin targets: the ribosomal protein S12 (coded by
<italic>rpsL</italic>
) and the 16S rRNA (coded by
<italic>rrs</italic>
) in
<italic>M. tuberculosis</italic>
[
<xref rid="pgen.1007758.ref034" ref-type="bibr">34</xref>
]. Here we assess the ability of bacterial GWAS methods to recover these known mechanisms. We compared DBGWAS results with those obtained by applying the same association model to a collection of known resistance genes and SNPs [
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
,
<xref rid="pgen.1007758.ref035" ref-type="bibr">35</xref>
] (see the Resistome-based association studies subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section), and to two other recent k-mer-based methods: pyseer [
<xref rid="pgen.1007758.ref006" ref-type="bibr">6</xref>
,
<xref rid="pgen.1007758.ref036" ref-type="bibr">36</xref>
], and HAWK [
<xref rid="pgen.1007758.ref013" ref-type="bibr">13</xref>
].</p>
<p>For
<italic>P. aeruginosa</italic>
levofloxacin resistance (
<xref rid="pgen.1007758.t002" ref-type="table">Table 2</xref>
), both DBGWAS and pyseer identified the two expected known causal determinants reported by the prior resistome-based study:
<italic>gyrA</italic>
and
<italic>parC</italic>
, while HAWK only reported
<italic>gyrA</italic>
. pyseer reported 224 k-mers, all mapping to
<italic>gyrA</italic>
and
<italic>parC</italic>
, while the other methods reported fewer than 10 features (subgraphs or reassembled k-mers), among which were several unknown, potentially new candidate markers.</p>
<table-wrap id="pgen.1007758.t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.t002</object-id>
<label>Table 2</label>
<caption>
<title>Resistance determinants found by the four methods for
<italic>P. aeruginosa</italic>
levofloxacin resistance.</title>
</caption>
<alternatives>
<graphic id="pgen.1007758.t002g" xlink:href="pgen.1007758.t002"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" style="border-right:thick" rowspan="1" colspan="1">
<bold>Legend</bold>
</td>
<td align="left" rowspan="1" colspan="1">resistome-based</td>
<td align="left" rowspan="1" colspan="1">DBGWAS</td>
<td align="left" rowspan="1" colspan="1">pyseer</td>
<td align="left" rowspan="1" colspan="1">HAWK</td>
</tr>
<tr>
<td align="left" style="border-right:thick" rowspan="1" colspan="1">Time (mem)</td>
<td align="left" rowspan="1" colspan="1">37m (7.2 GB)</td>
<td align="left" rowspan="1" colspan="1">21m (3.2 GB)</td>
<td align="left" rowspan="1" colspan="1">24h22m (14.5 GB)</td>
<td align="left" rowspan="1" colspan="1">39m (4.2 GB)</td>
</tr>
<tr>
<td align="left" style="border-right:thick" rowspan="1" colspan="1">No. reported</td>
<td align="left" rowspan="1" colspan="1">2 variants</td>
<td align="left" rowspan="1" colspan="1">5 subgraphs</td>
<td align="left" rowspan="1" colspan="1">224 k-mers</td>
<td align="left" rowspan="1" colspan="1">8 reassembled k-mers</td>
</tr>
<tr>
<td align="left" rowspan="2" style="border-right:thick;background-color:#40BF00" colspan="1">Known
<break></break>
positive</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>
<underline>gyrA</underline>
</italic>
(2.11 × 10
<sup>−22</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>
<underline>gyrA</underline>
</italic>
(7.21 × 10
<sup>−29</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>
<underline>gyrA</underline>
</italic>
(1.97 × 10
<sup>−17</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>gyrA</italic>
(2.82 × 10
<sup>−14</sup>
)</td>
</tr>
<tr>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>parC</italic>
(1.83 × 10
<sup>−5</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>parC</italic>
(5.68 × 10
<sup>−6</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>parC</italic>
(5.68 × 10
<sup>−9</sup>
)</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="3" style="border-right:thick;background-color:#E6E6E6" colspan="1">Unknown</td>
<td align="left" rowspan="3" colspan="1"></td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">HK/RR (1.87 × 10
<sup>−2</sup>
)</td>
<td align="left" rowspan="3" colspan="1"></td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<underline>tnp</underline>
(1.66 × 10
<sup>−14</sup>
)</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">tnp</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">NC near tnp</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>topA</italic>
</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t002fn001">
<p>This table presents the annotation of the features identified by the tested methods with default parameters. The total number of reported features, as well as the execution time and memory load (in gigabytes), are given in the header. For k-mer-based methods, annotations were retrieved by mapping unitig/k-mer sequences to the resistance and UniProt databases (see the Interpretation of significant unitigs (step 3) subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section), and completed when needed by BLAST searches against the NCBI Nucleotide database. Green cells correspond to resistance determinants already described in the literature. Grey cells represent unknown determinants. Within each category, annotations are ordered by increasing minimum p/q-value. p/q-values are reported only for the most significant annotations. For each method, the annotation with the lowest p/q-value is underlined. ‘NC’ stands for noncoding region and ‘tnp’ for transposase.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>For
<italic>M. tuberculosis</italic>
streptomycin resistance (
<xref rid="pgen.1007758.t003" ref-type="table">Table 3</xref>
), the four methods reported the two expected known causal determinants
<italic>rpsL</italic>
and
<italic>rrs</italic>
. However, while the resistome-based study and DBGWAS ranked the causal
<italic>rpsL</italic>
determinant first, pyseer and HAWK reported their lowest p/q-values for the false positive
<italic>katG</italic>
determinant.
<italic>katG</italic>
and other false positives caused by co-resistance were among the top-ranked features for all methods, which is a well-described phenomenon in
<italic>M. tuberculosis</italic>
species [
<xref rid="pgen.1007758.ref034" ref-type="bibr">34</xref>
,
<xref rid="pgen.1007758.ref037" ref-type="bibr">37</xref>
].</p>
<table-wrap id="pgen.1007758.t003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.t003</object-id>
<label>Table 3</label>
<caption>
<title>Resistance determinants found by the four methods for
<italic>M. tuberculosis</italic>
streptomycin resistance.</title>
</caption>
<alternatives>
<graphic id="pgen.1007758.t003g" xlink:href="pgen.1007758.t003"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" style="border-right:thick" rowspan="1" colspan="1">
<bold>Legend</bold>
</td>
<td align="left" rowspan="1" colspan="1">resistome-based</td>
<td align="left" rowspan="1" colspan="1">DBGWAS</td>
<td align="left" rowspan="1" colspan="1">pyseer</td>
<td align="left" rowspan="1" colspan="1">HAWK</td>
</tr>
<tr>
<td align="left" style="border-right:thick" rowspan="1" colspan="1">Time (mem)</td>
<td align="left" rowspan="1" colspan="1">1h31m (2.1 GB)</td>
<td align="left" rowspan="1" colspan="1">42m (4.3 GB)</td>
<td align="left" rowspan="1" colspan="1">14h14m (102.4 GB)</td>
<td align="left" rowspan="1" colspan="1">3h01m (3.7 GB)</td>
</tr>
<tr>
<td align="left" style="border-right:thick" rowspan="1" colspan="1">No. reported</td>
<td align="left" rowspan="1" colspan="1">28 variants</td>
<td align="left" rowspan="1" colspan="1">24 subgraphs</td>
<td align="left" rowspan="1" colspan="1">85,011 k-mers</td>
<td align="left" rowspan="1" colspan="1">2,038 reassembled k-mers</td>
</tr>
<tr>
<td align="left" rowspan="2" style="background-color:#40BF00;border-right:thick" colspan="1">Known
<break></break>
positive</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>
<underline>rpsL</underline>
</italic>
(1.96 × 10
<sup>−33</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>
<underline>rpsL</underline>
</italic>
(3.70 × 10
<sup>−31</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>rpsL</italic>
(4.85 × 10
<sup>−55</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>rpsL</italic>
(5.72 × 10
<sup>−47</sup>
)</td>
</tr>
<tr>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>rrs</italic>
(5.40 × 10
<sup>−8</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>rrs</italic>
(2.86 × 10
<sup>−9</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>rrs</italic>
(1.63 × 10
<sup>−14</sup>
)</td>
<td align="left" style="background-color:#40BF00" rowspan="1" colspan="1">
<italic>rrs</italic>
(3.45 × 10
<sup>−20</sup>
)</td>
</tr>
<tr>
<td align="left" rowspan="13" style="background-color:#FF9F40;border-right:thick" colspan="1">Determinant described for other antibiotics</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>katG</italic>
(2.61 × 10
<sup>−30</sup>
)</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>katG</italic>
(1.06 × 10
<sup>−28</sup>
)</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<underline>katG</underline>
</italic>
(2.12 × 10
<sup>−71</sup>
)</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<underline>katG</underline>
</italic>
(1.44 × 10
<sup>−57</sup>
)</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>rpoB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>rpoB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>rpoB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>embB</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gidB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>embB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>embB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>kasA</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gyrA</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gyrA</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>ubiA</bold>
</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>embC</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>embB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gidB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>pncA</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gyrA</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>fabG1</italic>
promoter</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>rpoC</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>fabG1</italic>
promoter</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>iniA</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>pncA</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>fabG1</italic>
promoter</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gyrA</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>embA</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>rpoC</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>ubiA</bold>
</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gidB</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>embR</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>inhA</italic>
</td>
<td align="left" rowspan="5" colspan="1"></td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>ethA</bold>
</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>gidB</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="4" colspan="1"></td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>embA</bold>
</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>tsnR</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>embC</bold>
</italic>
</td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>rpoB</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="2" colspan="1"></td>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>pncA</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#FF9F40" rowspan="1" colspan="1">
<italic>
<bold>ethA</bold>
</italic>
</td>
</tr>
<tr>
<td align="left" rowspan="9" style="background-color:#E6E6E6;border-right:thick" colspan="1">Unknown
<break></break>
(top list)</td>
<td align="left" rowspan="9" colspan="1"></td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>espG1</italic>
(1.20 × 10
<sup>−3</sup>
)</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">NC near tnp/PE (1.13 × 10
<sup>−19</sup>
)</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">NC near tnp/PPE (2.93 × 10
<sup>−57</sup>
)</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>rpsN</italic>
</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv0270</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">tnp</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">NC near tnp/PPE</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv2665</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv2825c/Rv2828c</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>rnj</italic>
</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv2743c</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">13E12 repeat family protein</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv2672</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv2522c</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">PPE</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>espA</italic>
promoter</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">NC near tnp/PPE</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">CRISPR repeats, down
<italic>Cas</italic>
genes</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">Rv2456c promoter</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>guaA</italic>
</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>mmpL14</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>whiB6</italic>
</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>kdpD</italic>
</td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1">
<italic>esxM</italic>
</td>
</tr>
<tr>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1"></td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1"></td>
<td align="left" style="background-color:#E6E6E6" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t003fn001">
<p>This table presents the annotation of the features identified by the tested methods with default parameters. The total number of reported features, as well as the execution time and memory load (in Gigabytes) are given in the header. For k-mer-based methods, annotations were retrieved by mapping unitig/k-mer sequences to the resistance and Uniprot databases (see Interpretation of significant unitigs (step 3) subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section), and completed when needed by Blast on NCBI Nucleotide database. Green cells correspond to resistance determinants already described in the literature, orange cells to resistance determinants described for association with other antibiotics. The annotations not found by the resistome-based strategy are written in bold. Grey cells represent unknown determinants. Within each category, annotations are ordered by increasing minimum p/q-values. p/q-values are reported only for the most significant annotations. For each method, the annotation with the lowest p/q-values is underlined. ‘NC’ means noncoding region, ‘tnp’ transposase, ‘PE’ stands for PE-family protein and ‘PPE’ for PPE-family protein.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Additional results for all antibiotics can be found in
<xref ref-type="supplementary-material" rid="pgen.1007758.s015">S6</xref>
and
<xref ref-type="supplementary-material" rid="pgen.1007758.s016">S7</xref>
Tables for resistome-based association studies, and in
<xref ref-type="supplementary-material" rid="pgen.1007758.s012">S3</xref>
and
<xref ref-type="supplementary-material" rid="pgen.1007758.s014">S5</xref>
Tables for DBGWAS.</p>
</sec>
<sec id="sec006">
<title>DBGWAS provides novel hypotheses</title>
<p>In addition to resistance markers, all three k-mer-based approaches reported several unknown variants, not described in the context of resistance. Among them, in the context of streptomycin resistance, a noncoding region between a transposase and a PPE-family protein was reported by the three methods but, as expected, not by the resistome-based approach, as only resistance genes were included in this analysis. More generally, knowledge-based approaches such as SNP-, gene- or resistome-based GWAS can be limited in the context of new marker discovery, since any causal variant absent from the chosen reference would remain untested. Besides being time-consuming, preparing such a list of genetic variants can be problematic for bacterial species without extensive annotation or reference availability. Here we describe associations identified by DBGWAS that have never been described in the antibiotic resistance literature.</p>
<p>In our
<italic>P. aeruginosa</italic>
panel, the second subgraph obtained for amikacin resistance (min
<sub>
<italic>q</italic>
</sub>
= 1.37 × 10
<sup>−6</sup>
) gathered unitigs mapping to the 3’ region of a DEAD/DEAH box helicase, known to be involved in stress tolerance in
<italic>P. aeruginosa</italic>
[
<xref rid="pgen.1007758.ref038" ref-type="bibr">38</xref>
]. The unitig with the lowest q-value was present in 13 of 47 resistant strains and in only 1 of 233 susceptible strains and represented a C-C haplotype summarising two mutated positions: 2097 and 2103. This annotation was not an artefact of the population structure, which is properly taken into account by the linear mixed model. Indeed, the 13 resistant strains corresponded to distinct clones belonging to two phylogroups, one of them containing the susceptible strain. In
<italic>P. aeruginosa</italic>
levofloxacin resistance, the third subgraph (min
<sub>
<italic>q</italic>
</sub>
= 1.87 × 10
<sup>−2</sup>
) represented an L650M amino-acid change in a hybrid sensor histidine kinase/response regulator. Such two-component regulatory systems play important roles in the adaptation of organisms to their environment, for instance in the regulation of biofilm formation in
<italic>P. aeruginosa</italic>
[
<xref rid="pgen.1007758.ref039" ref-type="bibr">39</xref>
], and as such may play a role in antibiotic resistance.</p>
<p>In
<italic>S. aureus</italic>
, polymorphisms within genes not known to be related to resistance were identified for several antibiotics:
<italic>purN</italic>
(min
<sub>
<italic>q</italic>
</sub>
= 2.02 × 10
<sup>−22</sup>
) for fusidic acid,
<italic>odhB</italic>
(min
<sub>
<italic>q</italic>
</sub>
= 1.49 × 10
<sup>−33</sup>
) for gentamicin,
<italic>ybaK</italic>
and
<italic>mqo1</italic>
(min
<sub>
<italic>q</italic>
</sub>
= 9.30 × 10
<sup>−18</sup>
, resp. 6.82 × 10
<sup>−10</sup>
) for trimethoprim. None of these genes have been associated with antibiotic resistance before, to the best of our knowledge.</p>
<p>In
<italic>M. tuberculosis</italic>
, polymorphisms in two genes encoding proteins involved in
<italic>cell wall and cell processes</italic>
,
<italic>espG1</italic>
and
<italic>espA</italic>
, were found associated with streptomycin (seventh subgraph, min
<sub>
<italic>q</italic>
</sub>
= 9.43 × 10
<sup>−4</sup>
) and XDR phenotype (third subgraph, min
<sub>
<italic>q</italic>
</sub>
= 9.58 × 10
<sup>−36</sup>
), respectively. Again, these genes have never been reported in association with antibiotic resistance before.</p>
<p>Although experimental validation would be required to tell whether these hypotheses are false positives (e.g., in linkage with causal variants) or actual resistance mechanisms not yet documented, DBGWAS is a valuable tool to screen for novel candidate markers. Moreover, it provides a first level of variant description (SNPs in gene or promoter, MGE, etc.) which can directly drive the biological validation.</p>
</sec>
<sec id="sec007">
<title>DBGWAS facilitates the interpretation of k-mer-based GWAS</title>
<p>Other k-mer-based approaches are as agnostic as DBGWAS and were also able to provide novel hypotheses, but interpreting their output can prove more challenging than a SNP/gene-based GWAS. In the
<italic>M. tuberculosis</italic>
streptomycin resistance experiment for example, they reported several thousands of features, while DBGWAS reported only 24 annotated subgraphs without missing any expected determinant (see
<xref rid="pgen.1007758.t003" ref-type="table">Table 3</xref>
). The thousands of k-mers generated by HAWK and pyseer are of course also amenable to interpretation: to build our
<xref rid="pgen.1007758.t003" ref-type="table">Table 3</xref>
, we mapped these k-mers to references and extracted annotated variants which showed at least one hit. However, doing so required additional efforts and a working knowledge of the most appropriate annotated references. In addition, k-mers which do not map to the chosen reference cannot be interpreted. By contrast, DBGWAS always returns a subgraph containing these k-mers. Even when no annotation exists, the topology and colours of the subgraphs may hint towards the nature of the causal variant.</p>
<p>In addition to providing context for significant k-mers and guiding their interpretation as SNPs or MGEs, DBGWAS clustering of close variants into a subgraph can describe hypervariable regions as single entities, and highlight highly associated haplotypes. As an example, the top subgraph for rifampicin resistance (min
<sub>
<italic>q</italic>
</sub>
= 4.84 × 10
<sup>−70</sup>
) contained 36 significant unitigs, distinguishing between susceptible (blue) and resistant (red) strains. Instead of a single point mutation, this subgraph represented a polymorphic region known as the rifampicin resistance-determining region (RRDR) of the
<italic>rpoB</italic>
gene. The unitig with the lowest q-value covered several mutant positions, defining a particular haplotype strongly associated with rifampicin susceptibility. Where DBGWAS reported in this case only one subgraph, pyseer, for instance, reported 470 k-mers with the
<italic>rpoB</italic>
annotation, and the resistome-based association study reported in this case 4 distinct SNPs in
<italic>rpoB</italic>
(
<xref ref-type="supplementary-material" rid="pgen.1007758.s015">S6 Table</xref>
). In another user-submitted example, DBGWAS identified mosaic alleles of three
<italic>pbp</italic>
genes involved in beta-lactam resistance of
<italic>Streptococcus pneumoniae</italic>
. As in the RRDR example, it returned five subgraphs corresponding to the three genes: three subgraphs were annotated
<italic>pbp2x</italic>
and represented three distinct polymorphic regions of the gene. Each subgraph summarised the polymorphism of the gene, as opposed to one separate feature for each SNP.</p>
<p>Admittedly, some subgraphs output by DBGWAS are not readily interpretable: they are neither coloured bubbles highlighting SNPs, nor long single-coloured paths denoting MGE insertions. This was the case of several subgraphs produced for
<italic>P. aeruginosa</italic>
amikacin resistance, and presented in
<xref ref-type="supplementary-material" rid="pgen.1007758.s006">S6 Fig</xref>
. Genetic variants inserted in variable regions, for example, lead to subgraphs with a high average degree, or to very large subgraphs. The fourth subgraph for instance (min
<sub>
<italic>q</italic>
</sub>
= 2.21 × 10
<sup>−6</sup>
) contains a path of three red (positively-associated) nodes lying in a noncoding region between variable accessory genes. Consequently, their neighbour unitigs branch to various other unitigs, making the structure complex and hard to interpret. Complex subgraphs also arise when several associated variants have overlapping neighbourhoods (as defined in the Graph neighbourhoods subsection in the
<xref ref-type="sec" rid="sec010">Methods</xref>
section, and tuned with the
<italic>nh</italic>
parameter) in at least one strain. This is the case for the subgraph with the smallest min
<sub>
<italic>q</italic>
</sub>
which aggregates
<italic>aac</italic>
(6′) acetyltransferase and the CML efflux pump.</p>
<p>The interpretation of such subgraphs is not straightforward. We often found it helpful to tune the
<italic>nh</italic>
and
<italic>SFF</italic>
parameters to break large subgraphs into a set of smaller ones, as discussed in the
<xref ref-type="sec" rid="sec010">Methods</xref>
section. For the
<italic>aac</italic>
(6′) subgraph, where nearby variants are aggregated into a large subgraph, reducing the
<italic>SFF</italic>
value to 15 provided a much smaller and easier-to-interpret subgraph focusing on the
<italic>aac</italic>
(6′) mutation (
<xref ref-type="fig" rid="pgen.1007758.g003">Fig 3B</xref>
). Otherwise, we recommend focusing on the topology of the most significant unitigs and their close neighbours.</p>
</sec>
<sec id="sec008">
<title>DBGWAS is fast, memory-efficient, and scales to very large panels</title>
<p>To assess the scalability of DBGWAS to large datasets, we retrieved 5,000 genomes from
<italic>M. tuberculosis</italic>
, 9,000 genomes from
<italic>S. aureus</italic>
and 2,500 genomes from
<italic>P. aeruginosa</italic>
, as described in the Large panels subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section. We present in
<xref ref-type="supplementary-material" rid="pgen.1007758.s009">S9 Fig</xref>
the runtime and memory usage for these panels. All 180 runs took less than 5 days and 250 GB of RAM on 8 cores. Both the computational time and memory usage increase log-linearly with the panel size. Moreover, at equal panel size, DBGWAS performance also depends on the genome complexity, requiring fewer computational resources for more clonal genomes such as
<italic>M. tuberculosis</italic>
.</p>
<p>We also compared the computational performance of DBGWAS with pyseer and HAWK. The benchmark was performed on 13 datasets, including one large dataset of 2,500 genomes for each of the 3 species (see the Datasets subsection in the
<xref ref-type="sec" rid="sec010">Methods</xref>
section for details). Detailed results are presented in
<xref ref-type="supplementary-material" rid="pgen.1007758.s011">S2 Table</xref>
. DBGWAS was the fastest tool in 11 out of 13 experiments, always taking less than 2 hours. HAWK ran in less than 10 hours in 12 out of 13 experiments, and was a little faster than DBGWAS on two of the large-scale datasets. pyseer took from 13 to 53 hours on 9 experiments, and failed on the 4 others: one exceeded the disk space limit of 1TB, three exceeded the runtime limit of five days. It was brought to our attention during the reviewing process that piping the output of fsm-lite through gzip would decrease the disk space usage. HAWK was more parsimonious in memory usage than DBGWAS on the large scale panels. This can be explained by the fact that the 0.8.3-beta version of HAWK which we are using does not take into account the population structure, and as such does not have to compute an
<italic>n</italic>
×
<italic>n</italic>
covariance matrix, providing it a large gain in memory usage—and, to a lesser extent, runtime—for large panels. On the other hand, disregarding the population structure could also lead to spurious discoveries. HAWK v0.9.8-beta offers an adjustment but failed to recover the known true positives, which is why we chose to present the results of the 0.8.3-beta version. DBGWAS and HAWK typically used one order of magnitude less memory than pyseer. The most memory-consuming step for pyseer was the k-mer counting step relying on fsm-lite.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="sec009">
<title>Discussion</title>
<p>In this article we introduce an efficient method for bacterial GWAS. Our method is agnostic: it considers all regions of the genomes and is able to identify potentially new causal variants as different as SNPs in noncoding regions and MGE insertions/deletions. It performs as well as the current SNP- and gene-based gold standard approaches for retrieving known determinants, from genome pre-assemblies and without relying on annotations or reference genomes.</p>
<p>DBGWAS exploits the genetic environment of the significant k-mers through their neighbourhood in the cDBG, providing a valuable interpretation framework. Because it uses only contig sequences as input, it allows GWAS on bacterial species for which the genomes are still poorly annotated or lack a suitable reference genome. DBGWAS makes bacterial GWAS possible in two hours using a single-core computer (see
<xref ref-type="supplementary-material" rid="pgen.1007758.s010">S1 Table</xref>
), outperforming other state-of-the-art k-mer-based approaches.</p>
<p>The graph-based genome sequence representations underlying our method, such as DBGs, extend the notion of the reference genome to cases where a single sequence stops being an appropriate approximation [
<xref rid="pgen.1007758.ref040" ref-type="bibr">40</xref>
,
<xref rid="pgen.1007758.ref041" ref-type="bibr">41</xref>
]. As demonstrated in this paper, they pave the way to GWAS on highly plastic bacterial genomes and could also be useful for microbiomes [
<xref rid="pgen.1007758.ref042" ref-type="bibr">42</xref>
] or human tumours [
<xref rid="pgen.1007758.ref013" ref-type="bibr">13</xref>
].</p>
<p>DBGWAS currently relies on the Benjamini-Hochberg procedure to control the FDR and does not yet exploit the dependence among presence/absence patterns. An important improvement would be to control the false discovery rate at the subgraph level instead of the unitig level. DBGWAS could be extended to different statistical tasks by adapting its underlying association model, to allow for continuous phenotypes or identify epistatic effects, for instance. The interpretability of the extracted subgraphs could also be improved by training a machine learning model to predict which types of event they represent [
<xref rid="pgen.1007758.ref043" ref-type="bibr">43</xref>
]. This automated labelling could guide users in their interpretation and allow them to search for specific events, such as SNPs in core genes or rearrangements.</p>
<p>Several recent studies describe
<italic>in silico</italic>
models for defining a genomic antibiogram and hopes are high that such technologies will complement the classic phenotypic methods [
<xref rid="pgen.1007758.ref044" ref-type="bibr">44</xref>
]. Several studies have already demonstrated that in some cases, genomic antibiograms can be at least as good as phenotypic ones [
<xref rid="pgen.1007758.ref030" ref-type="bibr">30</xref>
,
<xref rid="pgen.1007758.ref045" ref-type="bibr">45</xref>–
<xref rid="pgen.1007758.ref047" ref-type="bibr">47</xref>
]. Contrary to our approach, these studies require extensive resistance marker databases. DBGWAS will surely contribute to the extension of such databases or to the development of agnostic genomic antibiograms.</p>
<p>In conclusion, we demonstrate for three medically important bacterial species that resistance markers can be detected rapidly with relative ease, using simple computer equipment. Our integrated software and visualisation tools offer an intuitive variant representation, hence will provide future users with an enhanced insight into genotype to phenotype correlations, in all domains of microbiology, beyond that of antibiotic resistance. This will include complex traits such as biofilm formation, epidemicity and virulence.</p>
</sec>
<sec sec-type="materials|methods" id="sec010">
<title>Methods</title>
<sec id="sec011">
<title>Encoding genomic variation with compacted DBGs</title>
<p>DBGs are directed graphs that efficiently represent all the information contained in a set of sequences. Nodes represent all the unique k-mers (genome sequence substrings of length
<italic>k</italic>
) extracted from the input sequences. Edges represent (
<italic>k</italic>
− 1)-exact-overlaps between k-mers: an edge connects a node
<italic>n</italic>
<sub>1</sub>
to a node
<italic>n</italic>
<sub>2</sub>
if and only if the (
<italic>k</italic>
− 1)-length-suffix of
<italic>n</italic>
<sub>1</sub>
equals the (
<italic>k</italic>
− 1)-length-prefix of
<italic>n</italic>
<sub>2</sub>
(
<xref ref-type="fig" rid="pgen.1007758.g001">Fig 1A</xref>
).</p>
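<p>As a minimal illustration of this definition (a sketch, not the GATB implementation used by DBGWAS), the following Python code builds the nodes and edges of a DBG from a list of sequences. The function name <monospace>build_dbg</monospace> and the toy sequences are hypothetical, and reverse complements are ignored for simplicity.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def build_dbg(sequences, k):
    """Return (nodes, edges) of a De Bruijn graph built from `sequences`.

    nodes: set of all distinct k-mers found in the input sequences.
    edges: dict mapping each k-mer to the set of k-mers it points to.
    Hypothetical helper for illustration; reverse complements are ignored.
    """
    nodes = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            nodes.add(seq[i:i + k])

    # Index k-mers by their (k-1)-prefix so successors are found directly.
    by_prefix = defaultdict(set)
    for kmer in nodes:
        by_prefix[kmer[:-1]].add(kmer)

    # n1 -> n2 iff the (k-1)-suffix of n1 equals the (k-1)-prefix of n2.
    edges = {kmer: set(by_prefix.get(kmer[1:], ())) for kmer in nodes}
    return nodes, edges


# Two toy sequences differing by a single base.
nodes, edges = build_dbg(["ACGTA", "ACGGA"], k=3)
```
]]></preformat>
<p>In this toy example the shared k-mer ACG has two outgoing edges, one per allele, which is how a polymorphism appears as a branching point in the graph.</p>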
<p>These graphs can be compacted into cDBGs by merging linear paths (sequences of nodes not linked to more than two other nodes) into a single node referred to as a
<italic>unitig</italic>
[
<xref rid="pgen.1007758.ref048" ref-type="bibr">48</xref>–
<xref rid="pgen.1007758.ref050" ref-type="bibr">50</xref>
] (
<xref ref-type="fig" rid="pgen.1007758.g001">Fig 1C</xref>
). Compaction yields a graph with locally optimal resolution: regions of the genome which are conserved across individuals are represented by long unitigs, while regions which are highly variable are fractioned into shorter unitigs (
<xref ref-type="supplementary-material" rid="pgen.1007758.s001">S1 Fig</xref>
).</p>
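<p>The compaction step can be sketched as follows. This is an illustrative Python implementation under simplifying assumptions (single strand, DNA alphabet, k-mers held in memory); the function name <monospace>compact</monospace> is hypothetical and the code is not the graph traversal algorithm implemented in DBGWAS. Two consecutive k-mers are merged when the edge between them is the only edge leaving the first and the only edge entering the second.</p>
<preformat preformat-type="code"><![CDATA[
```python
def compact(kmers):
    """Merge non-branching paths of a k-mer set into unitig strings."""
    kmers = set(kmers)
    succ = {u: [u[1:] + c for c in "ACGT" if u[1:] + c in kmers] for u in kmers}
    pred = {u: [c + u[:-1] for c in "ACGT" if c + u[:-1] in kmers] for u in kmers}

    def merge_target(u):
        # u is merged with v only if u -> v is the sole edge out of u
        # and the sole edge into v (self-loops are never merged).
        if len(succ[u]) != 1:
            return None
        v = succ[u][0]
        if v == u or len(pred[v]) != 1:
            return None
        return v

    def starts_unitig(u):
        # u opens a unitig unless its unique predecessor absorbs it.
        return len(pred[u]) != 1 or merge_target(pred[u][0]) != u

    seen, unitigs = set(), []

    def walk(start):
        path, v = start, merge_target(start)
        seen.add(start)
        while v is not None and v not in seen:
            path += v[-1]          # each merged k-mer adds one base
            seen.add(v)
            v = merge_target(v)
        unitigs.append(path)

    for u in sorted(kmers):
        if starts_unitig(u):
            walk(u)
    for u in sorted(kmers):        # leftovers lie on isolated cycles
        if u not in seen:
            walk(u)
    return sorted(unitigs)


def kmers_of(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Two toy genomes differing by one base (A/C at the fifth position).
k = 3
unitigs = compact(kmers_of("GATTACAGG", k) | kmers_of("GATTCCAGG", k))
```
]]></preformat>
<p>The four resulting unitigs form a bubble: two flanking unitigs shared by both toy genomes and two parallel unitigs, one per allele.</p>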
</sec>
<sec id="sec012">
<title>Representing strains by their unitig content (step 1)</title>
<sec id="sec013">
<title>cDBG construction</title>
<p>We build a single DBG from all genomes given as input using the GATB C++ library [
<xref rid="pgen.1007758.ref051" ref-type="bibr">51</xref>
]. We start from contigs rather than reads and, consequently, we do not need to filter out low abundance k-mers, allowing for the exploration of any variation present in the set of input genomes. We then compact the DBG using a graph traversal algorithm, which identifies all linear paths in the DBG—each forming a unitig in the cDBG. During this step, we also associate each k-mer index to its corresponding unitig index in the cDBG.</p>
<p>There is no general rule for choosing the ideal k-mer length as it depends on many factors, including the assembly quality, complexity of the input genomes, or presence of repeats. High values of
<italic>k</italic>
lead to haplotypes containing multiple SNPs instead of distinct single SNPs, if these SNPs are separated by less than
<italic>k</italic>
bases. As
<italic>k</italic>
increases, the k-mer-defined haplotypes also become more specific to a genome sub-population, leading to a loss of power to detect genotype to phenotype associations. Low values of
<italic>k</italic>
, on the other hand, produce highly connected sets of non-specific k-mers. In particular, any repeated region with at least
<italic>k</italic>
bases may create a cycle in the DBG (
<xref ref-type="fig" rid="pgen.1007758.g004">Fig 4</xref>
). We use
<italic>k</italic>
= 31 by default, as it produced the best performance to retrieve known markers of
<italic>P. aeruginosa</italic>
resistance to amikacin and levofloxacin (
<xref ref-type="fig" rid="pgen.1007758.g005">Fig 5</xref>
). We found DBGWAS results to be robust to small variations of
<italic>k</italic>
between 21 and 41. Similar graph structures were generated whatever the tested value of
<italic>k</italic>
for the clonal
<italic>M. tuberculosis</italic>
species (
<xref ref-type="supplementary-material" rid="pgen.1007758.s007">S7 Fig</xref>
). More variability was observed for
<italic>P. aeruginosa</italic>
resistance to amikacin, which involves more complex resistance mechanisms (
<xref ref-type="supplementary-material" rid="pgen.1007758.s008">S8 Fig</xref>
).</p>
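<p>The link between <italic>k</italic> and the merging of neighbouring SNPs into haplotypes can be made concrete with a short calculation. The sketch below (hypothetical function names, contig ends ignored) counts the k-mers that cover two variable positions whose coordinates differ by <italic>d</italic>: such k-mers exist only when <italic>d</italic> is smaller than <italic>k</italic>, and each of them encodes a combination of the two alleles rather than a single SNP.</p>
<preformat preformat-type="code"><![CDATA[
```python
def kmers_spanning_both(k, d):
    """Number of k-mers covering positions p and p + d (d >= 0).

    A k-mer covering both positions must start between p + d - k + 1
    and p, which leaves k - d possible start positions when d < k.
    Contig ends are ignored in this sketch.
    """
    return max(0, k - d)


def kmers_spanning_first_only(k, d):
    """Number of k-mers covering position p but not position p + d."""
    return k - kmers_spanning_both(k, d)


# Two variable positions six bases apart, for several values of k.
examples = {k: kmers_spanning_both(k, d=6) for k in (5, 21, 31, 41)}
```
]]></preformat>
<p>With <italic>k</italic> = 5 no k-mer covers both positions, so the two SNPs are described separately; with <italic>k</italic> = 31, 25 of the 31 k-mers covering the first position also cover the second.</p>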
<fig id="pgen.1007758.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Effect of
<italic>k</italic>
on the graph topology.</title>
<p>A cDBG was built from the
<italic>P. aeruginosa gyrA</italic>
gene sequences from several strains. When
<italic>k</italic>
is small, k-mers are highly repeated, which generates numerous loops. As
<italic>k</italic>
increases, k-mer sequences become more specific and the graph gets more linear. For large values of
<italic>k</italic>
, few k-mers are shared by all the strains, and the linear path thickens into parallel paths belonging to variable strain populations.</p>
</caption>
<graphic xlink:href="pgen.1007758.g004"></graphic>
</fig>
<fig id="pgen.1007758.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.g005</object-id>
<label>Fig 5</label>
<caption>
<title>Choice of
<italic>k</italic>
.</title>
<p>True positive
<italic>versus</italic>
false positive curves for several values of
<italic>k</italic>
for both amikacin and levofloxacin resistance phenotypes. True positives are unitigs mapping to genuine variants described in resistance databases for the studied drugs [
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
]. In both cases, the value of
<italic>k</italic>
leading to the best AUC is
<italic>k</italic>
= 31.</p>
</caption>
<graphic xlink:href="pgen.1007758.g005"></graphic>
</fig>
</sec>
<sec id="sec014">
<title>Unitig presence across genomes</title>
<p>Each genome is represented by a vector of presence/absence of each unitig in the cDBG. To build this vector, we query the unitig associated with each k-mer of a given genome. This procedure is efficient because it relies on constant time operations. First, we use GATB’s Minimal Perfect Hash Function (MPHF) [
<xref rid="pgen.1007758.ref052" ref-type="bibr">52</xref>
] to retrieve the index of a given k-mer, and then we use the previously computed association between k-mer and unitig indices to determine which unitigs the given genome contains. Since these two operations take constant time, producing this vector representation for a genome takes time linear in the size of the genome. Note that GATB’s MPHF can be applied here because the set of k-mers is fixed once the DBG is built, and because we only query k-mers that are guaranteed to be in the DBG (since we do not filter out any k-mer).</p>
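<p>The lookup described above can be sketched in a few lines of Python. In this illustration a plain dictionary stands in for GATB’s MPHF combined with the k-mer-to-unitig association; the function names and the toy unitigs are hypothetical, and reverse complements are ignored.</p>
<preformat preformat-type="code"><![CDATA[
```python
def kmer_to_unitig(unitigs, k):
    """Map every k-mer to the index of the unitig that contains it.

    A dict is used here as a stand-in for a minimal perfect hash
    function plus an array of unitig indices.
    """
    index = {}
    for j, unitig in enumerate(unitigs):
        for i in range(len(unitig) - k + 1):
            index[unitig[i:i + k]] = j
    return index


def presence_vector(genome, index, n_unitigs, k):
    """One pass over the genome: one constant-time lookup per k-mer."""
    row = [0] * n_unitigs
    for i in range(len(genome) - k + 1):
        # Every k-mer of an input genome is in the graph, so the
        # lookup never fails as long as no k-mer was filtered out.
        row[index[genome[i:i + k]]] = 1
    return row


k = 3
unitigs = ["CAGG", "GATT", "TTACA", "TTCCA"]   # toy cDBG nodes
genomes = ["GATTACAGG", "GATTCCAGG"]           # toy input genomes
index = kmer_to_unitig(unitigs, k)
U = [presence_vector(g, index, len(unitigs), k) for g in genomes]
```
]]></preformat>
<p>Each row of the resulting list describes one genome: both toy genomes contain the two flanking unitigs, and each contains exactly one of the two allele-specific unitigs.</p>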
<p>The unitig description on all the input genomes is stored into a matrix
<italic>U</italic>
:
<disp-formula id="pgen.1007758.e002">
<alternatives>
<graphic xlink:href="pgen.1007758.e002.jpg" id="pgen.1007758.e002g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M2">
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mtext>,</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>the</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mi>j</mml:mi>
<mml:mtext>-th</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>unitig</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>is</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>present</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>in</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>the</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>-</mml:mo>
<mml:mtext>th</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>input</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>genome</mml:mtext>
<mml:mo>;</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mtext>,</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>otherwise.</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>We then transform the matrix
<italic>U</italic>
into
<italic>Z</italic>
, which represents the minor allele description, in terms of presence [
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
]:
<italic>Z</italic>
is identical to
<italic>U</italic>
except for columns with a mean larger than 0.5, which are complemented:
<italic>Z</italic>
<sub>
<italic>j</italic>
</sub>
= 1 −
<italic>U</italic>
<sub>
<italic>j</italic>
</sub>
for these columns.</p>
<p>We then restrict
<italic>Z</italic>
to its set of unique columns. If several unitigs have the same minor allele presence pattern, then they will be represented by a single column. Keeping duplicates would lead to performing the same statistical test several times. Finally, we filter out columns whose average is below 0.01—the user can specify this threshold using the -
<monospace>maf</monospace>
option. We denote the de-duplicated, filtered matrix of patterns by
<italic>X</italic>
.</p>
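<p>The three transformations leading from <italic>U</italic> to <italic>X</italic> (minor allele coding, de-duplication and frequency filtering) can be sketched as follows. This is a pure-Python illustration on a toy matrix; the function name <monospace>minor_allele_patterns</monospace> is hypothetical, and its <monospace>maf</monospace> argument merely mirrors the role of the threshold described above.</p>
<preformat preformat-type="code"><![CDATA[
```python
def minor_allele_patterns(U, maf=0.01):
    """Return the de-duplicated, filtered patterns as a list of columns."""
    n = len(U)
    columns = [tuple(row[j] for row in U) for j in range(len(U[0]))]

    # Z: complement every column whose mean is larger than 0.5, so that
    # a 1 always denotes the minor allele.
    Z = [tuple(1 - x for x in col) if sum(col) / n > 0.5 else col
         for col in columns]

    # Keep one copy of each distinct pattern (first occurrence order).
    unique = list(dict.fromkeys(Z))

    # Drop patterns whose frequency is below the threshold.
    return [col for col in unique if sum(col) / n >= maf]


# Toy matrix: 4 genomes (rows) by 5 unitigs (columns).
U = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
X = minor_allele_patterns(U)
```
]]></preformat>
<p>In this example the first unitig is present in every genome and is therefore removed by the frequency filter, while the fourth and fifth unitigs carry the same minor allele pattern after complementation and are tested only once.</p>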
<p>Importantly, both k-mers and unitigs lead to the same set of distinct patterns across the genomes. Indeed, every unitig represents (at least) one k-mer, and conversely every k-mer is represented by one (single) unitig. When de-duplicated, the two representations therefore lead to the same set of patterns to be tested for association with the phenotype.</p>
</sec>
</sec>
<sec id="sec015">
<title>Testing unitigs for association with the phenotype (step 2)</title>
<p>Human GWAS literature extensively discusses how testing procedures can result in spurious associations if the effect of the population structure is not taken into account [
<xref rid="pgen.1007758.ref053" ref-type="bibr">53</xref>–
<xref rid="pgen.1007758.ref055" ref-type="bibr">55</xref>
]. Population structures can be strong in bacteria because of their clonality [
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
,
<xref rid="pgen.1007758.ref006" ref-type="bibr">6</xref>
,
<xref rid="pgen.1007758.ref056" ref-type="bibr">56</xref>
,
<xref rid="pgen.1007758.ref057" ref-type="bibr">57</xref>
]. An additional performance analysis comparing several models for population structure, on both simulated and real data, showed that correcting for population structure using LMMs is often preferable to using a fixed effect correction or not correcting at all (
<xref ref-type="supplementary-material" rid="pgen.1007758.s018">S1 Appendix</xref>
).</p>
<p>We thus rely on the bugwas method [
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
], which uses the linear mixed model (LMM) implemented in the GEMMA library [
<xref rid="pgen.1007758.ref058" ref-type="bibr">58</xref>
], to test for association with phenotypes while correcting for the population structure. This method also offers the possibility to test for lineage effects, by calculating p-values for association between the columns of the matrix representing the population structure, and the phenotype [
<xref rid="pgen.1007758.ref005" ref-type="bibr">5</xref>
]. DBGWAS optionally provides bugwas lineage effect plots when the user specifies a phylogenetic tree using the -
<monospace>newick</monospace>
option. An example of the generated figures is available at
<ext-link ext-link-type="uri" xlink:href="http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/full_dataset_visualization/">http://pbil.univ-lyon1.fr/datasets/DBGWAS_support/full_dataset_visualization/</ext-link>
.</p>
<p>Formally, the LMM represents the distribution of the binarized phenotype
<italic>Y</italic>
<sub>
<italic>i</italic>
</sub>
, given the
<italic>j</italic>
-th minor allele pattern
<italic>X</italic>
<sub>
<italic>ij</italic>
</sub>
and the population structure represented by a set of factors
<inline-formula id="pgen.1007758.e003">
<alternatives>
<graphic xlink:href="pgen.1007758.e003.jpg" id="pgen.1007758.e003g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M3">
<mml:mrow>
<mml:mi>W</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="double-struck">R</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, by:
<disp-formula id="pgen.1007758.e004">
<alternatives>
<graphic xlink:href="pgen.1007758.e004.jpg" id="pgen.1007758.e004g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M4">
<mml:mrow>
<mml:msub>
<mml:mi>Y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>β</mml:mi>
<mml:mo>+</mml:mo>
<mml:msubsup>
<mml:mi>W</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>T</mml:mi>
</mml:msubsup>
<mml:mi>α</mml:mi>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>ε</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mspace width="1.em"></mml:mspace>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
<label>(1)</label>
</disp-formula>
<italic>β</italic>
is the fixed effect of the tested candidate on the phenotype,
<inline-formula id="pgen.1007758.e005">
<alternatives>
<graphic xlink:href="pgen.1007758.e005.jpg" id="pgen.1007758.e005g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M5">
<mml:mrow>
<mml:mi>α</mml:mi>
<mml:mo>∼</mml:mo>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<inline-formula id="pgen.1007758.e006">
<alternatives>
<graphic xlink:href="pgen.1007758.e006.jpg" id="pgen.1007758.e006g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M6">
<mml:mrow>
<mml:msubsup>
<mml:mi>σ</mml:mi>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
<mml:mo>></mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
is the random effect of the population structure, and
<inline-formula id="pgen.1007758.e007">
<alternatives>
<graphic xlink:href="pgen.1007758.e007.jpg" id="pgen.1007758.e007g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M7">
<mml:mrow>
<mml:msub>
<mml:mi>ε</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mover>
<mml:mo>∼</mml:mo>
<mml:mtext>iid</mml:mtext>
</mml:mover>
<mml:mi mathvariant="script">N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
are the residuals with variance
<italic>σ</italic>
<sup>2</sup>
> 0.
<italic>W</italic>
is estimated from the
<italic>Z</italic>
matrix, which includes duplicate columns representing both core and accessory genome. More precisely, denoting
<italic>Z</italic>
=
<italic>USV</italic>
<sup>T</sup>
the singular value decomposition of
<italic>Z</italic>
, we use
<italic>W</italic>
=
<italic>US</italic>
.</p>
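<p>As an illustration of this last step, the factors can be obtained from a thin singular value decomposition. The following is a minimal sketch, not the DBGWAS or bugwas implementation: it assumes NumPy, a toy matrix whose rows are genomes, and a hypothetical helper name <monospace>population_structure_factors</monospace>.</p>

```python
import numpy as np


def population_structure_factors(Z):
    """Return W = U S from the thin SVD Z = U S V^T (illustrative helper)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Broadcasting scales column k of U by the k-th singular value.
    return U * s


# Toy presence/absence matrix (assumed layout): 4 genomes x 3 patterns.
Z = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
W = population_structure_factors(Z)
```

<p>Because the right singular vectors are orthonormal, the product of <italic>W</italic> with its transpose equals that of <italic>Z</italic> with its transpose, so the factors preserve the genome-by-genome similarity encoded in <italic>Z</italic>.</p>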
<p>We test
<italic>H</italic>
<sub>0</sub>
:
<italic>β</italic>
= 0 versus
<italic>H</italic>
<sub>1</sub>
:
<italic>β</italic>
≠ 0 in
<xref ref-type="disp-formula" rid="pgen.1007758.e004">Eq 1</xref>
for each pattern using a likelihood ratio procedure producing p-values and maximum likelihood estimates
<inline-formula id="pgen.1007758.e008">
<alternatives>
<graphic xlink:href="pgen.1007758.e008.jpg" id="pgen.1007758.e008g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M8">
<mml:mover accent="true">
<mml:mi>β</mml:mi>
<mml:mo>^</mml:mo>
</mml:mover>
</mml:math>
</alternatives>
</inline-formula>
. To account for the multiple testing that results from the large number of tested patterns, we compute q-values, which are the Benjamini-Hochberg-transformed p-values that control the false discovery rate (FDR) [
<xref rid="pgen.1007758.ref059" ref-type="bibr">59</xref>
].</p>
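<p>The Benjamini-Hochberg transformation can be sketched as follows. This is a generic textbook version in plain Python, not the code run by DBGWAS; the function name and the toy p-values are illustrative assumptions.</p>

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values for four hypothetical patterns.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
```

<p>Each q-value is the smallest FDR level at which the corresponding pattern would be declared significant.</p>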
</sec>
<sec id="sec016">
<title>Interpretation of significant unitigs (step 3)</title>
<p>The LMM is used to identify de-duplicated minor allele presence patterns significantly associated with the phenotype at a chosen FDR level. While the testing step is done at the pattern level, the interpretation of the selected features is done at the unitig level. As a result of the de-duplication procedure, a given pattern may correspond to several distinct unitigs. To faithfully interpret the results, all the unitigs corresponding to the significant patterns are retrieved and are assigned the q-value of their pattern. We now show how the initial cDBG can be used in the interpretation step.</p>
<sec id="sec017">
<title>Significance threshold</title>
<p>The interpretation step focuses on the unitigs with the lowest q-values, which are used to build the resulting annotated subgraphs. The unitig selection can be based either on the FDR (q-value threshold) or on a number of presence/absence patterns ordered by increasing q-values. In practice, this is done in DBGWAS using a Significant Features Filter (SFF). For a selection based on an FDR threshold, the SFF value is set between 0 and 1, while any integer value > 1 defines the number of patterns to consider.</p>
<p>In our experiments, we choose not to apply a fixed FDR threshold, even though DBGWAS offers this option. q-values can differ between datasets by several orders of magnitude, so a single FDR threshold would select a large number of unitigs on some datasets, generating more than 1,000 subgraphs (e.g.
<italic>S. aureus</italic>
ciprofloxacin) as shown in
<xref ref-type="supplementary-material" rid="pgen.1007758.s017">S8 Table</xref>
. Instead, we retain the 100 patterns with the lowest q-values. Although arbitrary, this choice is tractable for all datasets and provides satisfactory results in our experiments. It does not provide an explicit control of the FDR: only the q-value provides an estimate of the proportion of false discoveries incurred when considering patterns below this value. Checking the q-values of the selected unitigs is therefore essential to assess their significance. If the default SFF = 100 is not satisfactory, it is also possible to re-run the third step only, with a more suitable SFF value.</p>
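<p>The two modes of the SFF can be sketched as below. This is an illustration only, not the DBGWAS source: the function name is hypothetical, the q-value threshold is assumed to be inclusive, and values that are neither strictly between 0 and 1 nor greater than 1 are rejected because the text does not specify how they are handled.</p>

```python
def select_patterns(qvalues, sff=100):
    """Return indices of the selected patterns, ordered by increasing q-value.

    sff strictly between 0 and 1: keep patterns whose q-value is at most sff.
    sff greater than 1: keep that many patterns with the lowest q-values.
    """
    ranked = sorted(range(len(qvalues)), key=lambda i: qvalues[i])
    if 1 > sff > 0:
        # FDR mode (assumption: the threshold itself is included).
        return [i for i in ranked if sff >= qvalues[i]]
    if sff > 1:
        # Top-N mode.
        return ranked[:int(sff)]
    raise ValueError("sff must lie strictly between 0 and 1, or be greater than 1")


# Toy q-values for four hypothetical patterns.
qvalues = [0.2, 0.001, 0.05, 0.5]
by_threshold = select_patterns(qvalues, sff=0.05)
by_count = select_patterns(qvalues, sff=3)
```

<p>In the top-N mode the q-values of the retained patterns still have to be inspected, since the count alone gives no guarantee on the FDR.</p>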
</sec>
<sec id="sec018">
<title>Graph neighbourhoods</title>
<p>We define the neighbourhood of each significant unitig
<italic>u</italic>
(defined by the
<italic>SFF</italic>
) as the set of unitigs whose shortest path to
<italic>u</italic>
has at most
<italic>ne</italic>
= 5 edges. Users can modify the
<italic>ne</italic>
value using the -
<monospace>nh</monospace>
option. The objects returned by DBGWAS are the connected components of the graph induced by the neighbourhoods of all significant unitigs in the cDBG. As illustrated in
<xref ref-type="fig" rid="pgen.1007758.g006">Fig 6</xref>
, nearby significant unitigs might belong to the same connected component, so this process groups unitigs which are likely to be located closely in the genomes. We refer to the connected components as
<italic>subgraphs</italic>
in the
<xref ref-type="sec" rid="sec002">Results</xref>
section.</p>
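<p>The construction of the subgraphs can be sketched with a breadth-first search followed by a connected-component search. This is a minimal illustration on a toy undirected graph stored as an adjacency dictionary; the function names and the data layout are assumptions and do not reflect the DBGWAS internals.</p>

```python
from collections import deque


def neighbourhood(graph, start, ne=5):
    """Nodes whose shortest path to start has at most ne edges (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == ne:
            continue  # do not expand beyond ne edges
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def subgraphs(graph, significant, ne=5):
    """Connected components of the graph induced by all neighbourhoods."""
    kept = set()
    for u in significant:
        kept |= neighbourhood(graph, u, ne)
    components, seen = [], set()
    for node in sorted(kept):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(n for n in graph[cur] if n in kept and n not in component)
        seen |= component
        components.append(component)
    return components


# Toy graph: a simple path a-b-c-d-e-f-g-h-i-j.
nodes = "abcdefghij"
graph = {n: set() for n in nodes}
for x, y in zip(nodes, nodes[1:]):
    graph[x].add(y)
    graph[y].add(x)

far_apart = subgraphs(graph, ["a", "j"], ne=2)  # neighbourhoods do not meet
close_by = subgraphs(graph, ["a", "e"], ne=2)   # neighbourhoods share node c
```

<p>With <italic>ne</italic> = 2, the significant nodes a and j yield two separate subgraphs, whereas a and e are reported in a single subgraph because their neighbourhoods overlap.</p>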
<fig id="pgen.1007758.g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Subgraphs induced by the neighbourhood of significantly associated unitigs.</title>
<p>In this example, a neighbourhood of size
<italic>ne</italic>
= 2 was used: any unitig distant up to 2 edges from a significant unitig is retrieved to define its neighbourhood. Neighbourhoods are merged if they share at least one node, e.g. the neighbourhoods of
<italic>U</italic>
<sub>1</sub>
and
<italic>U</italic>
<sub>2</sub>
are merged because they share
<italic>N</italic>
<sub>6</sub>
, and will be represented in a single subgraph.</p>
</caption>
<graphic xlink:href="pgen.1007758.g006"></graphic>
</fig>
<p>The
<italic>SFF</italic>
value can be tuned to optimise the number and size of the output subgraphs. It has no impact on subgraphs describing SNPs in core sequences (
<xref ref-type="supplementary-material" rid="pgen.1007758.s002">S2 Fig</xref>
). On the other hand, when significant unitigs map to different regions of a single MGE, such as a plasmid, several subgraphs are generated but can be gathered into a single subgraph by increasing the
<italic>SFF</italic>
threshold (
<xref ref-type="supplementary-material" rid="pgen.1007758.s004">S4 Fig</xref>
). When significant unitigs map to several distinct mobile regions, which can be found in different contexts (transposon, integron, etc.) at the population level, the resulting subgraph can become very large and highly branching: decreasing the
<italic>SFF</italic>
threshold selects only the few most significant unitigs, generating a subgraph that focuses on the most relevant region (
<xref ref-type="supplementary-material" rid="pgen.1007758.s006">S6 Fig</xref>
). Reducing the graph complexity can also be done by decreasing the
<italic>ne</italic>
value, using the -
<monospace>nh</monospace>
option.</p>
</sec>
<sec id="sec019">
<title>Representing metadata with coloured DBGs</title>
<p>The subgraphs are enriched with metadata to make their interpretation easier. We use the node size to represent allele frequencies,
<italic>i.e.</italic>
, the proportion of genomes containing the unitig sequence. We describe the effect
<italic>β</italic>
of each unitig as estimated by the LMM using colours, in the spirit of the coloured DBGs [
<xref rid="pgen.1007758.ref019" ref-type="bibr">19</xref>
]. Colours are continuously interpolated between red for unitigs with a strong positive effect and blue for those with a strong negative effect.</p>
</sec>
<sec id="sec020">
<title>Annotating the subgraphs</title>
<p>DBGWAS can optionally integrate an automated annotation step using the Blast suite [
<xref rid="pgen.1007758.ref060" ref-type="bibr">60</xref>
] (version 2.6.0+) on local user-defined protein (-
<monospace>pt-db</monospace>
option) or nucleic acid (-
<monospace>nt-db</monospace>
option) sequence databases. We annotate the subgraphs of interest by blasting each unitig sequence against the available databases. Users can then easily retrieve the annotations that are best supported by the nodes in the subgraph, or those with the lowest E-value. Importantly, DBGWAS works out of the box with any nucleotide or protein Fasta files as annotation databases. However, users can customize the annotation databases by changing the Fasta sequence headers to make DBGWAS results more interpretable. A common example is compacting the annotation in the summary page by using abbreviations or gene class names, and expanding them to full names in the subgraph page. Other custom fields can also be included in the annotation table by adding specific tags to the headers. A detailed explanation of how to customize annotation databases for DBGWAS can be found at
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas/wikis/Customizing-annotation-databases">https://gitlab.com/leoisl/dbgwas/wikis/Customizing-annotation-databases</ext-link>
. We also provide on the DBGWAS website a resistance determinant database built by merging the ResFinder, MEGARes, and ARG-ANNOT databases [
<xref rid="pgen.1007758.ref061" ref-type="bibr">61</xref>
–
<xref rid="pgen.1007758.ref063" ref-type="bibr">63</xref>
], and a subset of UniProt restricted to bacterial proteins [
<xref rid="pgen.1007758.ref024" ref-type="bibr">24</xref>
]. Subgraphs discussed in the
<xref ref-type="sec" rid="sec002">Results</xref>
section were annotated using these databases.</p>
</sec>
<sec id="sec021">
<title>Interactive visualisation</title>
<p>DBGWAS produces an interactive view of the enriched and annotated subgraphs, allowing the user to explore the graph topology together with information on each node: allele and phenotype frequencies, q-value, estimated effect, and annotation. The view is built using HTML, CSS, and several Javascript libraries, the main one being Cytoscape.js [
<xref rid="pgen.1007758.ref064" ref-type="bibr">64</xref>
]. Results can be shared and visualised in a web browser. As a large number of components can be produced in one run of DBGWAS, we provide a summary page allowing users to preview and filter the subgraphs. Filtering can be based upon the minimum q-value of all unitigs in the component (min
<sub>
<italic>q</italic>
</sub>
), or based on the annotations. A complete description of the DBGWAS interactive interface is available in
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/dbgwas/wikis/DBGWAS-web-based-interactive-visualization">https://gitlab.com/leoisl/dbgwas/wikis/DBGWAS-web-based-interactive-visualization</ext-link>
.</p>
</sec>
<sec id="sec022">
<title>Re-running from
<italic>step 2</italic>
or
<italic>step 3</italic>
</title>
<p>It is possible to re-run a part of the analysis if a first run with the default values was unsatisfactory. The -
<monospace>skip1</monospace>
option re-runs the analysis from the second step, for instance to compute the lineage effects (adding the -
<monospace>newick</monospace>
option). It is also possible to re-run only the third step by using the -
<monospace>skip2</monospace>
option, for instance when the default
<italic>SFF</italic>
and
<italic>ne</italic>
values generated highly connected graphs, or if the annotation was incomplete.</p>
</sec>
</sec>
<sec id="sec023">
<title>Datasets</title>
<p>In our experiments, we used genome sequences from three bacterial species with varying degrees of genome plasticity, from more clonal to more plastic:
<italic>M. tuberculosis</italic>
,
<italic>S. aureus</italic>
, and
<italic>P. aeruginosa</italic>
. We also built large datasets with random phenotypes for these 3 species, and used them only for time performance and memory usage assessment. All panels are summarised in
<xref rid="pgen.1007758.t004" ref-type="table">Table 4</xref>
.</p>
<table-wrap id="pgen.1007758.t004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pgen.1007758.t004</object-id>
<label>Table 4</label>
<caption>
<title>Microbial panels.</title>
</caption>
<alternatives>
<graphic id="pgen.1007758.t004g" xlink:href="pgen.1007758.t004"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="center" style="border-bottom:thick" rowspan="1" colspan="1">Species</th>
<th align="left" style="border-bottom:thick" rowspan="1" colspan="1">Genome plasticity</th>
<th align="left" style="border-bottom:thick" rowspan="1" colspan="1">Range of genome length</th>
<th align="center" style="border-bottom:thick" rowspan="1" colspan="1">Panel name</th>
<th align="center" style="border-bottom:thick" rowspan="1" colspan="1">Source</th>
<th align="left" style="border-bottom:thick" rowspan="1" colspan="1">Phenotype</th>
<th align="left" style="border-bottom:thick" rowspan="1" colspan="1">Number of available genomes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="10" colspan="1">
<italic>M. tuberculosis</italic>
</td>
<td align="center" rowspan="10" colspan="1">very low</td>
<td align="center" rowspan="10" colspan="1">4.4 Mbp</td>
<td align="center" rowspan="9" colspan="1">TB</td>
<td align="center" rowspan="9" colspan="1">[
<xref rid="pgen.1007758.ref035" ref-type="bibr">35</xref>
]</td>
<td align="center" rowspan="1" colspan="1">rifampicin</td>
<td align="center" rowspan="1" colspan="1">1,197</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">isoniazid</td>
<td align="center" rowspan="1" colspan="1">1,287</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">ethambutol</td>
<td align="center" rowspan="1" colspan="1">1,041</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">streptomycin</td>
<td align="center" rowspan="1" colspan="1">1,166</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">kanamycin</td>
<td align="center" rowspan="1" colspan="1">671</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">ofloxacin</td>
<td align="center" rowspan="1" colspan="1">696</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">ethionamide</td>
<td align="center" rowspan="1" colspan="1">420</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">MDR</td>
<td align="center" rowspan="1" colspan="1">1,211</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">XDR</td>
<td align="center" rowspan="1" colspan="1">689</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Large TB</td>
<td align="center" rowspan="1" colspan="1">[
<xref rid="pgen.1007758.ref011" ref-type="bibr">11</xref>
]</td>
<td align="center" rowspan="1" colspan="1">random</td>
<td align="center" rowspan="1" colspan="1">5,000</td>
</tr>
<tr>
<td align="center" rowspan="12" colspan="1">
<italic>S. aureus</italic>
</td>
<td align="center" rowspan="12" colspan="1">low</td>
<td align="center" rowspan="12" colspan="1">2.7-3.1 Mbp</td>
<td align="center" rowspan="11" colspan="1">SA</td>
<td align="center" rowspan="11" colspan="1">[
<xref rid="pgen.1007758.ref030" ref-type="bibr">30</xref>
]</td>
<td align="center" rowspan="1" colspan="1">methicillin</td>
<td align="center" rowspan="1" colspan="1">501</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">ciprofloxacin</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">erythromycin</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">penicillin</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">tetracycline</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">fusidic acid</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">trimethoprim</td>
<td align="center" rowspan="1" colspan="1">323</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">gentamicin</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">rifampin</td>
<td align="center" rowspan="1" colspan="1">991</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">mupirocin</td>
<td align="center" rowspan="1" colspan="1">490</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">vancomycin</td>
<td align="center" rowspan="1" colspan="1">501</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Large SA</td>
<td align="center" rowspan="1" colspan="1">[
<xref rid="pgen.1007758.ref011" ref-type="bibr">11</xref>
]</td>
<td align="center" rowspan="1" colspan="1">random</td>
<td align="center" rowspan="1" colspan="1">9,000</td>
</tr>
<tr>
<td align="center" rowspan="10" colspan="1">
<italic>P. aeruginosa</italic>
</td>
<td align="center" rowspan="10" colspan="1">high</td>
<td align="center" rowspan="10" colspan="1">5.8-7.6 Mbp</td>
<td align="center" rowspan="9" colspan="1">PA</td>
<td align="center" rowspan="9" colspan="1">[
<xref rid="pgen.1007758.ref065" ref-type="bibr">65</xref>
]</td>
<td align="center" rowspan="1" colspan="1">amikacin</td>
<td align="center" rowspan="1" colspan="1">280</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">levofloxacin</td>
<td align="center" rowspan="1" colspan="1">117</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">meropenem</td>
<td align="center" rowspan="1" colspan="1">280</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">piperacillin</td>
<td align="center" rowspan="1" colspan="1">280</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">colistin</td>
<td align="center" rowspan="1" colspan="1">164</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">polymyxin B</td>
<td align="center" rowspan="1" colspan="1">117</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">chloramphenicol</td>
<td align="center" rowspan="1" colspan="1">103</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">cefepime</td>
<td align="center" rowspan="1" colspan="1">280</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">fosfomycin</td>
<td align="center" rowspan="1" colspan="1">113</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Large PA</td>
<td align="center" rowspan="1" colspan="1">[
<xref rid="pgen.1007758.ref011" ref-type="bibr">11</xref>
]</td>
<td align="center" rowspan="1" colspan="1">random</td>
<td align="center" rowspan="1" colspan="1">2,500</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t004fn001">
<p>We selected 3 bacterial species with distinct levels of genome plasticity, and with antibiotic resistance phenotypes available for several drugs. For each species, we also created large datasets by computing random phenotypes for all available genome assemblies from NCBI RefSeq.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<sec id="sec024">
<title>TB panel</title>
<p>
<italic>M. tuberculosis</italic>
(TB) is a human pathogen causing 1.7 million deaths each year [
<xref rid="pgen.1007758.ref066" ref-type="bibr">66</xref>
]. This species is known for its apparent absence of horizontal gene transfer (HGT) and, accordingly, most of the reported resistance determinants are chromosomal mutations [
<xref rid="pgen.1007758.ref067" ref-type="bibr">67</xref>
] in core genes or gene promoters. Intergenic regions are also described to be instrumental in multidrug-resistant (MDR) and extensively drug-resistant (XDR) phenotypes [
<xref rid="pgen.1007758.ref009" ref-type="bibr">9</xref>
]. We use the PATRIC AMR phenotype data, as well as genome assemblies from their resource [
<xref rid="pgen.1007758.ref035" ref-type="bibr">35</xref>
,
<xref rid="pgen.1007758.ref068" ref-type="bibr">68</xref>
]. We thus gather a total of 1302 genomes after filtering based on genome length. Phenotype data include isoniazid, rifampicin, streptomycin, ethambutol, ofloxacin, kanamycin and ethionamide resistance status. Except for the last three drugs, phenotype data are available for more than a thousand genomes. We reconstruct MDR and XDR phenotypes based on the WHO definition [
<xref rid="pgen.1007758.ref066" ref-type="bibr">66</xref>
]. XDR phenotype could only be defined for 689/1302 strains as it required data for at least 4 drugs. Information on how phenotype data and genome assemblies were obtained is available on the PATRIC website.</p>
</sec>
<sec id="sec025">
<title>SA panel</title>
<p>
<italic>S. aureus</italic>
is a human pathogen causing life-threatening infections. It is subject to HGT and many plasmids, mobile elements, and phage sequences have been described in its genome. However, this does not affect the species’ genome size, which is always close to 3 Mbp [
<xref rid="pgen.1007758.ref069" ref-type="bibr">69</xref>
]. Most antibiotic resistance mechanisms are well determined by known variants, as shown in a previous study [
<xref rid="pgen.1007758.ref030" ref-type="bibr">30</xref>
]. This study obtained an overall sensitivity of 97% for predicting 12 phenotypes from rules based on antibiotic marker mapping. We use this study panel of 992 strains obtained by merging their derivation and validation sets.</p>
</sec>
<sec id="sec026">
<title>PA panel</title>
<p>
<italic>P. aeruginosa</italic>
is a ubiquitous bacterial species responsible for various types of infections. It is highly adaptable thanks to its ability to exchange genetic material within and between species [
<xref rid="pgen.1007758.ref070" ref-type="bibr">70</xref>
]. The species accessory genome is particularly important both in terms of size and diversity, and carries more than half of the genetic determinants already described to confer resistance to antimicrobial drugs [
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
,
<xref rid="pgen.1007758.ref065" ref-type="bibr">65</xref>
,
<xref rid="pgen.1007758.ref071" ref-type="bibr">71</xref>
]. We use a panel of 282 strains, gathered from two collections which mostly include clinical strains: the bioMérieux collection [
<xref rid="pgen.1007758.ref065" ref-type="bibr">65</xref>
] (n = 219) and the Pirnay collection [
<xref rid="pgen.1007758.ref072" ref-type="bibr">72</xref>
] (n = 63). Genome assemblies and categorical phenotypes for 9 antibiotics are available [
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
]. Binarised phenotypes of amikacin resistance are available on the DBGWAS project page as an example for users.</p>
</sec>
<sec id="sec027">
<title>Phenotype binarisation</title>
<p>Most available phenotypes are categorical, with S, I and R levels for susceptible, intermediate, and resistant, respectively. We binarise them by assigning zero to susceptible strains (S) and one to the others (I and R).</p>
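<p>This rule amounts to a simple lookup, sketched below. The function name and the tolerance for surrounding whitespace and lower-case input are illustrative assumptions.</p>

```python
def binarise(phenotype):
    """Map a categorical S/I/R phenotype to 0 (susceptible) or 1 (others)."""
    levels = {"S": 0, "I": 1, "R": 1}
    # Unknown labels raise KeyError instead of being silently guessed.
    return levels[phenotype.strip().upper()]


# Toy list of categorical phenotypes.
categorical = ["S", "I", "R", "s"]
binary = [binarise(p) for p in categorical]
```

<p>Raising an error on unknown labels keeps missing or malformed phenotypes from being counted as either class.</p>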
</sec>
<sec id="sec028">
<title>Large panels</title>
<p>We built large panels for the three species, in order to analyse the computational performance at a comprehensive scale. To do so, we gathered all genome assemblies of
<italic>M. tuberculosis</italic>
(5,504),
<italic>S. aureus</italic>
(9,331), and
<italic>P. aeruginosa</italic>
(2,802) available on the NCBI RefSeq bacterial genome repository [
<xref rid="pgen.1007758.ref011" ref-type="bibr">11</xref>
], and removed poor quality genomes. For each panel, we generated random binary phenotypes. For a detailed time and memory assessment, we built several sub-panels from these three large panels at size points of 100, 250, 500, 1,000, 2,500, 5,000 and 9,000 genomes. To build these sub-panels, we sampled genomes uniformly from the panels. To take into account the variability among subsamplings, each sub-panel was randomly built 10 times.</p>
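<p>The sub-panel construction can be sketched as follows. This is an illustration under stated assumptions: sampling is taken to be uniform without replacement, size points larger than the panel are skipped, and the function name and the fixed seed are hypothetical.</p>

```python
import random

# Size points listed in the text.
SIZE_POINTS = [100, 250, 500, 1000, 2500, 5000, 9000]


def build_subpanels(genomes, size_points=SIZE_POINTS, replicates=10, seed=0):
    """Return {size: list of `replicates` random sub-panels of that size}."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    subpanels = {}
    for size in size_points:
        if size > len(genomes):
            continue  # the panel is too small for this size point
        # rng.sample draws uniformly without replacement.
        subpanels[size] = [rng.sample(genomes, size) for _ in range(replicates)]
    return subpanels


# Toy panel of 2,600 hypothetical genome identifiers.
genomes = [f"genome_{i}" for i in range(2600)]
subpanels = build_subpanels(genomes)
```

<p>Repeating each draw 10 times gives an estimate of the variability in time and memory usage at each size point.</p>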
</sec>
</sec>
<sec id="sec029">
<title>Resistome-based association studies</title>
<p>We benchmarked DBGWAS against a targeted approach to ensure its ability to retrieve all expected resistance determinants. We thus performed association studies under the same model, using as input a collection of known causal resistance SNPs and genes, defining the resistome.</p>
<p>In this validation study, we used bugwas with the same phenotypes and population structure matrix
<italic>W</italic>
, so the resistome-based analyses and DBGWAS only differ by their input variant matrix (unitigs versus SNPs or genes presence/absence).</p>
<p>For
<italic>P. aeruginosa</italic>
resistome, we use a variant matrix previously described [
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
], which includes presence/absence of known resistance gene variants, as well as the SNPs called against these reference gene variants. For
<italic>M. tuberculosis</italic>
resistome, we built the variant matrix using the same approach as for
<italic>P. aeruginosa</italic>
[
<xref rid="pgen.1007758.ref007" ref-type="bibr">7</xref>
]: we called the SNPs from a list of 32 known resistance genes and promoters [
<xref rid="pgen.1007758.ref034" ref-type="bibr">34</xref>
,
<xref rid="pgen.1007758.ref067" ref-type="bibr">67</xref>
,
<xref rid="pgen.1007758.ref073" ref-type="bibr">73</xref>
]. The time and memory usage required for the complete analysis (from the mapping of the resistance genes and positions on the genome assemblies to the association study) are provided in Tables
<xref rid="pgen.1007758.t002" ref-type="table">2</xref>
and
<xref rid="pgen.1007758.t003" ref-type="table">3</xref>
.</p>
<p>We sort the annotated features by q-values.
<xref ref-type="supplementary-material" rid="pgen.1007758.s015">S6</xref>
and
<xref ref-type="supplementary-material" rid="pgen.1007758.s016">S7</xref>
Tables summarise all top variants using their q-value ranks, while Tables
<xref rid="pgen.1007758.t002" ref-type="table">2</xref>
and
<xref rid="pgen.1007758.t003" ref-type="table">3</xref>
report the annotations of all variants with a q-value < 0.05 for
<italic>P. aeruginosa</italic>
levofloxacin and
<italic>M. tuberculosis</italic>
streptomycin resistance, respectively.</p>
</sec>
<sec id="sec030">
<title>k-mer-based GWAS</title>
<sec id="sec031">
<title>pyseer</title>
<p>We installed pyseer [
<xref rid="pgen.1007758.ref006" ref-type="bibr">6</xref>
,
<xref rid="pgen.1007758.ref036" ref-type="bibr">36</xref>
] commit ID
<monospace>d17602500a4530b0e68a679ed675fdb12942f56f</monospace>
(9 commits ahead of pyseer v1.1.1). The pyseer pipeline is composed of four steps: 1) k-mer counting; 2) population structure estimation; 3) running pyseer; 4) downstream analysis. To use the correct parameters, we followed the pyseer tutorial (
<ext-link ext-link-type="uri" xlink:href="https://pyseer.readthedocs.io/en/master/tutorial.html">https://pyseer.readthedocs.io/en/master/tutorial.html</ext-link>
). For k-mer counting, we used fsm-lite (
<ext-link ext-link-type="uri" xlink:href="https://github.com/nvalimak/fsm-lite">https://github.com/nvalimak/fsm-lite</ext-link>
), filtering out all k-mers with a minor allele frequency smaller than 1%. For population structure estimation, we used Mash v2.0 [
<xref rid="pgen.1007758.ref074" ref-type="bibr">74</xref>
]. To run pyseer, we used 8 cores and an LRT p-value threshold of 0.05. Downstream analysis involved getting the k-mers which exceeded the significance threshold (which can be found using the
<monospace>scripts/count_patterns.py</monospace>
script), sorting them by LRT p-value, blasting them against the two databases presented in the Interpretation of significant unitigs (step 3) subsection, and keeping the best hit for each k-mer. For reproducibility purposes, the scripts we used to run pyseer can be found at
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/DBGWAS_support/tree/master/scripts/pySEER">https://gitlab.com/leoisl/DBGWAS_support/tree/master/scripts/pySEER</ext-link>
.</p>
</sec>
<sec id="sec032">
<title>HAWK</title>
<p>We first ran HAWK [
<xref rid="pgen.1007758.ref013" ref-type="bibr">13</xref>
] v0.9.8-beta, as it allows correcting for population structure. Unfortunately, it was unable to find the known causal variants reported for
<italic>P. aeruginosa</italic>
levofloxacin and
<italic>M. tuberculosis</italic>
streptomycin resistances by other methods (see Tables
<xref rid="pgen.1007758.t002" ref-type="table">2</xref>
and
<xref rid="pgen.1007758.t003" ref-type="table">3</xref>
). We therefore kept in our benchmarks an earlier version, HAWK v0.8.3-beta, which presented better qualitative performance for these two evaluated panels. The HAWK pipeline is composed of five steps: 1) k-mer counting with a modified version of jellyfish [
<xref rid="pgen.1007758.ref075" ref-type="bibr">75</xref>
]; 2) running HAWK; 3) assembling significant k-mers with ABYSS [
<xref rid="pgen.1007758.ref076" ref-type="bibr">76</xref>
]; 4) getting statistics on the assembled sequences; 5) downstream analysis. The first four steps were performed as described on HAWK’s GitHub page. However, in the first step, we had to remove the lower-count cutoff in
<monospace>jellyfish dump</monospace>
(parameter -
<monospace>L</monospace>
), since we are working with contigs and not reads. The last step was performed in the same way as described for pyseer. For reproducibility purposes, the scripts we used to run HAWK v0.8.3-beta can be found at
<ext-link ext-link-type="uri" xlink:href="https://gitlab.com/leoisl/DBGWAS_support/tree/master/scripts/HAWK_0_8_3_beta">https://gitlab.com/leoisl/DBGWAS_support/tree/master/scripts/HAWK_0_8_3_beta</ext-link>
.</p>
</sec>
</sec>
</sec>
<sec sec-type="supplementary-material" id="sec033">
<title>Supporting information</title>
<supplementary-material content-type="local-data" id="pgen.1007758.s001">
<label>S1 Fig</label>
<caption>
<title>Alignment to a reference (when possible), cDBG, and k-mers obtained for similar (A) and very polymorphic genomes (B).</title>
<p>In the first case, the 3 loci represented as polymorphic in the alignment lead to 3 bubble patterns in the cDBG, and numerous redundant k-mers. In the second case, genomes are so polymorphic that an alignment is not possible. The cDBG summarizes well the common regions and the links between them, while the collection of unique k-mers still contains redundancy.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s001.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s002">
<label>S2 Fig</label>
<caption>
<title>Effect of
<italic>SFF</italic>
on the top subgraphs generated for
<italic>S. aureus</italic>
ciprofloxacin resistance.</title>
<p>Annotation of the first subgraphs is strictly conserved (red for
<italic>parC</italic>
, green for
<italic>gyrA</italic>
, yellow for
<italic>norA</italic>
promoter region, blue for noncoding between
<italic>glmM</italic>
and
<italic>fmtB</italic>
and violet for transposase flanking regions).</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s002.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s003">
<label>S3 Fig</label>
<caption>
<title>Effect of
<italic>SFF</italic>
on the top subgraphs generated for
<italic>S. aureus</italic>
methicillin resistance.</title>
<p>Only one subgraph, containing the
<italic>mecA</italic>
gene (highlighted in red) is generated for lower
<italic>SFF</italic>
values. Then several regions of the SCC
<italic>mec</italic>
cassette appear for
<italic>SFF</italic>
= 70, and are aggregated into a single subgraph for
<italic>SFF</italic>
≥ 150. Green subgraphs do not concern the
<italic>mecA</italic>
MGE.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s003.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s004">
<label>S4 Fig</label>
<caption>
<title>Effect of
<italic>SFF</italic>
on the top subgraphs generated for
<italic>S. aureus</italic>
penicillin resistance.</title>
<p>Green subgraphs do not concern the
<italic>blaZ</italic>
MGE. Annotations are ordered by the number of nodes carrying them. Yellow, orange and pink highlight
<italic>blaZ</italic>
,
<italic>blaR1</italic>
and
<italic>blaI</italic>
, respectively.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s004.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s005">
<label>S5 Fig</label>
<caption>
<title>Effect of
<italic>SFF</italic>
on the top subgraphs generated for
<italic>S. aureus</italic>
erythromycin resistance.</title>
<p>Only one subgraph, describing
<italic>ermC</italic>
and its plasmid, is output when
<italic>SFF</italic>
&lt; 200. Green subgraphs do not concern the
<italic>ermC</italic>
MGE.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s005.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s006">
<label>S6 Fig</label>
<caption>
<title>Effect of
<italic>SFF</italic>
on the top subgraphs generated for
<italic>P. aeruginosa</italic>
amikacin resistance.</title>
<p>Nodes corresponding to the
<italic>aac(6’)</italic>
gene are shown in a blue frame. When the
<italic>SFF</italic>
parameter increases, these nodes aggregate with other genes found at least once close to
<italic>aac(6’)</italic>
. The annotations of the following subgraphs are well conserved (same color legend as in
<xref ref-type="supplementary-material" rid="pgen.1007758.s008">S8 Fig</xref>
).</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s006.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s007">
<label>S7 Fig</label>
<caption>
<title>Effect of
<italic>k</italic>
on the first four subgraphs obtained for TB rifampicin resistance.</title>
<p>With a
<italic>k</italic>
value varying between 21 and 41, the first 3 subgraphs always have the same ordering, shape and annotation, as well as comparable q-values, although smaller q-values are observed for lower values of
<italic>k</italic>
. The number of significant unitigs per subgraph is also well conserved. The fourth-ranked subgraph is not always the same: the
<italic>gyrA</italic>
mutation appears at a lower rank when
<italic>k</italic>
is smaller.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s007.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s008">
<label>S8 Fig</label>
<caption>
<title>Effect of
<italic>k</italic>
on the first five subgraphs obtained for
<italic>P. aeruginosa</italic>
amikacin resistance.</title>
<p>When
<italic>k</italic>
varies, the plasmid (yellow) and the mercury reductase and transposase (blue) remain among the five top-rated subgraphs. However,
<italic>k</italic>
has an effect on the aggregation of subgraphs corresponding to different genetic events: the mutation in the
<italic>aac(6’)</italic>
gene (blue frame) always appears in the first subgraph but is merged with the large mercury reductase and transposase subgraph for
<italic>k</italic>
= 27, 39 and 41. The order of the subgraphs also varies with
<italic>k</italic>
: some subgraphs shift by up to four ranks, and others leave the top-5 list.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s008.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s009">
<label>S9 Fig</label>
<caption>
<title>Large-scale analysis of computational resource usage.</title>
<p>This figure describes how DBGWAS scales in terms of time and memory usage for large datasets, containing up to 9,000 genomes. The large panels used here are described in the Large panels subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section. To better understand DBGWAS performance behaviour, we present performance curves for each panel at size points of 100, 250, 500, 1,000, 2,500, 5,000 and 9,000 genomes. The executions were run on a cluster rather than on a single machine, and used 8 cores each. To reduce subsampling and machine heterogeneity problems, each sub-panel was randomly built 10 times and we present the time and memory usage for all these executions. Although these two measures depend not only on the number of input genomes but also on their length and complexity, this figure allows the computational resource usage to be estimated for small and large panels with different genome plasticities.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s009.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s010">
<label>S1 Table</label>
<caption>
<title>DBGWAS time and maximal memory load on a single core.</title>
<p>All runs presented in this table were executed with the default parameters, without optional steps (neither lineage effect analysis nor annotation of subgraphs), on a single
<italic>Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz</italic>
core. The datasets are described in the Datasets subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section. DBGWAS ran in less than 2.5 hours for all experiments in our benchmark. The maximum memory load (given in parentheses in the Runtime column) was 11 GB of RAM. The panel size and genome length (given in parentheses in the Panel column) did not alone determine the running performance; the genome complexity played an important role as well. To view the gain in performance of DBGWAS when running on multiple (8) cores, see
<xref ref-type="supplementary-material" rid="pgen.1007758.s011">S2 Table</xref>
.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s010.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s011">
<label>S2 Table</label>
<caption>
<title>Benchmarking DBGWAS, pyseer and HAWK: Comparison of time and maximal memory load.</title>
<p>The total execution time is presented with the maximal memory consumption in parentheses, in GB. For pyseer and HAWK, the time and memory for each step are also detailed. All tools were run on the same machine with 8
<italic>Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz</italic>
cores, 315 GB of RAM and 1 TB of disk space. Each execution used all 8 available cores. The datasets are described in the Datasets subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section. However, for the three large panels (Large TB, Large SA, and Large PA), here we used only a random 2,500-genome sub-panel. Moreover, DBGWAS was run with the default parameters, without optional steps (neither lineage effect analysis nor annotation of subgraphs). The parameters for pyseer and HAWK were the ones described in the k-mer-based GWAS subsection of the
<xref ref-type="sec" rid="sec010">Methods</xref>
section. We did not consider the time and memory consumed in the last step for these two tools (downstream analysis). The runs taking more than 5 days to finish were interrupted and are shown as
<italic>Timeout</italic>
. The runs that exceeded 1 TB of disk space were interrupted and are shown as
<italic>DQE</italic>
(Disk Quota Exceeded).</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s011.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s012">
<label>S3 Table</label>
<caption>
<title>DBGWAS results for
<italic>M. tuberculosis</italic>
resistance to antibiotics.</title>
<p>For each antibiotic, top subgraphs were reported with their rank, the q-value of the unitig with the lowest q-value (min
<sub>
<italic>q</italic>
</sub>
), the corresponding estimated effect (estimated
<italic>β</italic>
of the linear model) and the number of susceptible (resp. resistant) strains harbouring this unitig (count per phenotype). The type of event represented by the subgraph, its annotation and some comments and references on this annotation were also provided. Comments were coloured if the annotation was previously described in antibiotic resistance literature: in green if this description concerned the tested antibiotic, in orange otherwise.</p>
<p>(XLS)</p>
</caption>
<media xlink:href="pgen.1007758.s012.xls">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s013">
<label>S4 Table</label>
<caption>
<title>DBGWAS results for
<italic>S. aureus</italic>
resistance to antibiotics.</title>
<p>For each antibiotic, top subgraphs were reported with their rank, the q-value of the unitig with the lowest q-value (min
<sub>
<italic>q</italic>
</sub>
), the corresponding estimated effect (estimated
<italic>β</italic>
of the linear model) and the number of susceptible (resp. resistant) strains harbouring this unitig (count per phenotype). The type of event represented by the subgraph, its annotation and some comments and references on this annotation were also provided. Comments were coloured if the annotation was previously described in antibiotic resistance literature: in green if this description concerned the tested antibiotic, in orange otherwise.</p>
<p>(XLS)</p>
</caption>
<media xlink:href="pgen.1007758.s013.xls">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s014">
<label>S5 Table</label>
<caption>
<title>DBGWAS results for
<italic>P. aeruginosa</italic>
resistance to antibiotics.</title>
<p>For each antibiotic, top subgraphs were reported with their rank, the q-value of the unitig with the lowest q-value (min
<sub>
<italic>q</italic>
</sub>
), the corresponding estimated effect (estimated
<italic>β</italic>
of the linear model) and the number of susceptible (resp. resistant) strains harbouring this unitig (count per phenotype). The type of event represented by the subgraph, its annotation and some comments and references on this annotation were also provided. Comments were coloured if the annotation was previously described in antibiotic resistance literature: in green if this description concerned the tested antibiotic, in orange otherwise.</p>
<p>(XLS)</p>
</caption>
<media xlink:href="pgen.1007758.s014.xls">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s015">
<label>S6 Table</label>
<caption>
<title>Resistome-based association study results for
<italic>M. tuberculosis</italic>
resistance to antibiotics.</title>
<p>For each antibiotic, the 10 features most strongly associated with the phenotype were reported, with their rank, q-value, and estimated effect (estimated
<italic>β</italic>
of the linear model). The type of targeted variant and its gene annotation were also provided. Comments were coloured if the annotation was previously described in antibiotic resistance literature: in green if this description concerned the tested antibiotic, in orange otherwise. The last column presents the corresponding subgraphs found by DBGWAS, with their rank and min
<sub>
<italic>q</italic>
</sub>
.</p>
<p>(XLS)</p>
</caption>
<media xlink:href="pgen.1007758.s015.xls">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s016">
<label>S7 Table</label>
<caption>
<title>Resistome-based association study results for
<italic>P. aeruginosa</italic>
resistance to antibiotics.</title>
<p>For each antibiotic, the 10 features most strongly associated with the phenotype were reported, with their rank, q-value, and estimated effect (estimated
<italic>β</italic>
of the linear model). The type of targeted variant and its gene annotation were also provided. Comments were coloured if the annotation was previously described in antibiotic resistance literature: in green if this description concerned the tested antibiotic, in orange otherwise. The last column presents the corresponding subgraphs found by DBGWAS, with their min
<sub>
<italic>q</italic>
</sub>
.</p>
<p>(XLS)</p>
</caption>
<media xlink:href="pgen.1007758.s016.xls">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s017">
<label>S8 Table</label>
<caption>
<title>Number of subgraphs generated using different significance thresholds.</title>
<p>This table shows the number of subgraphs generated when defining the significant unitigs as the ones with the 100 lowest q-values (default
<italic>SFF</italic>
= 100, ‘top 100’) or when using a 5% false discovery rate (FDR) threshold (
<italic>SFF</italic>
= 0.05, ‘5% FDR’). Different datasets lead to different q-values, sometimes differing by several orders of magnitude. For instance, a single FDR threshold leads to selecting a large number of unitigs, generating several hundred subgraphs for the SA (
<italic>S. aureus</italic>
) panel.</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s017.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pgen.1007758.s018">
<label>S1 Appendix</label>
<caption>
<title>Evaluation of association models.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pgen.1007758.s018.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>The authors thank Jean-Baptiste Veyrieras, Sarah Earle, Chieh-Hsi Wu and Daniel Wilson, as well as Jean-Pierre Flandrois, Manolo Gouy, Stéphane Schicklin and Ghislaine Guigon for their insightful comments. The authors also thank the reviewers for their accurate comments and suggestions, which helped to improve the quality of the manuscript.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pgen.1007758.ref001">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Farhat</surname>
<given-names>MR</given-names>
</name>
,
<name>
<surname>Shapiro</surname>
<given-names>BJ</given-names>
</name>
,
<name>
<surname>Kieser</surname>
<given-names>KJ</given-names>
</name>
,
<name>
<surname>Sultana</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Jacobson</surname>
<given-names>KR</given-names>
</name>
,
<name>
<surname>Victor</surname>
<given-names>TC</given-names>
</name>
,
<etal>et al</etal>
<article-title>Genomic analysis identifies targets of convergent positive selection in drug-resistant
<italic>Mycobacterium tuberculosis</italic>
</article-title>
.
<source>Nature genetics</source>
.
<year>2013</year>
;
<volume>45</volume>
(
<issue>10</issue>
):
<fpage>1183</fpage>
<lpage>1189</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ng.2747">10.1038/ng.2747</ext-link>
<pub-id pub-id-type="pmid">23995135</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref002">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sheppard</surname>
<given-names>SK</given-names>
</name>
,
<name>
<surname>Didelot</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Meric</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Torralbo</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Jolley</surname>
<given-names>KA</given-names>
</name>
,
<name>
<surname>Kelly</surname>
<given-names>DJ</given-names>
</name>
,
<etal>et al</etal>
<article-title>Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter</article-title>
.
<source>Proceedings of the national academy of sciences</source>
.
<year>2013</year>
;
<volume>110</volume>
(
<issue>29</issue>
):
<fpage>11923</fpage>
<lpage>11927</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1073/pnas.1305559110">10.1073/pnas.1305559110</ext-link>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref003">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Alam</surname>
<given-names>MT</given-names>
</name>
,
<name>
<surname>Petit</surname>
<given-names>RA</given-names>
</name>
,
<name>
<surname>Crispell</surname>
<given-names>EK</given-names>
</name>
,
<name>
<surname>Thornton</surname>
<given-names>TA</given-names>
</name>
,
<name>
<surname>Conneely</surname>
<given-names>KN</given-names>
</name>
,
<name>
<surname>Jiang</surname>
<given-names>Y</given-names>
</name>
,
<etal>et al</etal>
<article-title>Dissecting vancomycin-intermediate resistance in
<italic>Staphylococcus aureus</italic>
using genome-wide association</article-title>
.
<source>Genome biology and evolution</source>
.
<year>2014</year>
;
<volume>6</volume>
(
<issue>5</issue>
):
<fpage>1174</fpage>
<lpage>1185</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/gbe/evu092">10.1093/gbe/evu092</ext-link>
<pub-id pub-id-type="pmid">24787619</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref004">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Chewapreecha</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Marttinen</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Croucher</surname>
<given-names>NJ</given-names>
</name>
,
<name>
<surname>Salter</surname>
<given-names>SJ</given-names>
</name>
,
<name>
<surname>Harris</surname>
<given-names>SR</given-names>
</name>
,
<name>
<surname>Mather</surname>
<given-names>AE</given-names>
</name>
,
<etal>et al</etal>
<article-title>Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes</article-title>
.
<source>PLoS genetics</source>
.
<year>2014</year>
;
<volume>10</volume>
(
<issue>8</issue>
):
<fpage>e1004547</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pgen.1004547">10.1371/journal.pgen.1004547</ext-link>
<pub-id pub-id-type="pmid">25101644</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref005">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Earle</surname>
<given-names>SG</given-names>
</name>
,
<name>
<surname>Wu</surname>
<given-names>CH</given-names>
</name>
,
<name>
<surname>Charlesworth</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Stoesser</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Gordon</surname>
<given-names>NC</given-names>
</name>
,
<name>
<surname>Walker</surname>
<given-names>TM</given-names>
</name>
,
<etal>et al</etal>
<article-title>Identifying lineage effects when controlling for population structure improves power in bacterial association studies</article-title>
.
<source>Nature microbiology</source>
.
<year>2016</year>
; p.
<fpage>16041</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nmicrobiol.2016.41">10.1038/nmicrobiol.2016.41</ext-link>
<pub-id pub-id-type="pmid">27572646</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref006">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lees</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Vehkala</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Välimäki</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Harris</surname>
<given-names>SR</given-names>
</name>
,
<name>
<surname>Chewapreecha</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Croucher</surname>
<given-names>NJ</given-names>
</name>
,
<etal>et al</etal>
<article-title>Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes</article-title>
.
<source>Nature communications</source>
.
<year>2016</year>
;
<volume>7</volume>
:
<fpage>12797</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ncomms12797">10.1038/ncomms12797</ext-link>
<pub-id pub-id-type="pmid">27633831</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref007">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jaillard</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>van Belkum</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Cady</surname>
<given-names>KC</given-names>
</name>
,
<name>
<surname>Creely</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Shortridge</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Blanc</surname>
<given-names>B</given-names>
</name>
,
<etal>et al</etal>
<article-title>Correlation between phenotypic antibiotic susceptibility and the resistome in
<italic>Pseudomonas aeruginosa</italic>
</article-title>
.
<source>International journal of antimicrobial agents</source>
.
<year>2017</year>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ijantimicag.2017.02.026">10.1016/j.ijantimicag.2017.02.026</ext-link>
<pub-id pub-id-type="pmid">28554735</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref008">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Page</surname>
<given-names>AJ</given-names>
</name>
,
<name>
<surname>Cummins</surname>
<given-names>CA</given-names>
</name>
,
<name>
<surname>Hunt</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Wong</surname>
<given-names>VK</given-names>
</name>
,
<name>
<surname>Reuter</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Holden</surname>
<given-names>MT</given-names>
</name>
,
<etal>et al</etal>
<article-title>Roary: rapid large-scale prokaryote pan genome analysis</article-title>
.
<source>Bioinformatics</source>
.
<year>2015</year>
;
<volume>31</volume>
(
<issue>22</issue>
):
<fpage>3691</fpage>
<lpage>3693</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/bioinformatics/btv421">10.1093/bioinformatics/btv421</ext-link>
<pub-id pub-id-type="pmid">26198102</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref009">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Zhao</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Fleming</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Lin</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>T</given-names>
</name>
,
<etal>et al</etal>
<article-title>Genome sequencing of 161
<italic>Mycobacterium tuberculosis</italic>
isolates from China identifies genes and intergenic regions associated with drug resistance</article-title>
.
<source>Nature genetics</source>
.
<year>2013</year>
;
<volume>45</volume>
(
<issue>10</issue>
):
<fpage>1255</fpage>
<lpage>1260</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ng.2735">10.1038/ng.2735</ext-link>
<pub-id pub-id-type="pmid">23995137</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Blair</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Webber</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Baylay</surname>
<given-names>AJ</given-names>
</name>
,
<name>
<surname>Ogbolu</surname>
<given-names>DO</given-names>
</name>
,
<name>
<surname>Piddock</surname>
<given-names>LJ</given-names>
</name>
.
<article-title>Molecular mechanisms of antibiotic resistance</article-title>
.
<source>Nature reviews microbiology</source>
.
<year>2015</year>
;
<volume>13</volume>
(
<issue>1</issue>
):
<fpage>42</fpage>
<lpage>51</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nrmicro3380">10.1038/nrmicro3380</ext-link>
<pub-id pub-id-type="pmid">25435309</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref011">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Haft</surname>
<given-names>DH</given-names>
</name>
,
<name>
<surname>DiCuccio</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Badretdin</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Brover</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Chetvernin</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>O’Neill</surname>
<given-names>K</given-names>
</name>
,
<etal>et al</etal>
<article-title>RefSeq: an update on prokaryotic genome annotation and curation</article-title>
.
<source>Nucleic acids research</source>
.
<year>2017</year>
;
<volume>46</volume>
(
<issue>D1</issue>
):
<fpage>D851</fpage>
<lpage>D860</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/nar/gkx1068">10.1093/nar/gkx1068</ext-link>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Le Bras</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Collin</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Monjeaud</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Lacroix</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Rivals</surname>
<given-names>É</given-names>
</name>
,
<name>
<surname>Lemaitre</surname>
<given-names>C</given-names>
</name>
,
<etal>et al</etal>
<article-title>Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads</article-title>
.
<source>GigaScience</source>
.
<year>2016</year>
;
<volume>5</volume>
(
<issue>1</issue>
):
<fpage>1</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13742-015-0105-2">10.1186/s13742-015-0105-2</ext-link>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rahman</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Hallgrímsdóttir</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Eisen</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Pachter</surname>
<given-names>L</given-names>
</name>
.
<article-title>Association mapping from sequencing reads using k-mers</article-title>
.
<source>eLife</source>
.
<year>2018</year>
;
<volume>7</volume>
:
<fpage>e32920</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7554/eLife.32920">10.7554/eLife.32920</ext-link>
<pub-id pub-id-type="pmid">29897334</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref014">
<label>14</label>
<mixed-citation publication-type="journal">
<name>
<surname>Read</surname>
<given-names>TD</given-names>
</name>
,
<name>
<surname>Massey</surname>
<given-names>RC</given-names>
</name>
.
<article-title>Characterizing the genetic basis of bacterial phenotypes using genome-wide association studies: a new direction for bacteriology</article-title>
.
<source>Genome medicine</source>
.
<year>2014</year>
;
<volume>6</volume>
(
<issue>11</issue>
):
<fpage>109</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13073-014-0109-z">10.1186/s13073-014-0109-z</ext-link>
<pub-id pub-id-type="pmid">25593593</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref015">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Power</surname>
<given-names>RA</given-names>
</name>
,
<name>
<surname>Parkhill</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>de Oliveira</surname>
<given-names>T</given-names>
</name>
.
<article-title>Microbial genome-wide association studies: lessons from human GWAS</article-title>
.
<source>Nature reviews genetics</source>
.
<year>2017</year>
;
<volume>18</volume>
(
<issue>1</issue>
):
<fpage>41</fpage>
<lpage>50</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nrg.2016.132">10.1038/nrg.2016.132</ext-link>
<pub-id pub-id-type="pmid">27840430</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref016">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>de Bruijn</surname>
<given-names>N</given-names>
</name>
.
<article-title>A combinatorial problem</article-title>
.
<source>Proceedings of the koninklijke nederlandse akademie van wetenschappen Series A</source>
.
<year>1946</year>
;
<volume>49</volume>
(
<issue>7</issue>
):
<fpage>758</fpage>
.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref017">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
,
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
.
<article-title>An Eulerian path approach to DNA fragment assembly</article-title>
.
<source>Proceedings of the national academy of sciences</source>
.
<year>2001</year>
;
<volume>98</volume>
(
<issue>17</issue>
):
<fpage>9748</fpage>
<lpage>9753</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1073/pnas.171285098">10.1073/pnas.171285098</ext-link>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref018">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Yang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Tang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Shang</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Shen</surname>
<given-names>B</given-names>
</name>
.
<article-title>A practical comparison of
<italic>de novo</italic>
genome assembly software tools for next-generation sequencing technologies</article-title>
.
<source>PloS one</source>
.
<year>2011</year>
;
<volume>6</volume>
(
<issue>3</issue>
):
<fpage>e17915</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0017915">10.1371/journal.pone.0017915</ext-link>
<pub-id pub-id-type="pmid">21423806</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref019">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Iqbal</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Caccamo</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Turner</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Flicek</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>McVean</surname>
<given-names>G</given-names>
</name>
.
<article-title>
<italic>De novo</italic>
assembly and genotyping of variants using colored de Bruijn graphs</article-title>
.
<source>Nature genetics</source>
.
<year>2012</year>
;
<volume>44</volume>
(
<issue>2</issue>
):
<fpage>226</fpage>
<lpage>232</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ng.1028">10.1038/ng.1028</ext-link>
<pub-id pub-id-type="pmid">22231483</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref020">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hooper</surname>
<given-names>DC</given-names>
</name>
,
<name>
<surname>Jacoby</surname>
<given-names>GA</given-names>
</name>
.
<article-title>Mechanisms of drug resistance: quinolone resistance</article-title>
.
<source>Annals of the New York academy of sciences</source>
.
<year>2015</year>
;
<volume>1354</volume>
(
<issue>1</issue>
):
<fpage>12</fpage>
<lpage>31</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/nyas.12830">10.1111/nyas.12830</ext-link>
<pub-id pub-id-type="pmid">26190223</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref021">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lowy</surname>
<given-names>FD</given-names>
</name>
.
<article-title>Antimicrobial resistance: the example of
<italic>Staphylococcus aureus</italic>
</article-title>
.
<source>Journal of clinical investigation</source>
.
<year>2003</year>
;
<volume>111</volume>
(
<issue>9</issue>
):
<fpage>1265</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1172/JCI18535">10.1172/JCI18535</ext-link>
<pub-id pub-id-type="pmid">12727914</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref022">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Piton</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Petrella</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Delarue</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>André-Leroux</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Jarlier</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Aubry</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Structural insights into the quinolone resistance mechanism of
<italic>Mycobacterium tuberculosis</italic>
DNA gyrase</article-title>
.
<source>PLoS one</source>
.
<year>2010</year>
;
<volume>5</volume>
(
<issue>8</issue>
):
<fpage>e12245</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0012245">10.1371/journal.pone.0012245</ext-link>
<pub-id pub-id-type="pmid">20805881</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref023">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lambert</surname>
<given-names>P</given-names>
</name>
.
<article-title>Mechanisms of antibiotic resistance in
<italic>Pseudomonas aeruginosa</italic>
</article-title>
.
<source>Journal of the royal society of medicine</source>
.
<year>2002</year>
;
<volume>95</volume>
(
<issue>Suppl 41</issue>
):
<fpage>22</fpage>
<pub-id pub-id-type="pmid">12216271</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref024">
<label>24</label>
<mixed-citation publication-type="journal">
<collab>UniProt consortium</collab>
.
<article-title>UniProt: the universal protein knowledgebase</article-title>
.
<source>Nucleic acids research</source>
.
<year>2017</year>
;
<volume>45</volume>
(
<issue>D1</issue>
):
<fpage>D158</fpage>
<lpage>D169</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/nar/gkw1099">10.1093/nar/gkw1099</ext-link>
<pub-id pub-id-type="pmid">27899622</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref025">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lambert</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Ploy</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Courvalin</surname>
<given-names>P</given-names>
</name>
.
<article-title>A spontaneous point mutation in the
<italic>aac(6’)-Ib’</italic>
gene results in altered substrate specificity of aminoglycoside 6’-N-acetyltransferase of a
<italic>Pseudomonas fluorescens</italic>
strain</article-title>
.
<source>FEMS microbiology letters</source>
.
<year>1994</year>
;
<volume>115</volume>
:
<fpage>297</fpage>
<lpage>304</lpage>
.
<pub-id pub-id-type="pmid">8138142</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref026">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lee</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Cho</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Bang</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Lee</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Bai</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Exclusive mutations related to isoniazid and ethionamide resistance among
<italic>Mycobacterium tuberculosis</italic>
isolates from Korea</article-title>
.
<source>The international journal of tuberculosis and lung disease</source>
.
<year>2000</year>
;
<volume>4</volume>
(
<issue>5</issue>
):
<fpage>441</fpage>
<lpage>447</lpage>
.
<pub-id pub-id-type="pmid">10815738</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref027">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Farhat</surname>
<given-names>MR</given-names>
</name>
,
<name>
<surname>Sultana</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Iartchouk</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Bozeman</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Galagan</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Sisk</surname>
<given-names>P</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Genetic determinants of drug resistance in
<italic>Mycobacterium tuberculosis</italic>
and their diagnostic value</article-title>
.
<source>American journal of respiratory and critical care medicine</source>
.
<year>2016</year>
;
<volume>194</volume>
(
<issue>5</issue>
):
<fpage>621</fpage>
<lpage>630</lpage>
.
<pub-id pub-id-type="pmid">26910495</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref028">
<label>28</label>
<mixed-citation publication-type="journal">
<name>
<surname>Flandrois</surname>
<given-names>JP</given-names>
</name>
,
<name>
<surname>Lina</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Dumitrescu</surname>
<given-names>O</given-names>
</name>
.
<article-title>MUBII-TB-DB: a database of mutations associated with antibiotic resistance in
<italic>Mycobacterium tuberculosis</italic>
</article-title>
.
<source>BMC bioinformatics</source>
.
<year>2014</year>
;
<volume>15</volume>
(
<issue>1</issue>
):
<fpage>107</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/1471-2105-15-107">10.1186/1471-2105-15-107</ext-link>
<pub-id pub-id-type="pmid">24731071</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref029">
<label>29</label>
<mixed-citation publication-type="journal">
<collab>IWG-SCC consortium</collab>
.
<article-title>Classification of staphylococcal cassette chromosome
<italic>mec</italic>
(SCC
<italic>mec</italic>
): guidelines for reporting novel SCC
<italic>mec</italic>
elements</article-title>
.
<source>Antimicrobial agents and chemotherapy</source>
.
<year>2009</year>
;
<volume>53</volume>
(
<issue>12</issue>
):
<fpage>4961</fpage>
<lpage>4967</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/AAC.00579-09">10.1128/AAC.00579-09</ext-link>
<pub-id pub-id-type="pmid">19721075</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref030">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gordon</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Price</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Cole</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Everitt</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Morgan</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Finney</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Prediction of
<italic>Staphylococcus aureus</italic>
antimicrobial resistance by whole-genome sequencing</article-title>
.
<source>Journal of clinical microbiology</source>
.
<year>2014</year>
;
<volume>52</volume>
(
<issue>4</issue>
):
<fpage>1182</fpage>
<lpage>1191</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/JCM.03117-13">10.1128/JCM.03117-13</ext-link>
<pub-id pub-id-type="pmid">24501024</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref031">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>Westh</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Hougaard</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Vuust</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Rosdahl</surname>
<given-names>V</given-names>
</name>
.
<article-title>Prevalence of erm gene classes in erythromycin-resistant
<italic>Staphylococcus aureus</italic>
strains isolated between 1959 and 1988</article-title>
.
<source>Antimicrobial agents and chemotherapy</source>
.
<year>1995</year>
;
<volume>39</volume>
(
<issue>2</issue>
):
<fpage>369</fpage>
<lpage>373</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/AAC.39.2.369">10.1128/AAC.39.2.369</ext-link>
<pub-id pub-id-type="pmid">7726500</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref032">
<label>32</label>
<mixed-citation publication-type="journal">
<name>
<surname>Benson</surname>
<given-names>DA</given-names>
</name>
,
<name>
<surname>Cavanaugh</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Clark</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Karsch-Mizrachi</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
,
<name>
<surname>Ostell</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>GenBank</article-title>
.
<source>Nucleic acids research</source>
.
<year>2012</year>
;
<volume>41</volume>
(
<issue>D1</issue>
):
<fpage>D36</fpage>
<lpage>D42</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/nar/gks1195">10.1093/nar/gks1195</ext-link>
<pub-id pub-id-type="pmid">23193287</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref033">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bi</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Xie</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Tai</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Jiang</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Harrison</surname>
<given-names>EM</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>A site-specific integrative plasmid found in
<italic>Pseudomonas aeruginosa</italic>
clinical isolate HS87 along with a plasmid carrying an aminoglycoside-resistant gene</article-title>
.
<source>PLoS one</source>
.
<year>2016</year>
;
<volume>11</volume>
(
<issue>2</issue>
):
<fpage>e0148367</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0148367">10.1371/journal.pone.0148367</ext-link>
<pub-id pub-id-type="pmid">26841043</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref034">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Palomino</surname>
<given-names>JC</given-names>
</name>
,
<name>
<surname>Martin</surname>
<given-names>A</given-names>
</name>
.
<article-title>Drug resistance mechanisms in
<italic>Mycobacterium tuberculosis</italic>
</article-title>
.
<source>Antibiotics</source>
.
<year>2014</year>
;
<volume>3</volume>
(
<issue>3</issue>
):
<fpage>317</fpage>
<lpage>340</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/antibiotics3030317">10.3390/antibiotics3030317</ext-link>
<pub-id pub-id-type="pmid">27025748</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref035">
<label>35</label>
<mixed-citation publication-type="journal">
<name>
<surname>Davis</surname>
<given-names>JJ</given-names>
</name>
,
<name>
<surname>Boisvert</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Brettin</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Kenyon</surname>
<given-names>RW</given-names>
</name>
,
<name>
<surname>Mao</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Olson</surname>
<given-names>R</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Antimicrobial resistance prediction in PATRIC and RAST</article-title>
.
<source>Scientific reports</source>
.
<year>2016</year>
;
<volume>6</volume>
:
<fpage>27930</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/srep27930">10.1038/srep27930</ext-link>
<pub-id pub-id-type="pmid">27297683</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref036">
<label>36</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lees</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Galardini</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Bentley</surname>
<given-names>SD</given-names>
</name>
,
<name>
<surname>Weiser</surname>
<given-names>JN</given-names>
</name>
,
<name>
<surname>Corander</surname>
<given-names>J</given-names>
</name>
.
<article-title>pyseer: a comprehensive tool for microbial pangenome-wide association studies</article-title>
.
<source>Bioinformatics</source>
.
<year>2018</year>
; p. bty539.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref037">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Traore</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Fissette</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Bastian</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Devleeschouwer</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Portaels</surname>
<given-names>F</given-names>
</name>
.
<article-title>Detection of rifampicin resistance in
<italic>Mycobacterium tuberculosis</italic>
isolates from diverse countries by a commercial line probe assay as an initial indicator of multidrug resistance</article-title>
.
<source>The international journal of tuberculosis and lung disease</source>
.
<year>2000</year>
;
<volume>4</volume>
(
<issue>5</issue>
):
<fpage>481</fpage>
<lpage>484</lpage>
.
<pub-id pub-id-type="pmid">10815743</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Illakkiam</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Shankar</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Ponraj</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Rajendhran</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Gunasekaran</surname>
<given-names>P</given-names>
</name>
.
<article-title>Genome sequencing of a mung bean plant growth promoting strain of
<italic>P. aeruginosa</italic>
with biocontrol ability</article-title>
.
<source>International journal of genomics</source>
.
<year>2014</year>
;
<volume>2014</volume>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1155/2014/123058">10.1155/2014/123058</ext-link>
<pub-id pub-id-type="pmid">25184130</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref039">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ali-Ahmad</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Fadel</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Sebban-Kreuzer</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Ba</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Pélissier</surname>
<given-names>GD</given-names>
</name>
,
<name>
<surname>Bornet</surname>
<given-names>O</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Structural and functional insights into the periplasmic detector domain of the GacS histidine kinase controlling biofilm formation in
<italic>Pseudomonas aeruginosa</italic>
</article-title>
.
<source>Scientific reports</source>
.
<year>2017</year>
;
<volume>7</volume>
(
<issue>1</issue>
):
<fpage>11262</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41598-017-11361-3">10.1038/s41598-017-11361-3</ext-link>
<pub-id pub-id-type="pmid">28900144</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref040">
<label>40</label>
<mixed-citation publication-type="journal">
<name>
<surname>Marschall</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Marz</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Abeel</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Dijkstra</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Dutilh</surname>
<given-names>BE</given-names>
</name>
,
<name>
<surname>Ghaffaari</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Computational pan-genomics: status, promises and challenges</article-title>
.
<source>Briefings in bioinformatics</source>
.
<year>2016</year>
; p. bbw089.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref041">
<label>41</label>
<mixed-citation publication-type="journal">
<name>
<surname>Paten</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Novak</surname>
<given-names>AM</given-names>
</name>
,
<name>
<surname>Eizenga</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Garrison</surname>
<given-names>E</given-names>
</name>
.
<article-title>Genome graphs and the evolution of genome inference</article-title>
.
<source>Genome research</source>
.
<year>2017</year>
;
<volume>27</volume>
(
<issue>5</issue>
):
<fpage>665</fpage>
<lpage>676</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/gr.214155.116">10.1101/gr.214155.116</ext-link>
<pub-id pub-id-type="pmid">28360232</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref042">
<label>42</label>
<mixed-citation publication-type="journal">
<name>
<surname>Baaijens</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>El Aabidine</surname>
<given-names>AZ</given-names>
</name>
,
<name>
<surname>Rivals</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Schönhuth</surname>
<given-names>A</given-names>
</name>
.
<article-title>
<italic>De novo</italic>
assembly of viral quasispecies using overlap graphs</article-title>
.
<source>Genome research</source>
.
<year>2017</year>
;
<volume>27</volume>
(
<issue>5</issue>
):
<fpage>835</fpage>
<lpage>848</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/gr.215038.116">10.1101/gr.215038.116</ext-link>
<pub-id pub-id-type="pmid">28396522</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref043">
<label>43</label>
<mixed-citation publication-type="other">Jaillard M. Fine mapping of antibiotic resistance determinants. PhD thesis. 2018; in preparation.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref044">
<label>44</label>
<mixed-citation publication-type="journal">
<name>
<surname>Dunne</surname>
<given-names>WM</given-names>
<suffix>Jr</suffix>
</name>
,
<name>
<surname>Jaillard</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Rochas</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Van Belkum</surname>
<given-names>A</given-names>
</name>
.
<article-title>Microbial genomics and antimicrobial susceptibility testing</article-title>
.
<source>Expert review of molecular diagnostics</source>
.
<year>2017</year>
;
<volume>17</volume>
(
<issue>3</issue>
):
<fpage>257</fpage>
<lpage>269</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/14737159.2017.1283220">10.1080/14737159.2017.1283220</ext-link>
<pub-id pub-id-type="pmid">28093921</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref045">
<label>45</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kos</surname>
<given-names>VN</given-names>
</name>
,
<name>
<surname>Déraspe</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>McLaughlin</surname>
<given-names>RE</given-names>
</name>
,
<name>
<surname>Whiteaker</surname>
<given-names>JD</given-names>
</name>
,
<name>
<surname>Roy</surname>
<given-names>PH</given-names>
</name>
,
<name>
<surname>Alm</surname>
<given-names>RA</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>The resistome of
<italic>Pseudomonas aeruginosa</italic>
in relationship to phenotypic susceptibility</article-title>
.
<source>Antimicrobial agents and chemotherapy</source>
.
<year>2014</year>
; p. AAC-03954.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/AAC.03954-14">10.1128/AAC.03954-14</ext-link>
<pub-id pub-id-type="pmid">25367914</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref046">
<label>46</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bradley</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Gordon</surname>
<given-names>NC</given-names>
</name>
,
<name>
<surname>Walker</surname>
<given-names>TM</given-names>
</name>
,
<name>
<surname>Dunn</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Heys</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Huang</surname>
<given-names>B</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Rapid antibiotic-resistance predictions from genome sequence data for
<italic>Staphylococcus aureus</italic>
and
<italic>Mycobacterium tuberculosis</italic>
</article-title>
.
<source>Nature communications</source>
.
<year>2015</year>
;
<volume>6</volume>
:
<fpage>10063</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ncomms10063">10.1038/ncomms10063</ext-link>
<pub-id pub-id-type="pmid">26686880</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref047">
<label>47</label>
<mixed-citation publication-type="journal">
<name>
<surname>Moradigaravand</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Palm</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Farewell</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Mustonen</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Warringer</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Parts</surname>
<given-names>L</given-names>
</name>
.
<article-title>Precise prediction of antibiotic resistance in
<italic>Escherichia coli</italic>
from full genome sequences</article-title>
.
<source>bioRxiv</source>
.
<year>2018</year>
; p.
<fpage>338194</fpage>
.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref048">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Butler</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>MacCallum</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Kleber</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Shlyakhter</surname>
<given-names>IA</given-names>
</name>
,
<name>
<surname>Belmonte</surname>
<given-names>MK</given-names>
</name>
,
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>ALLPATHS:
<italic>de novo</italic>
assembly of whole-genome shotgun microreads</article-title>
.
<source>Genome research</source>
.
<year>2008</year>
;
<volume>18</volume>
(
<issue>5</issue>
):
<fpage>810</fpage>
<lpage>820</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/gr.7337908">10.1101/gr.7337908</ext-link>
<pub-id pub-id-type="pmid">18340039</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref049">
<label>49</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zerbino</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
.
<article-title>Velvet: algorithms for
<italic>de novo</italic>
short read assembly using de Bruijn graphs</article-title>
.
<source>Genome research</source>
.
<year>2008</year>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/gr.074492.107">10.1101/gr.074492.107</ext-link>
<pub-id pub-id-type="pmid">18349386</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref050">
<label>50</label>
<mixed-citation publication-type="journal">
<name>
<surname>Chikhi</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Limasset</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Medvedev</surname>
<given-names>P</given-names>
</name>
.
<article-title>Compacting de Bruijn graphs from sequencing data quickly and in low memory</article-title>
.
<source>Bioinformatics</source>
.
<year>2016</year>
;
<volume>32</volume>
(
<issue>12</issue>
):
<fpage>i201</fpage>
<lpage>i208</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/bioinformatics/btw279">10.1093/bioinformatics/btw279</ext-link>
<pub-id pub-id-type="pmid">27307618</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref051">
<label>51</label>
<mixed-citation publication-type="journal">
<name>
<surname>Drezen</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Rizk</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Chikhi</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Deltel</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Lemaitre</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Peterlongo</surname>
<given-names>P</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>GATB: genome assembly &amp; analysis tool box</article-title>
.
<source>Bioinformatics</source>
.
<year>2014</year>
;
<volume>30</volume>
(
<issue>20</issue>
):
<fpage>2959</fpage>
<lpage>2961</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/bioinformatics/btu406">10.1093/bioinformatics/btu406</ext-link>
<pub-id pub-id-type="pmid">24990603</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref052">
<label>52</label>
<mixed-citation publication-type="journal">
<name>
<surname>Limasset</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Rizk</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Chikhi</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Peterlongo</surname>
<given-names>P</given-names>
</name>
.
<article-title>Fast and scalable minimal perfect hashing for massive key sets</article-title>
.
<source>arXiv</source>
<year>2017</year>
.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref053">
<label>53</label>
<mixed-citation publication-type="journal">
<name>
<surname>Balding</surname>
<given-names>DJ</given-names>
</name>
.
<article-title>A tutorial on statistical methods for population association studies</article-title>
.
<source>Nature reviews genetics</source>
.
<year>2006</year>
;
<volume>7</volume>
(
<issue>10</issue>
):
<fpage>781</fpage>
<lpage>791</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nrg1916">10.1038/nrg1916</ext-link>
<pub-id pub-id-type="pmid">16983374</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref054">
<label>54</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhou</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Stephens</surname>
<given-names>M</given-names>
</name>
.
<article-title>Efficient multivariate linear mixed-model algorithms for genome-wide association studies</article-title>
.
<source>Nature methods</source>
.
<year>2014</year>
;
<volume>11</volume>
(
<issue>4</issue>
):
<fpage>407</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nmeth.2848">10.1038/nmeth.2848</ext-link>
<pub-id pub-id-type="pmid">24531419</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref055">
<label>55</label>
<mixed-citation publication-type="journal">
<name>
<surname>Widmer</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Lippert</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Weissbrod</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Fusi</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Kadie</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Davidson</surname>
<given-names>R</given-names>
</name>
,
<etal>et al</etal>
.
<article-title>Further improvements to linear mixed models for genome-wide association studies</article-title>
.
<source>Scientific reports</source>
.
<year>2014</year>
;
<volume>4</volume>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/srep06874">10.1038/srep06874</ext-link>
<pub-id pub-id-type="pmid">25387525</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref056">
<label>56</label>
<mixed-citation publication-type="journal">
<name>
<surname>Falush</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Bowden</surname>
<given-names>R</given-names>
</name>
.
<article-title>Genome-wide association mapping in bacteria?</article-title>
<source>Trends in microbiology</source>
.
<year>2006</year>
;
<volume>14</volume>
(
<issue>8</issue>
):
<fpage>353</fpage>
<lpage>355</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.tim.2006.06.003">10.1016/j.tim.2006.06.003</ext-link>
<pub-id pub-id-type="pmid">16782339</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref057">
<label>57</label>
<mixed-citation publication-type="journal">
<name>
<surname>Collins</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Didelot</surname>
<given-names>X</given-names>
</name>
.
<article-title>A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination</article-title>
.
<source>PLOS Computational Biology</source>
.
<year>2018</year>
;
<volume>14</volume>
(
<issue>2</issue>
):
<fpage>1</fpage>
<lpage>21</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pcbi.1005958">10.1371/journal.pcbi.1005958</ext-link>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref058">
<label>58</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhou</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Stephens</surname>
<given-names>M</given-names>
</name>
.
<article-title>Genome-wide efficient mixed-model analysis for association studies</article-title>
.
<source>Nature genetics</source>
.
<year>2012</year>
;
<volume>44</volume>
(
<issue>7</issue>
):
<fpage>821</fpage>
<lpage>824</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ng.2310">10.1038/ng.2310</ext-link>
<pub-id pub-id-type="pmid">22706312</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref059">
<label>59</label>
<mixed-citation publication-type="journal">
<name>
<surname>Benjamini</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Hochberg</surname>
<given-names>Y</given-names>
</name>
.
<article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>
.
<source>Journal of the royal statistical society Series B (Methodological)</source>
.
<year>1995</year>
; p.
<fpage>289</fpage>
<lpage>300</lpage>
.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref060">
<label>60</label>
<mixed-citation publication-type="journal">
<name>
<surname>Camacho</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Coulouris</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Avagyan</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Ma</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Papadopoulos</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Bealer</surname>
<given-names>K</given-names>
</name>
,
<etal>et al</etal>
<article-title>BLAST+: architecture and applications</article-title>
.
<source>BMC bioinformatics</source>
.
<year>2009</year>
;
<volume>10</volume>
(
<issue>1</issue>
):
<fpage>421</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/1471-2105-10-421">10.1186/1471-2105-10-421</ext-link>
<pub-id pub-id-type="pmid">20003500</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref061">
<label>61</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zankari</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Hasman</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Cosentino</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Vestergaard</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Rasmussen</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Lund</surname>
<given-names>O</given-names>
</name>
,
<etal>et al</etal>
<article-title>Identification of acquired antimicrobial resistance genes</article-title>
.
<source>Journal of antimicrobial chemotherapy</source>
.
<year>2012</year>
;
<volume>67</volume>
(
<issue>11</issue>
):
<fpage>2640</fpage>
<lpage>2644</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/jac/dks261">10.1093/jac/dks261</ext-link>
<pub-id pub-id-type="pmid">22782487</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref062">
<label>62</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lakin</surname>
<given-names>SM</given-names>
</name>
,
<name>
<surname>Dean</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Noyes</surname>
<given-names>NR</given-names>
</name>
,
<name>
<surname>Dettenwanger</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Ross</surname>
<given-names>AS</given-names>
</name>
,
<name>
<surname>Doster</surname>
<given-names>E</given-names>
</name>
,
<etal>et al</etal>
<article-title>MEGARes: an antimicrobial resistance database for high throughput sequencing</article-title>
.
<source>Nucleic acids research</source>
.
<year>2017</year>
;
<volume>45</volume>
(
<issue>D1</issue>
):
<fpage>D574</fpage>
<lpage>D580</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/nar/gkw1009">10.1093/nar/gkw1009</ext-link>
<pub-id pub-id-type="pmid">27899569</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref063">
<label>63</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gupta</surname>
<given-names>SK</given-names>
</name>
,
<name>
<surname>Padmanabhan</surname>
<given-names>BR</given-names>
</name>
,
<name>
<surname>Diene</surname>
<given-names>SM</given-names>
</name>
,
<name>
<surname>Lopez-Rojas</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Kempf</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Landraud</surname>
<given-names>L</given-names>
</name>
,
<etal>et al</etal>
<article-title>ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes</article-title>
.
<source>Antimicrobial agents and chemotherapy</source>
.
<year>2014</year>
;
<volume>58</volume>
(
<issue>1</issue>
):
<fpage>212</fpage>
<lpage>220</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/AAC.01310-13">10.1128/AAC.01310-13</ext-link>
<pub-id pub-id-type="pmid">24145532</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref064">
<label>64</label>
<mixed-citation publication-type="journal">
<name>
<surname>Franz</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Lopes</surname>
<given-names>CT</given-names>
</name>
,
<name>
<surname>Huck</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Dong</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Sumer</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Bader</surname>
<given-names>GD</given-names>
</name>
.
<article-title>Cytoscape.js: a graph theory library for visualisation and analysis</article-title>
.
<source>Bioinformatics</source>
.
<year>2015</year>
;
<volume>32</volume>
(
<issue>2</issue>
):
<fpage>309</fpage>
<lpage>311</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/bioinformatics/btv557">10.1093/bioinformatics/btv557</ext-link>
<pub-id pub-id-type="pmid">26415722</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref065">
<label>65</label>
<mixed-citation publication-type="journal">
<name>
<surname>van Belkum</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Soriaga</surname>
<given-names>LB</given-names>
</name>
,
<name>
<surname>LaFave</surname>
<given-names>MC</given-names>
</name>
,
<name>
<surname>Akella</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Veyrieras</surname>
<given-names>JB</given-names>
</name>
,
<name>
<surname>Barbu</surname>
<given-names>EM</given-names>
</name>
,
<etal>et al</etal>
<article-title>Phylogenetic distribution of CRISPR-Cas systems in antibiotic-resistant
<italic>Pseudomonas aeruginosa</italic>
</article-title>
.
<source>mBio</source>
.
<year>2015</year>
;
<volume>6</volume>
(
<issue>6</issue>
):
<fpage>e01796</fpage>
<lpage>15</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/mBio.01796-15">10.1128/mBio.01796-15</ext-link>
<pub-id pub-id-type="pmid">26604259</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref066">
<label>66</label>
<mixed-citation publication-type="other">World Health Organization. Global tuberculosis report. Geneva: WHO Press Release. 2017; Licence: CC BY-NC-SA 3.0 IGO.</mixed-citation>
</ref>
<ref id="pgen.1007758.ref067">
<label>67</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gygli</surname>
<given-names>SM</given-names>
</name>
,
<name>
<surname>Borrell</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Trauner</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Gagneux</surname>
<given-names>S</given-names>
</name>
.
<article-title>Antimicrobial resistance in
<italic>Mycobacterium tuberculosis</italic>
: mechanistic and evolutionary perspectives</article-title>
.
<source>FEMS microbiology reviews</source>
.
<year>2017</year>
;
<volume>41</volume>
(
<issue>3</issue>
):
<fpage>354</fpage>
<lpage>373</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/femsre/fux011">10.1093/femsre/fux011</ext-link>
<pub-id pub-id-type="pmid">28369307</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref068">
<label>68</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wattam</surname>
<given-names>AR</given-names>
</name>
,
<name>
<surname>Davis</surname>
<given-names>JJ</given-names>
</name>
,
<name>
<surname>Assaf</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Boisvert</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Brettin</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Bun</surname>
<given-names>C</given-names>
</name>
,
<etal>et al</etal>
<article-title>Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center</article-title>
.
<source>Nucleic acids research</source>
.
<year>2016</year>
;
<volume>45</volume>
(
<issue>D1</issue>
):
<fpage>D535</fpage>
<lpage>D542</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/nar/gkw1017">10.1093/nar/gkw1017</ext-link>
<pub-id pub-id-type="pmid">27899627</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref069">
<label>69</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mlynarczyk</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Mlynarczyk</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Jeljaszewicz</surname>
<given-names>J</given-names>
</name>
.
<article-title>The genome of
<italic>Staphylococcus aureus</italic>
: a review</article-title>
.
<source>Zentralblatt für Bakteriologie</source>
.
<year>1998</year>
;
<volume>287</volume>
(
<issue>4</issue>
):
<fpage>277</fpage>
<lpage>314</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S0934-8840(98)80165-5">10.1016/S0934-8840(98)80165-5</ext-link>
<pub-id pub-id-type="pmid">9638861</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref070">
<label>70</label>
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>YY</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Walsh</surname>
<given-names>TR</given-names>
</name>
,
<name>
<surname>Yi</surname>
<given-names>LX</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Spencer</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
<article-title>Emergence of plasmid-mediated colistin resistance mechanism MCR-1 in animals and human beings in China: a microbiological and molecular biological study</article-title>
.
<source>The Lancet infectious diseases</source>
.
<year>2016</year>
;
<volume>16</volume>
(
<issue>2</issue>
):
<fpage>161</fpage>
<lpage>168</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S1473-3099(15)00424-7">10.1016/S1473-3099(15)00424-7</ext-link>
<pub-id pub-id-type="pmid">26603172</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref071">
<label>71</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kung</surname>
<given-names>VL</given-names>
</name>
,
<name>
<surname>Ozer</surname>
<given-names>EA</given-names>
</name>
,
<name>
<surname>Hauser</surname>
<given-names>AR</given-names>
</name>
.
<article-title>The accessory genome of
<italic>Pseudomonas aeruginosa</italic>
</article-title>
.
<source>Microbiology and molecular biology reviews</source>
.
<year>2010</year>
;
<volume>74</volume>
(
<issue>4</issue>
):
<fpage>621</fpage>
<lpage>641</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1128/MMBR.00027-10">10.1128/MMBR.00027-10</ext-link>
<pub-id pub-id-type="pmid">21119020</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref072">
<label>72</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pirnay</surname>
<given-names>JP</given-names>
</name>
,
<name>
<surname>Bilocq</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Pot</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Cornelis</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Zizi</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Van Eldere</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
<article-title>
<italic>Pseudomonas aeruginosa</italic>
population structure revisited</article-title>
.
<source>PLoS one</source>
.
<year>2009</year>
;
<volume>4</volume>
(
<issue>11</issue>
):
<fpage>e7740</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0007740">10.1371/journal.pone.0007740</ext-link>
<pub-id pub-id-type="pmid">19936230</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref073">
<label>73</label>
<mixed-citation publication-type="journal">
<name>
<surname>Coll</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>McNerney</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Preston</surname>
<given-names>MD</given-names>
</name>
,
<name>
<surname>Guerra-Assunção</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Warry</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Hill-Cawthorne</surname>
<given-names>G</given-names>
</name>
,
<etal>et al</etal>
<article-title>Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences</article-title>
.
<source>Genome medicine</source>
.
<year>2015</year>
;
<volume>7</volume>
(
<issue>1</issue>
):
<fpage>51</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13073-015-0164-0">10.1186/s13073-015-0164-0</ext-link>
<pub-id pub-id-type="pmid">26019726</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref074">
<label>74</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ondov</surname>
<given-names>BD</given-names>
</name>
,
<name>
<surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
,
<name>
<surname>Melsted</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Mallonee</surname>
<given-names>AB</given-names>
</name>
,
<name>
<surname>Bergman</surname>
<given-names>NH</given-names>
</name>
,
<name>
<surname>Koren</surname>
<given-names>S</given-names>
</name>
,
<etal>et al</etal>
<article-title>Mash: fast genome and metagenome distance estimation using MinHash</article-title>
.
<source>Genome biology</source>
.
<year>2016</year>
;
<volume>17</volume>
(
<issue>1</issue>
):
<fpage>132</fpage>
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13059-016-0997-x">10.1186/s13059-016-0997-x</ext-link>
<pub-id pub-id-type="pmid">27323842</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref075">
<label>75</label>
<mixed-citation publication-type="journal">
<name>
<surname>Marçais</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
.
<article-title>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers</article-title>
.
<source>Bioinformatics</source>
.
<year>2011</year>
;
<volume>27</volume>
(
<issue>6</issue>
):
<fpage>764</fpage>
<lpage>770</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/bioinformatics/btr011">10.1093/bioinformatics/btr011</ext-link>
<pub-id pub-id-type="pmid">21217122</pub-id>
</mixed-citation>
</ref>
<ref id="pgen.1007758.ref076">
<label>76</label>
<mixed-citation publication-type="journal">
<name>
<surname>Jackman</surname>
<given-names>SD</given-names>
</name>
,
<name>
<surname>Vandervalk</surname>
<given-names>BP</given-names>
</name>
,
<name>
<surname>Mohamadi</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Chu</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Yeo</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Hammond</surname>
<given-names>SA</given-names>
</name>
,
<etal>et al</etal>
<article-title>ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter</article-title>
.
<source>Genome research</source>
.
<year>2017</year>
;
<volume>27</volume>
(
<issue>5</issue>
):
<fpage>768</fpage>
<lpage>777</lpage>
.
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/gr.214346.116">10.1101/gr.214346.116</ext-link>
<pub-id pub-id-type="pmid">28232478</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0010010 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0010010 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}


This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021