Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Weighted lambda superstrings applied to vaccine design

Identifieur interne : 001047 ( Pmc/Corpus ); précédent : 001046; suivant : 001048

Weighted lambda superstrings applied to vaccine design

Auteurs : Luis Martínez ; Martin Milani ; Iker Malaina ; Carmen Álvarez ; Martín-Blas Pérez ; Ildefonso M. De La Fuente

Source :

RBID : PMC:6368308

Abstract

We generalize the notion of λ-superstrings, presented in a previous paper, to the notion of weighted λ-superstrings. This generalization entails an important improvement in the applications to vaccine designs, as it allows epitopes to be weighted by their immunogenicities. Motivated by these potential applications of constructing short weighted λ-superstrings to vaccine design, we approach this problem in two ways. First, we formalize the problem as a combinatorial optimization problem (in fact, as two polynomially equivalent problems) and develop an integer programming (IP) formulation for solving it optimally. Second, we describe a model that also takes into account good pairwise alignments of the obtained superstring with the input strings, and present a genetic algorithm that solves the problem approximately. We apply both algorithms to a set of 169 strings corresponding to the Nef protein taken from patiens infected with HIV-1. In the IP-based algorithm, we take the epitopes and the estimation of the immunogenicities from databases of experimental epitopes. In the genetic algorithm we take as candidate epitopes all 9-mers present in the 169 strings and estimate their immunogenicities using a public bioinformatics tool. Finally, we used several bioinformatic tools to evaluate the properties of the candidates generated by our method, which indicated that we can score high immunogenic λ-superstrings that at the same time present similar conformations to the Nef virus proteins.


Url:
DOI: 10.1371/journal.pone.0211714
PubMed: 30735507
PubMed Central: 6368308

Links to Exploration step

PMC:6368308

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Weighted lambda superstrings applied to vaccine design</title>
<author>
<name sortKey="Martinez, Luis" sort="Martinez, Luis" uniqKey="Martinez L" first="Luis" last="Martínez">Luis Martínez</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Biocruces Bizkaia Health Research Institute, Barakaldo, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Basque Center for Applied Mathematics BCAM, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Milani, Martin" sort="Milani, Martin" uniqKey="Milani M" first="Martin" last="Milani">Martin Milani</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>University of Primorska, UP IAM and UP FAMNIT, Koper, Slovenia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Malaina, Iker" sort="Malaina, Iker" uniqKey="Malaina I" first="Iker" last="Malaina">Iker Malaina</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Biocruces Bizkaia Health Research Institute, Barakaldo, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alvarez, Carmen" sort="Alvarez, Carmen" uniqKey="Alvarez C" first="Carmen" last="Álvarez">Carmen Álvarez</name>
<affiliation>
<nlm:aff id="aff005">
<addr-line>IDIVAL Valdecilla Biomedical Research Institute, Santander, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Perez, Martin Blas" sort="Perez, Martin Blas" uniqKey="Perez M" first="Martín-Blas" last="Pérez">Martín-Blas Pérez</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="M De La Fuente, Ildefonso" sort="M De La Fuente, Ildefonso" uniqKey="M De La Fuente I" first="Ildefonso" last="M. De La Fuente">Ildefonso M. De La Fuente</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff006">
<addr-line>Department of Nutrition, CEBAS-CSIC Institute, Murcia, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30735507</idno>
<idno type="pmc">6368308</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368308</idno>
<idno type="RBID">PMC:6368308</idno>
<idno type="doi">10.1371/journal.pone.0211714</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">001047</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001047</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Weighted lambda superstrings applied to vaccine design</title>
<author>
<name sortKey="Martinez, Luis" sort="Martinez, Luis" uniqKey="Martinez L" first="Luis" last="Martínez">Luis Martínez</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Biocruces Bizkaia Health Research Institute, Barakaldo, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Basque Center for Applied Mathematics BCAM, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Milani, Martin" sort="Milani, Martin" uniqKey="Milani M" first="Martin" last="Milani">Martin Milani</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>University of Primorska, UP IAM and UP FAMNIT, Koper, Slovenia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Malaina, Iker" sort="Malaina, Iker" uniqKey="Malaina I" first="Iker" last="Malaina">Iker Malaina</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>Biocruces Bizkaia Health Research Institute, Barakaldo, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alvarez, Carmen" sort="Alvarez, Carmen" uniqKey="Alvarez C" first="Carmen" last="Álvarez">Carmen Álvarez</name>
<affiliation>
<nlm:aff id="aff005">
<addr-line>IDIVAL Valdecilla Biomedical Research Institute, Santander, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Perez, Martin Blas" sort="Perez, Martin Blas" uniqKey="Perez M" first="Martín-Blas" last="Pérez">Martín-Blas Pérez</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="M De La Fuente, Ildefonso" sort="M De La Fuente, Ildefonso" uniqKey="M De La Fuente I" first="Ildefonso" last="M. De La Fuente">Ildefonso M. De La Fuente</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff006">
<addr-line>Department of Nutrition, CEBAS-CSIC Institute, Murcia, Spain</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>We generalize the notion of λ-superstrings, presented in a previous paper, to the notion of weighted λ-superstrings. This generalization entails an important improvement in the applications to vaccine designs, as it allows epitopes to be weighted by their immunogenicities. Motivated by these potential applications of constructing short weighted λ-superstrings to vaccine design, we approach this problem in two ways. First, we formalize the problem as a combinatorial optimization problem (in fact, as two polynomially equivalent problems) and develop an integer programming (IP) formulation for solving it optimally. Second, we describe a model that also takes into account good pairwise alignments of the obtained superstring with the input strings, and present a genetic algorithm that solves the problem approximately. We apply both algorithms to a set of 169 strings corresponding to the Nef protein taken from patiens infected with HIV-1. In the IP-based algorithm, we take the epitopes and the estimation of the immunogenicities from databases of experimental epitopes. In the genetic algorithm we take as candidate epitopes all 9-mers present in the 169 strings and estimate their immunogenicities using a public bioinformatics tool. Finally, we used several bioinformatic tools to evaluate the properties of the candidates generated by our method, which indicated that we can score high immunogenic λ-superstrings that at the same time present similar conformations to the Nef virus proteins.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Nielsen, M" uniqKey="Nielsen M">M Nielsen</name>
</author>
<author>
<name sortKey="Lund, O" uniqKey="Lund O">O Lund</name>
</author>
<author>
<name sortKey="Buus, S" uniqKey="Buus S">S Buus</name>
</author>
<author>
<name sortKey="Lundegaard, C" uniqKey="Lundegaard C">C Lundegaard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khan, Am" uniqKey="Khan A">AM Khan</name>
</author>
<author>
<name sortKey="Miotto, O" uniqKey="Miotto O">O Miotto</name>
</author>
<author>
<name sortKey="Heiny, At" uniqKey="Heiny A">AT Heiny</name>
</author>
<author>
<name sortKey="Salmon, J" uniqKey="Salmon J">J Salmon</name>
</author>
<author>
<name sortKey="Srinivasan, Kn" uniqKey="Srinivasan K">KN Srinivasan</name>
</author>
<author>
<name sortKey="Nascimento, Ej" uniqKey="Nascimento E">EJ Nascimento</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Hw" uniqKey="Wang H">HW Wang</name>
</author>
<author>
<name sortKey="Lin, Yc" uniqKey="Lin Y">YC Lin</name>
</author>
<author>
<name sortKey="Pai, Tw" uniqKey="Pai T">TW Pai</name>
</author>
<author>
<name sortKey="Chang, Ht" uniqKey="Chang H">HT Chang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hemmer, B" uniqKey="Hemmer B">B Hemmer</name>
</author>
<author>
<name sortKey="Kondo, T" uniqKey="Kondo T">T Kondo</name>
</author>
<author>
<name sortKey="Gran, B" uniqKey="Gran B">B Gran</name>
</author>
<author>
<name sortKey="Pinilla, C" uniqKey="Pinilla C">C Pinilla</name>
</author>
<author>
<name sortKey="Cortese, I" uniqKey="Cortese I">I Cortese</name>
</author>
<author>
<name sortKey="Pascal, J" uniqKey="Pascal J">J Pascal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharon, J" uniqKey="Sharon J">J Sharon</name>
</author>
<author>
<name sortKey="Rynkiewicz, Mj" uniqKey="Rynkiewicz M">MJ Rynkiewicz</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
<author>
<name sortKey="Yang, Cy" uniqKey="Yang C">CY Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="El Manzalawy, Y" uniqKey="El Manzalawy Y">Y El-manzalawy</name>
</author>
<author>
<name sortKey="Dobbs, D" uniqKey="Dobbs D">D Dobbs</name>
</author>
<author>
<name sortKey="Honavar, V" uniqKey="Honavar V">V Honavar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Geginat, G" uniqKey="Geginat G">G Geginat</name>
</author>
<author>
<name sortKey="Schenk, S" uniqKey="Schenk S">S Schenk</name>
</author>
<author>
<name sortKey="Skoberne, M" uniqKey="Skoberne M">M Skoberne</name>
</author>
<author>
<name sortKey="Goebel, W" uniqKey="Goebel W">W Goebel</name>
</author>
<author>
<name sortKey="Hof, H" uniqKey="Hof H">H Hof</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Skoberne, M" uniqKey="Skoberne M">M Skoberne</name>
</author>
<author>
<name sortKey="Geginat, G" uniqKey="Geginat G">G Geginat</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Y" uniqKey="Kim Y">Y Kim</name>
</author>
<author>
<name sortKey="Ponomarenko, J" uniqKey="Ponomarenko J">J Ponomarenko</name>
</author>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
<author>
<name sortKey="Tamang, D" uniqKey="Tamang D">D Tamang</name>
</author>
<author>
<name sortKey="Wang, P" uniqKey="Wang P">P Wang</name>
</author>
<author>
<name sortKey="Greenbaum, J" uniqKey="Greenbaum J">J Greenbaum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martinez, L" uniqKey="Martinez L">L Martinez</name>
</author>
<author>
<name sortKey="Milanic, M" uniqKey="Milanic M">M Milanic</name>
</author>
<author>
<name sortKey="Legarreta, L" uniqKey="Legarreta L">L Legarreta</name>
</author>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
<author>
<name sortKey="Malaina, I" uniqKey="Malaina I">I Malaina</name>
</author>
<author>
<name sortKey="De La Fuente, Im" uniqKey="De La Fuente I">IM De la Fuente</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deb, K" uniqKey="Deb K">K Deb</name>
</author>
<author>
<name sortKey="Pratap, A" uniqKey="Pratap A">A Pratap</name>
</author>
<author>
<name sortKey="Agarwal, S" uniqKey="Agarwal S">S Agarwal</name>
</author>
<author>
<name sortKey="Meyarivan, T" uniqKey="Meyarivan T">T Meyarivan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Manzourolajdad, A" uniqKey="Manzourolajdad A">A Manzourolajdad</name>
</author>
<author>
<name sortKey="Gonzalez, M" uniqKey="Gonzalez M">M Gonzalez</name>
</author>
<author>
<name sortKey="Spouge, Jl" uniqKey="Spouge J">JL Spouge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharma, D" uniqKey="Sharma D">D Sharma</name>
</author>
<author>
<name sortKey="Bhattacharya, J" uniqKey="Bhattacharya J">J Bhattacharya</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alon, N" uniqKey="Alon N">N Alon</name>
</author>
<author>
<name sortKey="Moshkovitz, D" uniqKey="Moshkovitz D">D Moshkovitz</name>
</author>
<author>
<name sortKey="Safra, S" uniqKey="Safra S">S Safra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schrijver, A" uniqKey="Schrijver A">A Schrijver</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pataki, G" uniqKey="Pataki G">G Pataki</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miller, Ce" uniqKey="Miller C">CE Miller</name>
</author>
<author>
<name sortKey="Tucker, Aw" uniqKey="Tucker A">AW Tucker</name>
</author>
<author>
<name sortKey="Zemlin, Ra" uniqKey="Zemlin R">RA Zemlin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haupt, Rl" uniqKey="Haupt R">RL Haupt</name>
</author>
<author>
<name sortKey="Haupt, Se" uniqKey="Haupt S">SE Haupt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gusfield, D" uniqKey="Gusfield D">D Gusfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nickle, Dc" uniqKey="Nickle D">DC Nickle</name>
</author>
<author>
<name sortKey="Rolland, M" uniqKey="Rolland M">M Rolland</name>
</author>
<author>
<name sortKey="Jensen, Ma" uniqKey="Jensen M">MA Jensen</name>
</author>
<author>
<name sortKey="Pond, Slk" uniqKey="Pond S">SLK Pond</name>
</author>
<author>
<name sortKey="Deng, W" uniqKey="Deng W">W Deng</name>
</author>
<author>
<name sortKey="Seligman, M" uniqKey="Seligman M">M Seligman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vita, R" uniqKey="Vita R">R Vita</name>
</author>
<author>
<name sortKey="Zarebski, L" uniqKey="Zarebski L">L Zarebski</name>
</author>
<author>
<name sortKey="Greenbaum, Ja" uniqKey="Greenbaum J">JA Greenbaum</name>
</author>
<author>
<name sortKey="Emami, H" uniqKey="Emami H">H Emami</name>
</author>
<author>
<name sortKey="Hoof, I" uniqKey="Hoof I">I Hoof</name>
</author>
<author>
<name sortKey="Salimi, N" uniqKey="Salimi N">N Salimi</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rasmussen, M" uniqKey="Rasmussen M">M Rasmussen</name>
</author>
<author>
<name sortKey="Fenoy, E" uniqKey="Fenoy E">E Fenoy</name>
</author>
<author>
<name sortKey="Harndahl, M" uniqKey="Harndahl M">M Harndahl</name>
</author>
<author>
<name sortKey="Kristensen, Ab" uniqKey="Kristensen A">AB Kristensen</name>
</author>
<author>
<name sortKey="Nielsen, Ik" uniqKey="Nielsen I">IK Nielsen</name>
</author>
<author>
<name sortKey="Nielsen, M" uniqKey="Nielsen M">M Nielsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sette, A" uniqKey="Sette A">A Sette</name>
</author>
<author>
<name sortKey="Vitiello, A" uniqKey="Vitiello A">A Vitiello</name>
</author>
<author>
<name sortKey="Reherman, B" uniqKey="Reherman B">B Reherman</name>
</author>
<author>
<name sortKey="Fowler, P" uniqKey="Fowler P">P Fowler</name>
</author>
<author>
<name sortKey="Nayersina, R" uniqKey="Nayersina R">R Nayersina</name>
</author>
<author>
<name sortKey="Kast, Wm" uniqKey="Kast W">WM Kast</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paul, S" uniqKey="Paul S">S Paul</name>
</author>
<author>
<name sortKey="Weiskopf, D" uniqKey="Weiskopf D">D Weiskopf</name>
</author>
<author>
<name sortKey="Angelo, Ma" uniqKey="Angelo M">MA Angelo</name>
</author>
<author>
<name sortKey="Sidney, J" uniqKey="Sidney J">J Sidney</name>
</author>
<author>
<name sortKey="Peters, B" uniqKey="Peters B">B Peters</name>
</author>
<author>
<name sortKey="Sette, A" uniqKey="Sette A">A Sette</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bergmann Leitner, Es" uniqKey="Bergmann Leitner E">ES Bergmann-Leitner</name>
</author>
<author>
<name sortKey="Chaudhury, S" uniqKey="Chaudhury S">S Chaudhury</name>
</author>
<author>
<name sortKey="Steers, Nj" uniqKey="Steers N">NJ Steers</name>
</author>
<author>
<name sortKey="Sabato, M" uniqKey="Sabato M">M Sabato</name>
</author>
<author>
<name sortKey="Delvecchio, V" uniqKey="Delvecchio V">V Delvecchio</name>
</author>
<author>
<name sortKey="Wallqvist, As" uniqKey="Wallqvist A">AS Wallqvist</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bryson, Cj" uniqKey="Bryson C">CJ Bryson</name>
</author>
<author>
<name sortKey="Jones, Td" uniqKey="Jones T">TD Jones</name>
</author>
<author>
<name sortKey="Baker, Mp" uniqKey="Baker M">MP Baker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khan, Jm" uniqKey="Khan J">JM Khan</name>
</author>
<author>
<name sortKey="Kumar, G" uniqKey="Kumar G">G Kumar</name>
</author>
<author>
<name sortKey="Ranganathan, S" uniqKey="Ranganathan S">S Ranganathan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moreau, V" uniqKey="Moreau V">V Moreau</name>
</author>
<author>
<name sortKey="Fleury, C" uniqKey="Fleury C">C Fleury</name>
</author>
<author>
<name sortKey="Piquer, D" uniqKey="Piquer D">D Piquer</name>
</author>
<author>
<name sortKey="Nguyen, C" uniqKey="Nguyen C">C Nguyen</name>
</author>
<author>
<name sortKey="Novali, N" uniqKey="Novali N">N Novali</name>
</author>
<author>
<name sortKey="Villard, S" uniqKey="Villard S">S Villard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Calis, Jj" uniqKey="Calis J">JJ Calis</name>
</author>
<author>
<name sortKey="Maybeno, M" uniqKey="Maybeno M">M Maybeno</name>
</author>
<author>
<name sortKey="Greenbaum, Ja" uniqKey="Greenbaum J">JA Greenbaum</name>
</author>
<author>
<name sortKey="Weiskopf, D" uniqKey="Weiskopf D">D Weiskopf</name>
</author>
<author>
<name sortKey="De Silva, Ad" uniqKey="De Silva A">AD De Silva</name>
</author>
<author>
<name sortKey="Sette, A" uniqKey="Sette A">A Sette</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paul, S" uniqKey="Paul S">S Paul</name>
</author>
<author>
<name sortKey="Kolla, Rv" uniqKey="Kolla R">RV Kolla</name>
</author>
<author>
<name sortKey="Sidney, J" uniqKey="Sidney J">J Sidney</name>
</author>
<author>
<name sortKey="Weiskopf, D" uniqKey="Weiskopf D">D Weiskopf</name>
</author>
<author>
<name sortKey="Fleri, W" uniqKey="Fleri W">W Fleri</name>
</author>
<author>
<name sortKey="Kim, Y" uniqKey="Kim Y">Y Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ponomarenko, Jv" uniqKey="Ponomarenko J">JV Ponomarenko</name>
</author>
<author>
<name sortKey="Van Regenmortel, Mh" uniqKey="Van Regenmortel M">MH Van Regenmortel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shmelkov, E" uniqKey="Shmelkov E">E Shmelkov</name>
</author>
<author>
<name sortKey="Krachmarov, C" uniqKey="Krachmarov C">C Krachmarov</name>
</author>
<author>
<name sortKey="Grigoryan, Av" uniqKey="Grigoryan A">AV Grigoryan</name>
</author>
<author>
<name sortKey="Pinter, A" uniqKey="Pinter A">A Pinter</name>
</author>
<author>
<name sortKey="Statnikov, A" uniqKey="Statnikov A">A Statnikov</name>
</author>
<author>
<name sortKey="Cardozo, T" uniqKey="Cardozo T">T Cardozo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tong, Jc" uniqKey="Tong J">JC Tong</name>
</author>
<author>
<name sortKey="Tan, Tw" uniqKey="Tan T">TW Tan</name>
</author>
<author>
<name sortKey="Ranganathan, S" uniqKey="Ranganathan S">S Ranganathan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tung, Cw" uniqKey="Tung C">CW Tung</name>
</author>
<author>
<name sortKey="Ho, Sy" uniqKey="Ho S">SY Ho</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O Eill, E" uniqKey="O Eill E">E O’Neill</name>
</author>
<author>
<name sortKey="Kuo, Ls" uniqKey="Kuo L">LS Kuo</name>
</author>
<author>
<name sortKey="Krisko, Jf" uniqKey="Krisko J">JF Krisko</name>
</author>
<author>
<name sortKey="Tomchick, Dr" uniqKey="Tomchick D">DR Tomchick</name>
</author>
<author>
<name sortKey="Garcia, Jv" uniqKey="Garcia J">JV Garcia</name>
</author>
<author>
<name sortKey="Foster, Jl" uniqKey="Foster J">JL Foster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roy, A" uniqKey="Roy A">A Roy</name>
</author>
<author>
<name sortKey="Kucukural, A" uniqKey="Kucukural A">A Kucukural</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, J" uniqKey="Yang J">J Yang</name>
</author>
<author>
<name sortKey="Yan, R" uniqKey="Yan R">R Yan</name>
</author>
<author>
<name sortKey="Roy, A" uniqKey="Roy A">A Roy</name>
</author>
<author>
<name sortKey="Xu, D" uniqKey="Xu D">D Xu</name>
</author>
<author>
<name sortKey="Poisson, J" uniqKey="Poisson J">J Poisson</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Singh, P" uniqKey="Singh P">P Singh</name>
</author>
<author>
<name sortKey="Yadav, Gp" uniqKey="Yadav G">GP Yadav</name>
</author>
<author>
<name sortKey="Gupta, S" uniqKey="Gupta S">S Gupta</name>
</author>
<author>
<name sortKey="Tripathi, Ak" uniqKey="Tripathi A">AK Tripathi</name>
</author>
<author>
<name sortKey="Ramachandran, R" uniqKey="Ramachandran R">R Ramachandran</name>
</author>
<author>
<name sortKey="Tripathi, Rk" uniqKey="Tripathi R">RK Tripathi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Skolnick, J" uniqKey="Skolnick J">J Skolnick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kelley, La" uniqKey="Kelley L">LA Kelley</name>
</author>
<author>
<name sortKey="Mezulis, S" uniqKey="Mezulis S">S Mezulis</name>
</author>
<author>
<name sortKey="Yates, Cm" uniqKey="Yates C">CM Yates</name>
</author>
<author>
<name sortKey="Wass, Mn" uniqKey="Wass M">MN Wass</name>
</author>
<author>
<name sortKey="Sternberg, Mj" uniqKey="Sternberg M">MJ Sternberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kavanagh, Dg" uniqKey="Kavanagh D">DG Kavanagh</name>
</author>
<author>
<name sortKey="Kaufmann, De" uniqKey="Kaufmann D">DE Kaufmann</name>
</author>
<author>
<name sortKey="Sunderji, S" uniqKey="Sunderji S">S Sunderji</name>
</author>
<author>
<name sortKey="Frahm, N" uniqKey="Frahm N">N Frahm</name>
</author>
<author>
<name sortKey="Le Gall, S" uniqKey="Le Gall S">S Le Gall</name>
</author>
<author>
<name sortKey="Boczkowski, D" uniqKey="Boczkowski D">D Boczkowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Baalen, Ca" uniqKey="Van Baalen C">CA van Baalen</name>
</author>
<author>
<name sortKey="Kwa, D" uniqKey="Kwa D">D Kwa</name>
</author>
<author>
<name sortKey="Verschuren, Ej" uniqKey="Verschuren E">EJ Verschuren</name>
</author>
<author>
<name sortKey="Reedijk, Ml" uniqKey="Reedijk M">ML Reedijk</name>
</author>
<author>
<name sortKey="Boon, Ac" uniqKey="Boon A">AC Boon</name>
</author>
<author>
<name sortKey="De Mutsert, G" uniqKey="De Mutsert G">G de Mutsert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Michael, Nl" uniqKey="Michael N">NL Michael</name>
</author>
<author>
<name sortKey="Chang, G" uniqKey="Chang G">G Chang</name>
</author>
<author>
<name sortKey="D Rcy, La" uniqKey="D Rcy L">LA d’Arcy</name>
</author>
<author>
<name sortKey="Tseng, Cj" uniqKey="Tseng C">CJ Tseng</name>
</author>
<author>
<name sortKey="Birx, Dl" uniqKey="Birx D">DL Birx</name>
</author>
<author>
<name sortKey="Sheppard, Hw" uniqKey="Sheppard H">HW Sheppard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, Y" uniqKey="Huang Y">Y Huang</name>
</author>
<author>
<name sortKey="Zhang, L" uniqKey="Zhang L">L Zhang</name>
</author>
<author>
<name sortKey="Ho, Dd" uniqKey="Ho D">DD Ho</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Doytchinova, Ia" uniqKey="Doytchinova I">IA Doytchinova</name>
</author>
<author>
<name sortKey="Flower, Dr" uniqKey="Flower D">DR Flower</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Calderon Gonzalez, R" uniqKey="Calderon Gonzalez R">R Calderon-Gonzalez</name>
</author>
<author>
<name sortKey="Tobes, R" uniqKey="Tobes R">R Tobes</name>
</author>
<author>
<name sortKey="Pareja, E" uniqKey="Pareja E">E Pareja</name>
</author>
<author>
<name sortKey="Frande Cabanes, E" uniqKey="Frande Cabanes E">E Frande-Cabanes</name>
</author>
<author>
<name sortKey="Petrovsky, N" uniqKey="Petrovsky N">N Petrovsky</name>
</author>
<author>
<name sortKey="Alvarez Dominguez, C" uniqKey="Alvarez Dominguez C">C Alvarez-Dominguez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Alvarez Dominguez, C" uniqKey="Alvarez Dominguez C">C Alvarez-Dominguez</name>
</author>
<author>
<name sortKey="Madrazo Toca, F" uniqKey="Madrazo Toca F">F Madrazo-Toca</name>
</author>
<author>
<name sortKey="Fernandez Prieto, L" uniqKey="Fernandez Prieto L">L Fernandez-Prieto</name>
</author>
<author>
<name sortKey="Vandekerckhove, J" uniqKey="Vandekerckhove J">J Vandekerckhove</name>
</author>
<author>
<name sortKey="Pareja, E" uniqKey="Pareja E">E Pareja</name>
</author>
<author>
<name sortKey="Tobes, R" uniqKey="Tobes R">R Tobes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Calder N Gonzalez, R" uniqKey="Calder N Gonzalez R">R Calderón-González</name>
</author>
<author>
<name sortKey="Frande Cabanes, E" uniqKey="Frande Cabanes E">E Frande-Cabanes</name>
</author>
<author>
<name sortKey="Bronchalo Vicente, L" uniqKey="Bronchalo Vicente L">L Bronchalo-Vicente</name>
</author>
<author>
<name sortKey="Lecea Cuello, Mj" uniqKey="Lecea Cuello M">MJ Lecea-Cuello</name>
</author>
<author>
<name sortKey="Pareja, E" uniqKey="Pareja E">E Pareja</name>
</author>
<author>
<name sortKey="Bosch Martinez, A" uniqKey="Bosch Martinez A">A Bosch-Martínez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Welsh, Rm" uniqKey="Welsh R">RM Welsh</name>
</author>
<author>
<name sortKey="Selin, Lk" uniqKey="Selin L">LK Selin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rehermann, B" uniqKey="Rehermann B">B Rehermann</name>
</author>
<author>
<name sortKey="Shin, Ec" uniqKey="Shin E">EC Shin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Boer, Rj" uniqKey="De Boer R">RJ de Boer</name>
</author>
<author>
<name sortKey="Perelson, As" uniqKey="Perelson A">AS Perelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, S" uniqKey="Wang S">S Wang</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30735507</article-id>
<article-id pub-id-type="pmc">6368308</article-id>
<article-id pub-id-type="publisher-id">PONE-D-18-24167</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0211714</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Physics</subject>
<subj-group>
<subject>Particle Physics</subject>
<subj-group>
<subject>String Theory</subject>
<subj-group>
<subject>Superstring Theory</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Physics</subject>
<subj-group>
<subject>Particle Physics</subject>
<subj-group>
<subject>Supersymmetry</subject>
<subj-group>
<subject>Superstring Theory</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Physiology</subject>
<subj-group>
<subject>Immune Physiology</subject>
<subj-group>
<subject>Antigens</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Medicine and Health Sciences</subject>
<subj-group>
<subject>Physiology</subject>
<subj-group>
<subject>Immune Physiology</subject>
<subj-group>
<subject>Antigens</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Immunology</subject>
<subj-group>
<subject>Immune System Proteins</subject>
<subj-group>
<subject>Antigens</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Medicine and Health Sciences</subject>
<subj-group>
<subject>Immunology</subject>
<subj-group>
<subject>Immune System Proteins</subject>
<subj-group>
<subject>Antigens</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Biochemistry</subject>
<subj-group>
<subject>Proteins</subject>
<subj-group>
<subject>Immune System Proteins</subject>
<subj-group>
<subject>Antigens</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Medicine and Health Sciences</subject>
<subj-group>
<subject>Infectious Diseases</subject>
<subj-group>
<subject>Infectious Disease Control</subject>
<subj-group>
<subject>Vaccines</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Database and Informatics Methods</subject>
<subj-group>
<subject>Bioinformatics</subject>
<subj-group>
<subject>Sequence Analysis</subject>
<subj-group>
<subject>Sequence Alignment</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Applied Mathematics</subject>
<subj-group>
<subject>Algorithms</subject>
<subj-group>
<subject>Genetic Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Simulation and Modeling</subject>
<subj-group>
<subject>Algorithms</subject>
<subj-group>
<subject>Genetic Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Applied Mathematics</subject>
<subj-group>
<subject>Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Simulation and Modeling</subject>
<subj-group>
<subject>Algorithms</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Research and analysis methods</subject>
<subj-group>
<subject>Database and informatics methods</subject>
<subj-group>
<subject>Bioinformatics</subject>
<subj-group>
<subject>Sequence analysis</subject>
<subj-group>
<subject>BLAST algorithm</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Molecular Biology</subject>
<subj-group>
<subject>Macromolecular Structure Analysis</subject>
<subj-group>
<subject>Protein Structure</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Biochemistry</subject>
<subj-group>
<subject>Proteins</subject>
<subj-group>
<subject>Protein Structure</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Weighted lambda superstrings applied to vaccine design</article-title>
<alt-title alt-title-type="running-head">Weighted lambda superstrings applied to vaccine design</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-6297-1449</contrib-id>
<name>
<surname>Martínez</surname>
<given-names>Luis</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Methodology</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Supervision</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id authenticated="true" contrib-id-type="orcid">http://orcid.org/0000-0002-8222-8097</contrib-id>
<name>
<surname>Milanič</surname>
<given-names>Martin</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Malaina</surname>
<given-names>Iker</given-names>
</name>
<role content-type="http://credit.casrai.org/">Data curation</role>
<role content-type="http://credit.casrai.org/">Investigation</role>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Álvarez</surname>
<given-names>Carmen</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff005">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pérez</surname>
<given-names>Martín-Blas</given-names>
</name>
<role content-type="http://credit.casrai.org/">Software</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>M. de la Fuente</surname>
<given-names>Ildefonso</given-names>
</name>
<role content-type="http://credit.casrai.org/">Conceptualization</role>
<role content-type="http://credit.casrai.org/">Writing – original draft</role>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff006">
<sup>6</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Department of Mathematics, University of the Basque Country UPV/EHU, Bilbao, Spain</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Biocruces Bizkaia Health Research Institute, Barakaldo, Spain</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>Basque Center for Applied Mathematics BCAM, Bilbao, Spain</addr-line>
</aff>
<aff id="aff004">
<label>4</label>
<addr-line>University of Primorska, UP IAM and UP FAMNIT, Koper, Slovenia</addr-line>
</aff>
<aff id="aff005">
<label>5</label>
<addr-line>IDIVAL Valdecilla Biomedical Research Institute, Santander, Spain</addr-line>
</aff>
<aff id="aff006">
<label>6</label>
<addr-line>Department of Nutrition, CEBAS-CSIC Institute, Murcia, Spain</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Loverdo</surname>
<given-names>Claude</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>UPMC, FRANCE</addr-line>
</aff>
<author-notes>
<fn fn-type="COI-statement" id="coi001">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>luis.martinez@ehu.eus</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<pub-date pub-type="epub">
<day>8</day>
<month>2</month>
<year>2019</year>
</pub-date>
<volume>14</volume>
<issue>2</issue>
<elocation-id>e0211714</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>8</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>1</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>© 2019 Martínez et al</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Martínez et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="pone.0211714.pdf"></self-uri>
<abstract>
<p>We generalize the notion of λ-superstrings, presented in a previous paper, to the notion of weighted λ-superstrings. This generalization entails an important improvement in the applications to vaccine designs, as it allows epitopes to be weighted by their immunogenicities. Motivated by these potential applications of constructing short weighted λ-superstrings to vaccine design, we approach this problem in two ways. First, we formalize the problem as a combinatorial optimization problem (in fact, as two polynomially equivalent problems) and develop an integer programming (IP) formulation for solving it optimally. Second, we describe a model that also takes into account good pairwise alignments of the obtained superstring with the input strings, and present a genetic algorithm that solves the problem approximately. We apply both algorithms to a set of 169 strings corresponding to the Nef protein taken from patiens infected with HIV-1. In the IP-based algorithm, we take the epitopes and the estimation of the immunogenicities from databases of experimental epitopes. In the genetic algorithm we take as candidate epitopes all 9-mers present in the 169 strings and estimate their immunogenicities using a public bioinformatics tool. Finally, we used several bioinformatic tools to evaluate the properties of the candidates generated by our method, which indicated that we can score high immunogenic λ-superstrings that at the same time present similar conformations to the Nef virus proteins.</p>
</abstract>
<funding-group>
<award-group id="award001">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100003086</institution-id>
<institution>Eusko Jaurlaritza</institution>
</institution-wrap>
</funding-source>
<award-id>IT753-13</award-id>
</award-group>
<award-group id="award002">
<funding-source>
<institution>Eusko Jaurlaritza (ES)</institution>
</funding-source>
<award-id>IT974-16</award-id>
</award-group>
<funding-statement>This research was supported in part by the Basque Government, grants IT753-13 and IT974-16 and by the UPV/EHU and Basque Center of Applied Mathematics, grant US18/21. This research was also in part by the Slovenian Research Agency (I0-0035, research program P1-0285, and research projects N1-0032, J1-7051, and J1-9110). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="6"></fig-count>
<table-count count="5"></table-count>
<page-count count="27"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All relevant data are within the manuscript and its Supporting Information files. The code for the genetic algorithm can be found in:
<ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/1487837">https://zenodo.org/record/1487837</ext-link>
.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All relevant data are within the manuscript and its Supporting Information files. The code for the genetic algorithm can be found in:
<ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/1487837">https://zenodo.org/record/1487837</ext-link>
.</p>
</notes>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>Infectious and transmissible diseases cause deaths of millions of people every year. The best immunological measures to prevent such diseases are vaccines. Therefore, the main efforts of immunologists are focused towards improving our predictions of effective epitopes that would confer protection against pathogens [
<xref rid="pone.0211714.ref001" ref-type="bibr">1</xref>
] and towards enhancing our ability to select appropriate epitopes for inclusion in an efficient vaccine [
<xref rid="pone.0211714.ref002" ref-type="bibr">2</xref>
]. Protective immunity requires humoral or cellular immunity depending on the pathogen.</p>
<p>Humoral immunity implies the production of antibodies by B cells that interact with surface or secreted toxins of pathogens. Each antibody binds to an epitope, defined as the three-dimensional structure of amino acids that can be contacted by the variable region of an antibody. There are two types of B-cell epitopes: (i) linear or continuous epitopes, which are short peptides that correspond to a fragment of a protein, and (ii) conformational epitopes, composed of amino acids not contiguous in primary sequence of the protein but brought in close proximity within the folded 3D structure. The length of these epitopes is variable, ranging from 8 to 20 amino acids [
<xref rid="pone.0211714.ref003" ref-type="bibr">3</xref>
].</p>
<p>Cellular immunity depends on T-cell epitopes generated in other cell types, the antigen presenting cells (or APC) that generate linear epitopes from pathogen degradation or protein synthesis. These short linear amino acids generated from intracellular degraded or synthesized proteins from the microorganisms bind to two types of major histocompatibility complexes (MHC), class I MHC that attach epitopes of 8-9-mer lengths and class II MHC that fit epitopes of 12-15-mer lengths [
<xref rid="pone.0211714.ref004" ref-type="bibr">4</xref>
]. CD4+ T cells recognize class II MHC epitopes and CD8+ T cells recognize class I MHC epitopes in APC.</p>
<p>Bionformatics methods that predict B-cell epitopes are based on certain correlations between some physicochemical properties of amino acids and the locations of linear B-cell epitopes with protein sequences [
<xref rid="pone.0211714.ref005" ref-type="bibr">5</xref>
]. Therefore, hydrophilicity, flexibility, turns, and solvent accessibility generated propensity scales for B-cell epitope prediction. However, propensity scale predictions have failed to predict B-cell epitopes since they are mainly based on fixed lengths and require flexibility [
<xref rid="pone.0211714.ref006" ref-type="bibr">6</xref>
].</p>
<p>Mapping of T-cell epitopes has been based on using complete sets of overlapping peptides or biochemical elution methods from MHC molecules. Both methods, when applied to a classical T cell-mediated pathogen as
<italic>Listeria monocytogenes</italic>
were costly, time consuming and, more importantly, failed to generate predictive rules [
<xref rid="pone.0211714.ref007" ref-type="bibr">7</xref>
], [
<xref rid="pone.0211714.ref008" ref-type="bibr">8</xref>
]. More recently, bioinformatics methods have also been applied to T-cell epitopes via their ability to bind MHC molecules [
<xref rid="pone.0211714.ref009" ref-type="bibr">9</xref>
]. However, they have not been able to predict efficient epitopes for vaccine design. Therefore, a mathematical method of epitope prediction able to be applied either to B or T-cell epitopes is important in the immunology field of vaccination. This has been highlighted in the last outbreaks of world wide infectious diseases, such as flu every year or Ebola in the most recent years.</p>
<p>Martínez et al. [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] introduced the notion of a λ-
<italic>superstring</italic>
along with an optimization problem associated to it, and gave an application to the computational design of vaccines. Given two sets of strings, a set of
<italic>host strings</italic>
, which models a set of instances of a protein (which in our case will be amino acid sequences of the protein for a given pathogen), and a set of
<italic>target strings</italic>
, which models a set of epitopes, a λ-superstring was defined to be a string that models a candidate vaccine containing, as substrings, at least λ target strings from each host string. This means that the vaccine covers at least λ epitopes in each patient. The associated optimization problem was to find a λ-superstring of minimum length, which means to find a candidate vaccine as short as possible. The aforementioned problem in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] was shown to generalize both the shortest common superstring problem and the set cover problem, and in order to solve it they gave two approaches, one to find exact solutions and the other one to obtain approximate solutions. The approach giving exact solutions was based on an integer programming formulation of the problem, under the assumption that no two target strings are comparable with respect to the substring relation.</p>
<p>Motivated by the necessity of selecting the most effective epitopes mentioned at the beginning of this section, we give in this paper a generalization of the notion of λ-superstring and of the corresponding optimization problem, which is more biologically meaningful. We consider a weight function for the target strings, which represents the immunogenicity of each epitope. A
<italic>weighted</italic>
λ-
<italic>superstring</italic>
is then defined as a string such that for every host string, the sum of the weights of all target strings covered simultaneously by the string and the host string is at least λ (i.e., the minimum of the sums of predicted immunogenicities of the epitopes in each protein variant considered is at least λ). Note that, in principle, the model allows for negative weights. On one hand, the more negative the immunogenicity of an epitope is, the less we prefer the corresponding target string to be a substring of a weighted λ-superstring. However, it could happen that a short weighted λ-superstring necessarily contains target strings representing epitopes of large positive immunogenicity (indicating that its epitopes will likely induce an immune response), which together cover a target string representing an epitope of negative immunogenicity (meaning that it is very unlikely for those covered epitopes to generate an immune response). Furthermore, in the Materials and Methods section we will present a model that also takes into account good pairwise alignments of the obtained superstring with the host strings, in which case target strings with negative weights could be essential. Therefore, we cannot simply disregard target strings with negative weights from the model.</p>
<p>We give two methods for obtaining short weighted λ-superstrings in the Materials and Methods Section. In the first subsection, a mathematical formulation of the problem is presented. In the second subsection, following the approach of [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
], a graph theoretic formulation of the problem is given, from which an integer program is derived leading to optimal solutions to the problem of finding shortest weighted λ-superstrings. Next, in the third subsection, a genetic algorithm is introduced to obtain suboptimal solutions in the case when the integer programming approach cannot be used due to the large number of variables in the IP formulation. This algorithm, besides getting the λ-superstring criterion closer to biological reality, considers an additional objective to be optimized simultaneously, the alignment of the protein. By optimizing the alignment, we can obtain vaccine candidates that resemble the virus proteins that are recognized by the immune system, and therefore, build a pseudo-protein that will have a stable structure, recognizable by the MHC-complex. Our genetic algorithm is based on the NSGA-II algorithm [
<xref rid="pone.0211714.ref011" ref-type="bibr">11</xref>
], which is one of the most used heuristic techniques for solving multi-objective problems, which stands out due to its high speed, elitism, and non-necessity of specifying a sharing parameter for the optimization. In the Results section we give an application to the design of a weighted λ-superstring for a set of target strings corresponding to the Nef protein of HIV-1. We chose Nef because it is highly immunogenic [
<xref rid="pone.0211714.ref012" ref-type="bibr">12</xref>
] and plays an important role in HIV pathogenesis [
<xref rid="pone.0211714.ref013" ref-type="bibr">13</xref>
]. In order to evaluate the goodness of our candidate in silico, we have used several bioinformatic tools such as Blast, VaxiJen, I-Tasser and Phyre-2. In addition, we have studied the mismatch proportion, and compared our candidate to a candidate obtained by LANL’s Epigraph, a consensus sequence and to one of the solutions using the unweighted algorithm from [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]. Finally, in the Discussion section, the main conclusions are presented and some future lines of research are outlined.</p>
</sec>
<sec sec-type="materials|methods" id="sec002">
<title>Materials and methods</title>
<sec id="sec003">
<title>The shortest weighted λ-superstring problem</title>
<p>In this subsection, we give a mathematical formulation of the problem. We first recall some notation and terminology for finite strings (that is, finite sequences) over a finite alphabet
<italic>A</italic>
. We denote by
<italic>ϵ</italic>
the empty string, and by
<italic>A</italic>
* the set
<inline-formula id="pone.0211714.e001">
<alternatives>
<graphic xlink:href="pone.0211714.e001.jpg" id="pone.0211714.e001g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M1">
<mml:mrow>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mo>*</mml:mo>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi></mml:mi>
</mml:msubsup>
<mml:msup>
<mml:mi>A</mml:mi>
<mml:mi>n</mml:mi>
</mml:msup>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mi>ϵ</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
of all finite strings over
<italic>A</italic>
. It is well known (and can be easily seen) that the set
<italic>A</italic>
* forms a semigroup with respect to the operation + of concatenation (
<italic>s</italic>
<sub>1</sub>
, …,
<italic>s</italic>
<sub>
<italic>n</italic>
</sub>
) + (
<italic>t</italic>
<sub>1</sub>
, …,
<italic>t</italic>
<sub>
<italic>m</italic>
</sub>
) = (
<italic>s</italic>
<sub>1</sub>
, …,
<italic>s</italic>
<sub>
<italic>n</italic>
</sub>
,
<italic>t</italic>
<sub>1</sub>
, …,
<italic>t</italic>
<sub>
<italic>m</italic>
</sub>
). Given a string
<bold>s</bold>
= (
<italic>s</italic>
<sub>1</sub>
, …,
<italic>s
<sub>n</sub>
</italic>
) ∈
<italic>A</italic>
*, we denote by
<italic></italic>
(
<bold>s</bold>
) the
<italic>length</italic>
of
<bold>s</bold>
, that is,
<italic>n</italic>
. We say that a string
<bold>s</bold>
is a
<italic>substring</italic>
of another string
<bold>t</bold>
, and denote this relation by
<bold>s</bold>
<bold>t</bold>
, if
<bold>t</bold>
can be written as
<bold>t</bold>
=
<bold>u</bold>
+
<bold>s</bold>
+
<bold>v</bold>
for some strings
<bold>u</bold>
and
<bold>v</bold>
over
<italic>A</italic>
. We also use ⊂ to denote the proper substring relation, that is,
<bold>s</bold>
<bold>t</bold>
if and only if
<bold>s</bold>
<bold>t</bold>
and
<bold>s</bold>
<bold>t</bold>
. Given two strings
<bold>s</bold>
= (
<italic>s</italic>
<sub>1</sub>
, …,
<italic>s
<sub>n</sub>
</italic>
),
<bold>t</bold>
= (
<italic>t</italic>
<sub>1</sub>
, …,
<italic>t
<sub>m</sub>
</italic>
) in
<italic>A</italic>
*, the
<italic>degree of overlapping</italic>
of
<bold>s</bold>
and
<bold>t</bold>
is defined as
<disp-formula id="pone.0211714.e002">
<alternatives>
<graphic xlink:href="pone.0211714.e002.jpg" id="pone.0211714.e002g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M2">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>o</mml:mi>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo form="prefix" movablelimits="true">max</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:mo form="prefix" movablelimits="true">min</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>for</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>}</mml:mo>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>The operation of the
<italic>overlapping sum</italic>
+′ in
<italic>A</italic>
* is defined by
<disp-formula id="pone.0211714.e003">
<alternatives>
<graphic xlink:href="pone.0211714.e003.jpg" id="pone.0211714.e003g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M3">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mo>+</mml:mo>
<mml:mo></mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>o</mml:mi>
<mml:mi>v</mml:mi>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">t</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>m</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>We remark that this operation is not associative.</p>
<p>The combinatorial approach to the design of vaccines described in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] is based on the notions of λ-superstrings and λ-cover superstrings, which we now recall. Given two finite sets
<italic>H</italic>
,
<italic>T</italic>
<italic>A</italic>
* of
<italic>host</italic>
and
<italic>target</italic>
strings (modeling the set of instances of the chosen pathogen protein and the set of epitopes), respectively, and a positive integer λ, a λ-
<italic>superstring</italic>
for (
<italic>H</italic>
,
<italic>T</italic>
) is a string
<bold>v</bold>
<italic>A</italic>
* such that for every host string
<bold>h</bold>
<italic>H</italic>
, there exist at least λ strings in
<italic>T</italic>
that are common substrings of both
<bold>h</bold>
and
<bold>v</bold>
. Similarly, given a collection
<inline-formula id="pone.0211714.e004">
<alternatives>
<graphic xlink:href="pone.0211714.e004.jpg" id="pone.0211714.e004g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M4">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</alternatives>
</inline-formula>
of finitely many finite sets of strings over
<italic>A</italic>
(that is,
<inline-formula id="pone.0211714.e005">
<alternatives>
<graphic xlink:href="pone.0211714.e005.jpg" id="pone.0211714.e005g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M5">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
where
<italic>X</italic>
<sub>
<italic>i</italic>
</sub>
<italic>A</italic>
* for all
<italic>i</italic>
∈ {1, …,
<italic>n</italic>
}) and a positive integer λ, a λ-
<italic>cover superstring</italic>
for
<inline-formula id="pone.0211714.e006">
<alternatives>
<graphic xlink:href="pone.0211714.e006.jpg" id="pone.0211714.e006g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M6">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</alternatives>
</inline-formula>
is a string
<bold>v</bold>
<italic>A</italic>
* such that for every
<inline-formula id="pone.0211714.e007">
<alternatives>
<graphic xlink:href="pone.0211714.e007.jpg" id="pone.0211714.e007g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M7">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, at least λ strings in
<italic>X</italic>
are substrings of
<bold>v</bold>
.</p>
<p>We now generalize these notions and the corresponding optimization problems to the weighted case.</p>
<p>
<bold>Definition 1</bold>
<italic>Let H, T</italic>
<italic>A</italic>
*
<italic>be two finite sets of host and target strings, respectively, let each target string</italic>
<bold>t</bold>
<italic>T be equipped with a weight</italic>
<inline-formula id="pone.0211714.e008">
<alternatives>
<graphic xlink:href="pone.0211714.e008.jpg" id="pone.0211714.e008g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M8">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<italic>and let</italic>
<inline-formula id="pone.0211714.e009">
<alternatives>
<graphic xlink:href="pone.0211714.e009.jpg" id="pone.0211714.e009g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M9">
<mml:mrow>
<mml:mo>λ</mml:mo>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
.
<italic>A weighted</italic>
λ-
<italic>superstring for</italic>
(
<italic>H</italic>
,
<italic>T</italic>
,
<italic>w</italic>
)
<italic>is a string</italic>
<bold>v</bold>
<italic>A</italic>
*
<italic>such that for every</italic>
<bold>h</bold>
<italic>H, the sum of the weights of the target strings that are common substrings of both</italic>
<bold>h</bold>
<italic>and</italic>
<bold>v</bold>
<italic>is at least</italic>
λ.</p>
<p>More formally, denoting by
<italic>CS</italic>
(
<bold>s</bold>
,
<bold>t</bold>
) the set of all common substrings of two strings
<bold>s</bold>
and
<bold>t</bold>
, a weighted λ-superstring for (
<italic>H</italic>
,
<italic>T</italic>
,
<italic>w</italic>
) is a string
<bold>v</bold>
<italic>A</italic>
* such that
<disp-formula id="pone.0211714.e010">
<alternatives>
<graphic xlink:href="pone.0211714.e010.jpg" id="pone.0211714.e010g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M10">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo></mml:mo>
<mml:mi>C</mml:mi>
<mml:mi>S</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">h</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="bold">v</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo></mml:mo>
<mml:mo>λ</mml:mo>
<mml:mtext>for</mml:mtext>
<mml:mspace width="2pt"></mml:mspace>
<mml:mtext>all</mml:mtext>
<mml:mspace width="2pt"></mml:mspace>
<mml:mtext mathvariant="bold">h</mml:mtext>
<mml:mo></mml:mo>
<mml:mi>H</mml:mi>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>Clearly, if
<italic>w</italic>
(
<bold>t</bold>
) = 1 for all
<bold>t</bold>
<italic>T</italic>
, then a string
<bold>v</bold>
is a weighted λ-superstring for (
<italic>H</italic>
,
<italic>T</italic>
,
<italic>w</italic>
) if and only if
<bold>v</bold>
is a λ-superstring for (
<italic>H</italic>
,
<italic>T</italic>
).</p>
<p>The corresponding optimization problem (
<xref ref-type="boxed-text" rid="pone.0211714.box001">Box 1</xref>
) is the following:</p>
<boxed-text id="pone.0211714.box001" position="float" orientation="portrait">
<sec id="sec004">
<title>Box 1</title>
<p>S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-S
<sc>uperstring</sc>
</p>
<p>
<bold>Instance</bold>
: A finite set of
<italic>H</italic>
<italic>A</italic>
* of
<italic>host</italic>
strings, a finite set of
<italic>T</italic>
<italic>A</italic>
* of
<italic>target</italic>
strings, a weight function
<inline-formula id="pone.0211714.e011">
<alternatives>
<graphic xlink:href="pone.0211714.e011.jpg" id="pone.0211714.e011g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M11">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, a
<italic>covering requirement</italic>
λ ∈
<italic>R</italic>
.</p>
<p>
<bold>Task</bold>
: Find a weighted λ-superstring for (
<italic>H</italic>
,
<italic>T</italic>
,
<italic>w</italic>
) of minimum length.</p>
</sec>
</boxed-text>
<p>The restriction of the S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-S
<sc>uperstring</sc>
problem to instances such that
<italic>w</italic>
(
<bold>t</bold>
) = 1 for all
<bold>t</bold>
<italic>T</italic>
is equivalent to the S
<sc>hortest</sc>
λ-S
<sc>uperstring</sc>
problem defined in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
].</p>
<p>
<bold>Definition 2</bold>
<italic>Let</italic>
<inline-formula id="pone.0211714.e012">
<alternatives>
<graphic xlink:href="pone.0211714.e012.jpg" id="pone.0211714.e012g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M12">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</alternatives>
</inline-formula>
<italic>be a collection of finitely many finite sets of strings over A, let</italic>
<inline-formula id="pone.0211714.e013">
<alternatives>
<graphic xlink:href="pone.0211714.e013.jpg" id="pone.0211714.e013g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M13">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<italic>let</italic>
<inline-formula id="pone.0211714.e014">
<alternatives>
<graphic xlink:href="pone.0211714.e014.jpg" id="pone.0211714.e014g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M14">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<italic>and let</italic>
<inline-formula id="pone.0211714.e015">
<alternatives>
<graphic xlink:href="pone.0211714.e015.jpg" id="pone.0211714.e015g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M15">
<mml:mrow>
<mml:mo>λ</mml:mo>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
.
<italic>A</italic>
weighted λ-cover superstring
<italic>for</italic>
<inline-formula id="pone.0211714.e016">
<alternatives>
<graphic xlink:href="pone.0211714.e016.jpg" id="pone.0211714.e016g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M16">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
<italic>is a string</italic>
<bold>v</bold>
<italic>A</italic>
*
<italic>such that for every</italic>
<inline-formula id="pone.0211714.e017">
<alternatives>
<graphic xlink:href="pone.0211714.e017.jpg" id="pone.0211714.e017g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M17">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<italic>the sum of the weights w</italic>
(
<bold>t</bold>
)
<italic>of the strings</italic>
<bold>t</bold>
<italic>X that are substrings of</italic>
<bold>v</bold>
<italic>is at least</italic>
λ.
<italic>Formally, for every</italic>
<inline-formula id="pone.0211714.e018">
<alternatives>
<graphic xlink:href="pone.0211714.e018.jpg" id="pone.0211714.e018g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M18">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<italic>we have</italic>
<sub>
<bold>t</bold>
<italic>X</italic>
,
<bold>t</bold>
<bold>v</bold>
</sub>
<italic>w</italic>
(
<bold>t</bold>
) ≥ λ.</p>
<p>Clearly, the case of unit weights corresponds to the notion of a λ-cover superstring. The corresponding optimization problem (
<xref ref-type="boxed-text" rid="pone.0211714.box002">Box 2</xref>
) is the following:</p>
<boxed-text id="pone.0211714.box002" position="float" orientation="portrait">
<sec id="sec005">
<title>Box 2</title>
<p>S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
</p>
<p>
<bold>Instance</bold>
: A collection
<inline-formula id="pone.0211714.e019">
<alternatives>
<graphic xlink:href="pone.0211714.e019.jpg" id="pone.0211714.e019g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M19">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</alternatives>
</inline-formula>
of finitely many finite sets of finite strings over alphabet
<italic>A</italic>
, a weight function
<inline-formula id="pone.0211714.e020">
<alternatives>
<graphic xlink:href="pone.0211714.e020.jpg" id="pone.0211714.e020g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M20">
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, a
<italic>covering requirement</italic>
<inline-formula id="pone.0211714.e021">
<alternatives>
<graphic xlink:href="pone.0211714.e021.jpg" id="pone.0211714.e021g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M21">
<mml:mrow>
<mml:mo>λ</mml:mo>
<mml:mo></mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
.</p>
<p>
<bold>Task</bold>
: Find a weighted λ-cover superstring for
<inline-formula id="pone.0211714.e022">
<alternatives>
<graphic xlink:href="pone.0211714.e022.jpg" id="pone.0211714.e022g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M22">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
of minimum length.</p>
</sec>
</boxed-text>
<p>The restriction of the S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem to instances such that
<italic>w</italic>
(
<bold>t</bold>
) = 1 for all
<inline-formula id="pone.0211714.e023">
<alternatives>
<graphic xlink:href="pone.0211714.e023.jpg" id="pone.0211714.e023g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M23">
<mml:mrow>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo></mml:mo>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
is equivalent to the S
<sc>hortest</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem defined in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]. In that paper, it was proved that the S
<sc>hortest</sc>
λ-S
<sc>uperstring</sc>
problem is polynomially equivalent to the S
<sc>hortest</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem. This equivalence extends straightforwardly to the weighted versions of the problems. Moreover, since the weighted versions of the problem generalize the unweighted ones, hardness results from [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] immediately carry over to the weighted ones. In particular:</p>
<p>
<bold>Theorem 3</bold>
<italic>1. For every ϵ</italic>
> 0,
<italic>there is no polynomial time algorithm approximating the</italic>
S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-S
<sc>uperstring</sc>
<italic>problem within a factor of</italic>
(1 −
<italic>ϵ</italic>
)ln |
<italic>H</italic>
|,
<italic>unless</italic>
<monospace>P</monospace>
=
<monospace>NP</monospace>
,
<italic>even for the case of the binary alphabet A</italic>
= {0, 1},
<italic>a constant weight function w</italic>
≡ 1,
<italic>and</italic>
λ = 1.</p>
<p>
<italic>2. For every ϵ</italic>
> 0,
<italic>there is no polynomial time algorithm approximating the</italic>
S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-C
<sc>over</sc>
S
<sc>perstring</sc>
<italic>problem within a factor of</italic>
<inline-formula id="pone.0211714.e024">
<alternatives>
<graphic xlink:href="pone.0211714.e024.jpg" id="pone.0211714.e024g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M24">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mi>ϵ</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo form="prefix">ln</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
<italic>unless</italic>
<monospace>P</monospace>
=
<monospace>NP</monospace>
,
<italic>even for the case of the binary alphabet, a constant weight function w</italic>
≡ 1,
<italic>and</italic>
λ = 1.</p>
<p>The corresponding hardness results from [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] are stated with a multiplicative constant of
<italic>c</italic>
> 0.2267 instead of 1 −
<italic>ϵ</italic>
. However, exactly the same approach as the one used to prove Theorem 3.9 and Corollary 3.10 in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] can be used to derive Theorem 3; one only needs to use the more recent, stronger inapproximability result on the set cover problem due to Dinur and Steurer [
<xref rid="pone.0211714.ref014" ref-type="bibr">14</xref>
] instead of the one due to Alon et al. [
<xref rid="pone.0211714.ref015" ref-type="bibr">15</xref>
].</p>
<p>Theorem 3 suggests that most likely the two problems cannot be solved optimally or approximately by efficient algorithms, and motivate the development of exact exponential time algorithms and of suboptimal heuristic approaches. This is what we do in the next two subsections.</p>
</sec>
<sec id="sec006">
<title>Graph theoretic and integer programming formulations of the shortest weighted λ-cover superstring problem</title>
<p>In this section, we extend the graph theoretic and integer programming (IP) formulations of the S
<sc>hortest</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem from [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] to the weighted case. (For background on integer programming, see, e.g., [
<xref rid="pone.0211714.ref016" ref-type="bibr">16</xref>
]). Following [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
], we model the problem as a generalization of the
<italic>generalized Traveling Salesman Problem</italic>
. In this problem, the set of vertices of a given complete directed edge-weighted graph is divided into clusters and the objective is to find a minimum-cost tour passing through at least one node from each cluster.</p>
<p>The graph theoretic model for the S
<sc>hortest</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem from [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] is based on a derived complete edge-weighted directed graph
<italic>G</italic>
with vertex set
<inline-formula id="pone.0211714.e025">
<alternatives>
<graphic xlink:href="pone.0211714.e025.jpg" id="pone.0211714.e025g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M25">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
plus one special vertex. Roughly speaking, the main idea is the following. Given a λ-cover superstring
<bold>v</bold>
for
<inline-formula id="pone.0211714.e026">
<alternatives>
<graphic xlink:href="pone.0211714.e026.jpg" id="pone.0211714.e026g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M26">
<mml:mi mathvariant="script">C</mml:mi>
</mml:math>
</alternatives>
</inline-formula>
, one can identify a set of substrings of
<bold>v</bold>
that are pairwise incomparable with respect to the substring relation and contain, as substrings, at least λ strings from each cluster
<inline-formula id="pone.0211714.e027">
<alternatives>
<graphic xlink:href="pone.0211714.e027.jpg" id="pone.0211714.e027g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M27">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
. Sorting these strings in order of their first appearance in
<bold>v</bold>
yields a directed path in
<italic>G</italic>
that can be extended to a directed cycle in
<italic>G</italic>
through the special vertex. By construction, the vertices of this cycle “cover” (in the sense of substring relation, when viewed as strings) at least λ vertices from each cluster
<inline-formula id="pone.0211714.e028">
<alternatives>
<graphic xlink:href="pone.0211714.e028.jpg" id="pone.0211714.e028g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M28">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
. The weights of the edges are defined so that the length of the resulting cycle does not exceed the length of
<bold>v</bold>
. And conversely, every directed cycle in
<italic>G</italic>
through the special vertex satisfying the above covering property and such that no two strings corresponding to (non-special) vertices of the cycle are comparable with respect to the substring relation can be transformed into a λ-cover superstring
<bold>v</bold>
, by taking the overlapping sum of the strings corresponding to the non-special vertices of the cycle. The weights of the edges are defined so that the length of the cycle equals the length of the obtained superstring.</p>
<p>We now formalize these notions and explain the extension to the weighted case. Consider an instance
<inline-formula id="pone.0211714.e029">
<alternatives>
<graphic xlink:href="pone.0211714.e029.jpg" id="pone.0211714.e029g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M29">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>λ</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
of the S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem, and let
<inline-formula id="pone.0211714.e030">
<alternatives>
<graphic xlink:href="pone.0211714.e030.jpg" id="pone.0211714.e030g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M30">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
. Following [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
], we construct a complete directed edge-weighted graph
<italic>G</italic>
= (
<italic>V</italic>
,
<italic>E</italic>
,
<italic>c</italic>
), called the
<italic>distance graph</italic>
. To distinguish the edge weights from the weights from the input weight function
<italic>w</italic>
, the weights on edges will also be referred to as
<italic>costs</italic>
and will be specified with a function
<inline-formula id="pone.0211714.e031">
<alternatives>
<graphic xlink:href="pone.0211714.e031.jpg" id="pone.0211714.e031g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M31">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi mathvariant="double-struck">Z</mml:mi>
<mml:mo>+</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
. The construction is the same as in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]:</p>
<list list-type="bullet">
<list-item>
<p>
<italic>V</italic>
=
<italic>T</italic>
∪ {
<italic>s</italic>
*}.</p>
</list-item>
<list-item>
<p>For every two distinct vertices
<italic>s</italic>
,
<italic>t</italic>
<italic>T</italic>
, add the arc (
<italic>s</italic>
,
<italic>t</italic>
) to
<italic>E</italic>
and assign to it the cost
<italic>c</italic>
(
<italic>s</italic>
,
<italic>t</italic>
) =
<italic></italic>
(
<italic>s</italic>
) −
<italic>ov</italic>
(
<italic>s</italic>
,
<italic>t</italic>
). Clearly, the costs are well defined and non-negative.</p>
</list-item>
<list-item>
<p>For every vertex
<italic>s</italic>
<italic>T</italic>
, add the arc (
<italic>s</italic>
,
<italic>s</italic>
*) to
<italic>E</italic>
and assign to it cost
<italic>c</italic>
(
<italic>s</italic>
,
<italic>s</italic>
*) =
<italic></italic>
(
<italic>s</italic>
).</p>
</list-item>
<list-item>
<p>For every vertex
<italic>s</italic>
<italic>T</italic>
, add the arc (
<italic>s</italic>
*,
<italic>s</italic>
) to
<italic>E</italic>
and assign to it zero cost,
<italic>c</italic>
(
<italic>s</italic>
*,
<italic>s</italic>
) = 0.</p>
</list-item>
</list>
<p>We emphasize that in what follows, we identify the vertices of
<italic>G</italic>
other than
<italic>s</italic>
* with the corresponding strings from
<italic>T</italic>
. In particular, for
<italic>i</italic>
,
<italic>j</italic>
<italic>V</italic>
(
<italic>G</italic>
) \ {
<italic>s</italic>
*}, notation
<italic>i</italic>
<italic>j</italic>
means that
<italic>i</italic>
is a substring of
<italic>j</italic>
and
<italic>i</italic>
<italic>j</italic>
that
<italic>i</italic>
is a proper substring of
<italic>j</italic>
. One more definition is needed to express the problem as a graph problem. A subgraph
<italic>H</italic>
of
<italic>G</italic>
is said to
<italic>cover</italic>
a string
<bold>s</bold>
<italic>T</italic>
if there exists a vertex
<bold>t</bold>
<italic>V</italic>
(
<italic>H</italic>
) ∩
<italic>T</italic>
such that
<bold>s</bold>
<bold>t</bold>
. For
<inline-formula id="pone.0211714.e032">
<alternatives>
<graphic xlink:href="pone.0211714.e032.jpg" id="pone.0211714.e032g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M32">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, we will denote the set of all strings in
<italic>X</italic>
covered by
<italic>H</italic>
by
<italic>X</italic>
<sub>
<italic>H</italic>
</sub>
. The
<italic>cost</italic>
of a directed cycle
<italic>C</italic>
in
<italic>G</italic>
is defined as ∑
<sub>
<italic>e</italic>
<italic>E</italic>
(
<italic>C</italic>
)</sub>
<italic>c</italic>
(
<italic>e</italic>
).</p>
<p>
<bold>Definition 4</bold>
<italic>A directed cycle C in the distance graph G is said to be w</italic>
-feasible
<italic>if it satisfies the following conditions</italic>
:</p>
<list list-type="order">
<list-item>
<p>
<italic>s</italic>
* ∈
<italic>V</italic>
(
<italic>C</italic>
).</p>
</list-item>
<list-item>
<p>
<italic>For every two distinct vertices</italic>
<bold>s</bold>
,
<bold>t</bold>
<italic>from V</italic>
(
<italic>C</italic>
) ∩
<italic>T</italic>
,
<bold>s</bold>
<italic>is not a substring of</italic>
<bold>t</bold>
.</p>
</list-item>
<list-item>
<p>
<italic>For every</italic>
<inline-formula id="pone.0211714.e033">
<alternatives>
<graphic xlink:href="pone.0211714.e033.jpg" id="pone.0211714.e033g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M33">
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
,
<italic>we have</italic>
<inline-formula id="pone.0211714.e034">
<alternatives>
<graphic xlink:href="pone.0211714.e034.jpg" id="pone.0211714.e034g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M34">
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>C</mml:mi>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo></mml:mo>
<mml:mo>λ</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
.</p>
</list-item>
</list>
<p>
<bold>Proposition 5</bold>
<italic>Let</italic>
<inline-formula id="pone.0211714.e035">
<alternatives>
<graphic xlink:href="pone.0211714.e035.jpg" id="pone.0211714.e035g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M35">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>λ</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
<italic>be an instance to the</italic>
S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
<italic>problem, and let G be its derived distance graph. Then, there exists a weighted</italic>
λ-
<italic>cover superstring for</italic>
<inline-formula id="pone.0211714.e036">
<alternatives>
<graphic xlink:href="pone.0211714.e036.jpg" id="pone.0211714.e036g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M36">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
<italic>of length at most ℓ if and only if G contains a w-feasible directed cycle C of cost at most ℓ</italic>
.</p>
<p>We give a proof of Proposition 5 in
<xref ref-type="supplementary-material" rid="pone.0211714.s001">S1 Appendix</xref>
.</p>
<p>Proposition 5 leads to the following IP formulation for the S
<sc>hortest</sc>
W
<sc>eighted</sc>
λ-C
<sc>over</sc>
S
<sc>uperstring</sc>
problem. The program has three types of binary variables:
<italic>x</italic>
<sub>
<italic>ij</italic>
</sub>
, where (
<italic>i</italic>
,
<italic>j</italic>
) ranges over all ordered pairs of distinct elements of
<italic>V</italic>
,
<italic>y</italic>
<sub>
<italic>i</italic>
</sub>
, where
<italic>i</italic>
ranges over all elements of
<italic>V</italic>
, and
<italic>z</italic>
<sub>
<italic>i</italic>
</sub>
, where
<italic>i</italic>
ranges over all elements of
<italic>T</italic>
. Recall that
<inline-formula id="pone.0211714.e037">
<alternatives>
<graphic xlink:href="pone.0211714.e037.jpg" id="pone.0211714.e037g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M37">
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi mathvariant="double-struck">R</mml:mi>
<mml:mo>+</mml:mo>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
is the cost function on the edges of the distance graph
<italic>G</italic>
.
<disp-formula id="pone.0211714.e038">
<alternatives>
<graphic xlink:href="pone.0211714.e038.jpg" id="pone.0211714.e038g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M38">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mo form="prefix" movablelimits="true">min</mml:mo>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd></mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>.</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mo>*</mml:mo>
</mml:msup>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mtd>
<mml:mtd></mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>V</mml:mi>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mo>:</mml:mo>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
<mml:mo></mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo></mml:mo>
<mml:mi>V</mml:mi>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mo>:</mml:mo>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mi>j</mml:mi>
<mml:mo></mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mo></mml:mo>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>V</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo></mml:mo>
<mml:mo>λ</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mo></mml:mo>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mo></mml:mo>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>T</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mo></mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo></mml:mo>
<mml:mi>T</mml:mi>
<mml:mtext>such</mml:mtext>
<mml:mspace width="2pt"></mml:mspace>
<mml:mtext>that</mml:mtext>
<mml:mspace width="2pt"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mn>0</mml:mn>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>integer</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mn>0</mml:mn>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>integer</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:mn>0</mml:mn>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mspace width="1.em"></mml:mspace>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>integer</mml:mtext>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>The feasible solutions of the IP described above are in correspondence with subgraphs
<italic>H</italic>
of
<italic>G</italic>
containing
<italic>s</italic>
* that consist of one or more
<italic>subtours</italic>
(vertex-disjoint directed cycles) in which the vertices other than
<italic>s</italic>
* correspond to a set of strings that are pairwise incomparable with respect to the substring relation and such that the covering requirement
<disp-formula id="pone.0211714.e039">
<alternatives>
<graphic xlink:href="pone.0211714.e039.jpg" id="pone.0211714.e039g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M39">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo></mml:mo>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="bold">t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo></mml:mo>
<mml:mo>λ</mml:mo>
<mml:mspace width="0.166667em"></mml:mspace>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
is satisfied.</p>
<p>To be able to apply Proposition 5, we are only interested in solutions that consist of a single directed cycle. As discussed in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
], this can be achieved in several ways (see, e.g., [
<xref rid="pone.0211714.ref017" ref-type="bibr">17</xref>
]), for instance using the Miller-Tucker-Zemlin (MTZ) formulation [
<xref rid="pone.0211714.ref018" ref-type="bibr">18</xref>
], the subtour formulations, or with a combined approach resulting in a cutting-plane algorithm.</p>
<p>In
<xref ref-type="fig" rid="pone.0211714.g001">Fig 1</xref>
, we represent an ilustrative sketch linking the combinatorial optimization problem to the graph problem.</p>
<fig id="pone.0211714.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Graphical interpretation of the connection of the combinatorial optimization problem to the graph problem.</title>
<p>The clusters associated to the host strings are shown in ovals with the corresponding target strings inside them. Each target string has an associated weight, which is shown in this example using a color code from light blue to strong blue, with extreme values corresponding to 0 and 1, respectively. The λ-superstring is represented with a closed ribbon which travels among the clusters. It is closed because one of the strings forming it corresponds to the artificial vertex
<italic>s</italic>
*, which is not a host string, but can be viewed as an empty string gluing the extremes of the λ-superstring. The condition that for each one the clusters, the sum of the weights of the target strings that are both in the λ-superstring and in the cluster is at least λ is imposed in the feasible solutions. The length of the λ-superstring is minimized, and this length can be obtained by summing up the
<italic>c</italic>
(
<italic>i</italic>
,
<italic>j</italic>
) values of the strings forming the λ-superstring. The
<italic>c</italic>
(
<italic>i</italic>
,
<italic>j</italic>
) values are shown in the figure as the length of the part of the vertex labelled by
<italic>i</italic>
not overlapping with the next vertex in the λ-superstring, which is labelled by
<italic>j</italic>
.</p>
</caption>
<graphic xlink:href="pone.0211714.g001"></graphic>
</fig>
</sec>
<sec id="sec007">
<title>A genetic algorithm</title>
<p>In this section we will present a genetic algorithm well suited to find solutions to a problem with potential applications to vaccine design posed, for unweighted λ-superstrings, in the concluding section of [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]. The problem is the following: Given a set of host strings of approximately similar lengths corresponding to the same protein with different mutations in a set of patients, find a λ-superstring of about one-gene length with λ as big as possible when the set
<italic>T</italic>
of target strings is formed by all the substrings of a given length
<italic></italic>
of the set of host strings, while keeping, as much as possible, the relative order of the elements in
<italic>T</italic>
. In other words, the goal is to design a synthetic protein enriched in the sense that it covers many epitopes in each host string. In our more general setting of weighted λ-superstrings we require these epitopes to be very immunogenic. As the second objective of our multi-objective optimization program, we have chosen to optimize the amino acid resemblance with the virus peptides. By using the alignment as target to be optimized, we will be able to choose candidates that have a structure similar to those which already interacted with HIV patients, and therefore will likely be recognized by the immune system.</p>
<p>We opt for a genetic algorithm in this case because the high number of target strings makes the use of integer programming impractical; employing heuristic methods of optimization is thus a good alternative. We do sacrifice on optimality; nevertheless, suboptimal solutions can be satisfactory in practice.</p>
<p>We are faced with a multi-objective optimization problem. To solve such problems, multi-objective functions
<italic>f</italic>
:
<italic>P</italic>
<italic>R</italic>
<sup>
<italic>n</italic>
</sup>
are considered, where
<italic>P</italic>
is the set of feasible solutions, that assign to each element
<italic>x</italic>
<italic>P</italic>
an
<italic>n</italic>
-tuple (
<italic>f</italic>
<sub>1</sub>
(
<italic>x</italic>
), …,
<italic>f</italic>
<sub>
<italic>n</italic>
</sub>
(
<italic>x</italic>
)) with real entries, each of which indicates a partial objective function. Without loss of generality, we can assume that we want to
<italic>maximize</italic>
each partial objective function, because minimizing
<italic>f</italic>
<sub>
<italic>i</italic>
</sub>
(
<italic>x</italic>
) is equivalent to maximizing the opposite function −
<italic>f</italic>
<sub>
<italic>i</italic>
</sub>
(
<italic>x</italic>
). Obviously, it is not possible in general to get a solution
<italic>x</italic>
<italic>P</italic>
in which all partial objective functions
<italic>f</italic>
<sub>
<italic>i</italic>
</sub>
attain maximum value. Instead, optimality of a solution is established in terms of
<italic>Pareto domination</italic>
: given two feasible solutions
<italic>x</italic>
,
<italic>y</italic>
<italic>P</italic>
, we say that
<italic>x</italic>
= (
<italic>x</italic>
<sub>1</sub>
, …,
<italic>x</italic>
<sub>
<italic>n</italic>
</sub>
) is dominated by
<italic>y</italic>
= (
<italic>y</italic>
<sub>1</sub>
, …,
<italic>y</italic>
<sub>
<italic>n</italic>
</sub>
) if
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
<italic>y</italic>
<sub>
<italic>i</italic>
</sub>
for every
<italic>i</italic>
and
<italic>x</italic>
<sub>
<italic>j</italic>
</sub>
<
<italic>y</italic>
<sub>
<italic>j</italic>
</sub>
for some
<italic>j</italic>
. The
<italic>Pareto front</italic>
is formed by the elements in
<italic>P</italic>
which are not dominated by any element of
<italic>P</italic>
.</p>
<p>Very often evolutionary algorithms are used to evolve an initial population
<italic>P</italic>
<sub>0</sub>
<italic>P</italic>
to obtain a sequence
<italic>P</italic>
<sub>
<italic>i</italic>
</sub>
of populations which get closer to the Pareto front, and it is desirable to obtain wide-spread sets of solutions. In particular, several genetic algorithm approaches have been proposed for these kinds of problems. One of the most reliable and quick ones among them is NSGA-II [
<xref rid="pone.0211714.ref011" ref-type="bibr">11</xref>
], and we have used it for our optimization problem. For definitions and results on genetic algorithms we refer the reader to [
<xref rid="pone.0211714.ref019" ref-type="bibr">19</xref>
].</p>
<p>We outline here the structure of the NSGA-II algorithm. We refer to [
<xref rid="pone.0211714.ref011" ref-type="bibr">11</xref>
] for details.</p>
<p>Given a set
<italic>P</italic>
′ ⊆
<italic>P</italic>
of feasible solutions, two key values are assigned to each
<italic>x</italic>
<italic>P</italic>
′: the non-domination rank
<italic>x</italic>
<sub>rank</sub>
and the crowding distance
<italic>x</italic>
<sub>distance</sub>
. The process of assignment of non-domination ranks is as follows. The non-dominated elements, that is, the elements in the Pareto front of
<italic>P</italic>
′ are assigned rank 1, and they form the set
<italic>F</italic>
<sub>1</sub>
. If we take
<italic>P</italic>
′ −
<italic>F</italic>
<sub>1</sub>
, the non-dominated elements in this set are assigned rank 2, and they form the set
<italic>F</italic>
<sub>2</sub>
, and so on. This ordering is done using the fast non-dominated sorting described in [
<xref rid="pone.0211714.ref011" ref-type="bibr">11</xref>
]. The crowded distance
<italic>x</italic>
<sub>distance</sub>
is calculated by taking the average distance of two points on either side of
<italic>x</italic>
along each of the
<italic>n</italic>
objectives. This leads to a strict partial order on
<italic>P</italic>
′ defined by
<disp-formula id="pone.0211714.e040">
<alternatives>
<graphic xlink:href="pone.0211714.e040.jpg" id="pone.0211714.e040g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M40">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo></mml:mo>
<mml:mi>y</mml:mi>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mtext>rank</mml:mtext>
</mml:msub>
<mml:mo><</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mtext>rank</mml:mtext>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>or</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mtext>rank</mml:mtext>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mtext>rank</mml:mtext>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>and</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mtext>distance</mml:mtext>
</mml:msub>
<mml:mo>></mml:mo>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mtext>distance</mml:mtext>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>The general process in NSGA-II is as follows:</p>
<p>First, given a parameter
<italic>m</italic>
, a random population
<italic>P</italic>
<sub>0</sub>
of size
<italic>m</italic>
is constructed, and it is sorted according to the relation ≺ defined above. Then, a binary tournament selection is done considering the relation ≺. In the tournament selection it is theoretically possible, although it is unlikely, that two different elements are not comparable with respect to the relation, because they have the same rank and the same crowded distance. In this case, one of them is chosen uniformly at random. After the tournament selection is completed, mutation and crossing is done on the selected elements, to create an offspring population
<italic>Q</italic>
<sub>0</sub>
of size
<italic>m</italic>
. Now a combined population
<italic>R</italic>
<sub>0</sub>
=
<italic>P</italic>
<sub>0</sub>
<italic>Q</italic>
<sub>0</sub>
is formed, and the elements in
<italic>R</italic>
<sub>0</sub>
are sorted according to their domination level. Then, a new population
<italic>P</italic>
<sub>1</sub>
is formed by collecting the elements in
<italic>R</italic>
<sub>0</sub>
in ascending order of ranks, that is, we take the elements in the set
<italic>F</italic>
<sub>1</sub>
formed by the elements of rank 1, then the elements in
<italic>F</italic>
<sub>2</sub>
, and so on, until all the elements of a certain set
<italic>F</italic>
<sub>
<italic>i</italic>
−1</sub>
have been allocated but there is no place to allocate all the elements of
<italic>F</italic>
<sub>
<italic>i</italic>
</sub>
, that is, until |
<italic>F</italic>
<sub>1</sub>
∪ ⋯ ∪
<italic>F</italic>
<sub>
<italic>i</italic>
−1</sub>
|≤
<italic>m</italic>
but |
<italic>F</italic>
<sub>1</sub>
∪ ⋯ ∪
<italic>F</italic>
<sub>
<italic>i</italic>
</sub>
| >
<italic>m</italic>
. Then, we rank the elements in
<italic>F</italic>
<sub>
<italic>i</italic>
</sub>
according to its crowding distance and we select elements in non-increasing order of crowding distance until we have
<italic>m</italic>
elements in
<italic>P</italic>
<sub>1</sub>
. Now, given a parameter
<italic>niter</italic>
, the process is iterated
<italic>niter</italic>
times to obtain a population
<italic>P</italic>
<sub>
<italic>i</italic>
+1</sub>
from a population
<italic>P</italic>
<sub>
<italic>i</italic>
</sub>
for any
<italic>i</italic>
in the same way that we obtained
<italic>P</italic>
<sub>1</sub>
from
<italic>P</italic>
<sub>0</sub>
.</p>
<p>Next we will describe how we use NSGA-II for our particular problem.</p>
<p>We want to find a weighted λ-superstring for a set
<italic>H</italic>
= {
<bold>h</bold>
<sub>1</sub>
, …,
<bold>h</bold>
<sub>
<italic>spop</italic>
</sub>
} of host strings, a set
<italic>T</italic>
of target strings formed by all the subsequences of a given length
<italic></italic>
of the strings of
<italic>H</italic>
, and a weight mapping
<italic>w</italic>
assigning real values to elements of
<italic>T</italic>
. The chromosomes in the genetic algorithm will be sequences of target strings. The phenotype of a chromosome
<italic>u</italic>
will be the overlapping sum
<italic>o</italic>
(
<italic>u</italic>
) of the target strings which constitute it (according to the sequence in which they appear in
<italic>u</italic>
). The fitness function that we consider for each chromosome
<italic>u</italic>
in the population is taken to be
<italic>f</italic>
(
<italic>u</italic>
) = (λ(
<italic>u</italic>
),
<italic>al</italic>
(
<italic>u</italic>
)), where:</p>
<list list-type="bullet">
<list-item>
<p>λ(
<italic>u</italic>
) is an estimate of the maximum value for which
<italic>o</italic>
(
<italic>u</italic>
) is a weighted λ(
<italic>u</italic>
)-superstring for (
<italic>H</italic>
,
<italic>T</italic>
), defined by
<disp-formula id="pone.0211714.e041">
<alternatives>
<graphic xlink:href="pone.0211714.e041.jpg" id="pone.0211714.e041g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M41">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mo>λ</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo form="prefix" movablelimits="true">min</mml:mo>
<mml:mo>{</mml:mo>
<mml:munder>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo></mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>substring</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>of</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mtext mathvariant="bold">h</mml:mtext>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
(it is an estimate because the true value of the maximum λ could, in principle, be different from λ(
<italic>u</italic>
) if there are elements of
<italic>T</italic>
covered by
<italic>o</italic>
(
<italic>u</italic>
) which are not in
<italic>u</italic>
), and</p>
</list-item>
<list-item>
<p>
<italic>al</italic>
(
<italic>u</italic>
) is the average value of the scorings for the pairwise global alignments of
<italic>o</italic>
(
<italic>u</italic>
) and each of the strings
<bold>h</bold>
<sub>
<italic>i</italic>
</sub>
.</p>
</list-item>
</list>
<p>The specific scoring scheme may depend on the application; in the Results section we specify it for our particular biological application. (For background on string alignment, see [
<xref rid="pone.0211714.ref020" ref-type="bibr">20</xref>
]).</p>
<p>We have used a modified version of NSGA-II in which we take the
<italic>Q</italic>
<sub>
<italic>i</italic>
</sub>
sets of a cardinality
<italic>m</italic>
greater than
<italic>spop</italic>
, so that |
<italic>R</italic>
<sub>
<italic>i</italic>
</sub>
| > 2|
<italic>P</italic>
<sub>
<italic>i</italic>
</sub>
| for every
<italic>i</italic>
. Also, instead of taking the initial population
<italic>P</italic>
<sub>0</sub>
randomly, we have taken it to be formed by the sequences of target strings corresponding to the set {
<bold>h</bold>
<sub>1</sub>
, …,
<bold>h</bold>
<sub>
<italic>spop</italic>
</sub>
} of host strings, in the order of appearance in each host string.</p>
<p>For the crossing of two chromosomes
<inline-formula id="pone.0211714.e042">
<alternatives>
<graphic xlink:href="pone.0211714.e042.jpg" id="pone.0211714.e042g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M42">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi></mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
and
<inline-formula id="pone.0211714.e043">
<alternatives>
<graphic xlink:href="pone.0211714.e043.jpg" id="pone.0211714.e043g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M43">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:msub>
<mml:mi></mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
, we have used a one-point crossing in which we select randomly a crossing point
<italic>c</italic>
between 1 and min{
<italic></italic>
<sub>1</sub>
,
<italic></italic>
<sub>2</sub>
} − 1 and take
<inline-formula id="pone.0211714.e044">
<alternatives>
<graphic xlink:href="pone.0211714.e044.jpg" id="pone.0211714.e044g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M44">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:msub>
<mml:mi></mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
as the first child and
<inline-formula id="pone.0211714.e045">
<alternatives>
<graphic xlink:href="pone.0211714.e045.jpg" id="pone.0211714.e045g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M45">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi></mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
as the second child.</p>
<p>Once the crossing has been done, we have assigned a probability of mutation
<italic>prmut</italic>
in each gene of each child chromosome. A mutation in the
<italic>i</italic>
-th position of a chromosome
<inline-formula id="pone.0211714.e046">
<alternatives>
<graphic xlink:href="pone.0211714.e046.jpg" id="pone.0211714.e046g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M46">
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi></mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
is done by selecting first a random integer
<italic>j</italic>
obtained by rounding a real number sampled according to the normal distribution with mean
<italic>i</italic>
and standard deviation defined by a parameter
<italic>sd</italic>
, choosing then uniformly at random a sequence
<inline-formula id="pone.0211714.e047">
<alternatives>
<graphic xlink:href="pone.0211714.e047.jpg" id="pone.0211714.e047g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M47">
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:msub>
<mml:mi></mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
associated to a host string from the initial population and substituting
<italic>u</italic>
<sub>
<italic>i</italic>
</sub>
with
<italic>v</italic>
<sub>
<italic>j</italic>
</sub>
in the chromosome
<italic>u</italic>
if 1 ≤
<italic>j</italic>
<italic></italic>
<sub>2</sub>
; in any other case, the mutation is not done. The idea of this mutation that we have just described is to substitute the
<italic>u</italic>
<sub>
<italic>i</italic>
</sub>
with an element ‘not far from the
<italic>i</italic>
-th position’, in the sense that it is close to an element in the
<italic>i</italic>
-th position in a chromosome of the initial population formed by the host strings.</p>
</sec>
</sec>
<sec sec-type="results" id="sec008">
<title>Results</title>
<p>In this section, an application of the IP-based algorithm and of the genetic algorithm is given to find weighted λ-superstrings for a set of 169 host strings whose GenBank [
<xref rid="pone.0211714.ref021" ref-type="bibr">21</xref>
] access numbers appear in
<xref ref-type="supplementary-material" rid="pone.0211714.s004">S1 Table</xref>
, corresponding to the Nef protein, and two sets of target strings (epitopes) chosen in a way that will be made clear soon. The 169 sequences were from HIV-1 subtype B independently infected individuals, and this specific set was first considered by Nickle et al. in [
<xref rid="pone.0211714.ref022" ref-type="bibr">22</xref>
], and later by our group in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]. Thus, we used this same set in order to be able to compare the method here proposed, to our previous work [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]. This comparison can be found at the end of this section.</p>
<sec id="sec009">
<title>Applying the integer programming formulation</title>
<p>We begin with the IP-based algorithm described in the Materials and methods section. We consider the set of epitopes shown in
<xref ref-type="supplementary-material" rid="pone.0211714.s005">S2 Table</xref>
.</p>
<p>The weights corresponding to the immunogenicities of epitopes were experimentally obtained from the data appearing in the Immune Epitope Database and Analysis Resource (IEDB) [
<xref rid="pone.0211714.ref023" ref-type="bibr">23</xref>
]. We selected the epitopes for the Nef protein satisfying simultaneously the following three conditions:</p>
<list list-type="order">
<list-item>
<p>they are covered by at least one of the 169 host strings analyzed;</p>
</list-item>
<list-item>
<p>they appear in the HIV Molecular Immunology Database [
<xref rid="pone.0211714.ref024" ref-type="bibr">24</xref>
];</p>
</list-item>
<list-item>
<p>they appear in IEDB with a positive value of
<italic>p</italic>
+
<italic>n</italic>
, where
<italic>p</italic>
and
<italic>n</italic>
are the number of positive and negative results, respectively, in the MHC Ligand Assays section.</p>
</list-item>
</list>
<p>We took the ratio
<italic>p</italic>
/(
<italic>p</italic>
+
<italic>n</italic>
) as the weighting of the epitopes. Note that a non-linear rescaling of the weights (i.e., normalizing them) would change the optimization problem. However, we consider that to justify a rescaling we would require empirical evidence pointing that the candidates give better results, and that is out of the scope of this work. The main reason for considering this weighting is that the empirical response of an epitope can only be verified through assays, so we estimated it numerically by the aforementioned ratio. Moreover, we used the MHC Ligand Assays, because there are several works stating that there exists a correlation between the generated immune response and MHC complex stability [
<xref rid="pone.0211714.ref025" ref-type="bibr">25</xref>
] or MHC affinity [
<xref rid="pone.0211714.ref026" ref-type="bibr">26</xref>
], and it has been used to predict T Cell epitopes [
<xref rid="pone.0211714.ref027" ref-type="bibr">27</xref>
]. The values are also shown in
<xref ref-type="supplementary-material" rid="pone.0211714.s005">S2 Table</xref>
.</p>
<p>The solutions found with the IP-based algorithm and the values of the corresponding parameters are shown in Tables
<xref rid="pone.0211714.t001" ref-type="table">1</xref>
and
<xref rid="pone.0211714.t002" ref-type="table">2</xref>
, which we now explain.</p>
<table-wrap id="pone.0211714.t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.t001</object-id>
<label>Table 1</label>
<caption>
<title>Optimal solutions of minimum length for a given value of λ.</title>
</caption>
<alternatives>
<graphic id="pone.0211714.t001g" xlink:href="pone.0211714.t001"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 1.0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TQGYFPDWQNYVPLRPMTYPLTFGWCF</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 27</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 1.5</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: LTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLK</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 35</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 1.51</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 1.9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: KAAVDLSHFLTFGWCFKLVFPVRPQVPLRPMTYTQGYFPDWQNY</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 44</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 1.94</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: KAAVDLSHFLKLTFGWCFKLVFPVRPQVPLRPMTYTQGYFPDWQNY</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 46</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 2.5</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TQGYFPDWQNYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLK</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 47</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 2.51</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 2.6</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: FPVRPQVPLRPMTYKAAVDLSHFLKEKGGLTQGYFPDWQNYTPGPGVRYPLTFGWCFKLV</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 60</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 2.68</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 2.9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TPGPGVRYPLFPVRPQVPLRPMTYKAAVDLSHFLKTPGPGIRYPLTFGWCFKLVTQGYFPDWQNY</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 65</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 2.94</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">λ = 3.2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TPGPGIRYPLTPGPGVRYPLTFGWCFKLVPEKEVLVWKFDSRLAFHHQEILDLWVYFPVRPQVPLRPMTYKAAVDLSHFLKEKGGLEGLTQGYFPDWQNY</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 100</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Covering value of the solution: 3.25</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0211714.t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.t002</object-id>
<label>Table 2</label>
<caption>
<title>Optimal solutions with maximum λ for a given upper bound on the length of the string.</title>
</caption>
<alternatives>
<graphic id="pone.0211714.t002g" xlink:href="pone.0211714.t002"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 10</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 0.0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: AVDLSHFL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 8</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 20</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 0.0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: AVDLSHFL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 8</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 30</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 1.0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TQGYFPDWQNYPLTFGWCFQVPLRPMTYK</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 29</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 40</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 1.51</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: LTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLKEKGGL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 40</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 50</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 2.51</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TQGYFPDWQNYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLK</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 47</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 60</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 2.68</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TQGYFPDWQNYTPGPGVRYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLKEKGGL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 60</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 70</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 2.94</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TPGPGIRYPLTQGYFPDWQNYTPGPGVRYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLKEKGGL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 70</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 80</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 2.94</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TPGPGIRYPLTQGYFPDWQNYTPGPGVRYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLKEKGGL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 70</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 90</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 2.94</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: TPGPGIRYPLTQGYFPDWQNYTPGPGVRYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLKEKGGL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 70</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Upper bound on string length = 100</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal value of λ = 3.25</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring: QEILDLWVYTQGYFPDWQNYTPGPGIRYPLPEKEVLVWKFDSRLAFHHTPGPGVRYPLTFGWCFKLVFPVRPQVPLRPMTYKAAVDLSHFLKEKGGLEGL</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Optimal λ superstring length: 100</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>In the analysis whose results are shown in
<xref rid="pone.0211714.t001" ref-type="table">Table 1</xref>
, the value of λ was varied from 1.0 up to 3.3 in increments of 0.1, and for each value of λ, the total length of the λ-superstring was minimized. Solutions were obtained by implementing the integer program descrived in Materials and Methods (extended with the MTZ formulation) in Java [
<xref rid="pone.0211714.ref028" ref-type="bibr">28</xref>
] and solving it to optimality using IBM ILOG CPLEX Optimization Studio [
<xref rid="pone.0211714.ref029" ref-type="bibr">29</xref>
]. The integer program corresponding to the case λ = 3.3 turned out to be infeasible; all the others were feasible. In the table we also show the
<italic>covering value</italic>
of the obtained solution, that is, the value of
<inline-formula id="pone.0211714.e048">
<alternatives>
<graphic xlink:href="pone.0211714.e048.jpg" id="pone.0211714.e048g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M48">
<mml:mrow>
<mml:msub>
<mml:mo form="prefix" movablelimits="true">min</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
(using notation from the Materials and methods section). Only the results not dominated by others are shown, in the sense that in cases when for different values of λ the same optimal solution strings were found, only the highest value of λ is shown.</p>
<p>
<xref rid="pone.0211714.t002" ref-type="table">Table 2</xref>
shows the results of a “dual” analysis in which we were maximizing the value of λ subject to imposing an upper bound on the length of a λ-superstring for the given sets of host and target strings. The results were obtained by solving a straightforward modification of integer program (and its extension with the MTZ formulation), again using Java and CPLEX. The modification of the IP consists in treating λ as a variable, replacing the objective function ∑
<sub>
<italic>i</italic>
,
<italic>j</italic>
</sub>
<italic>c</italic>
(
<italic>i</italic>
,
<italic>j</italic>
)
<italic>x</italic>
<sub>
<italic>ij</italic>
</sub>
with λ and min with max, and adding the constraint ∑
<sub>
<italic>i</italic>
,
<italic>j</italic>
</sub>
<italic>c</italic>
(
<italic>i</italic>
,
<italic>j</italic>
)
<italic>x</italic>
<sub>
<italic>ij</italic>
</sub>
<italic></italic>
, where
<italic></italic>
is a given upper bound on the string length. Clearly, since we are maximizing λ, in any optimal solution the value of λ will be equal to the covering value, that is,
<inline-formula id="pone.0211714.e049">
<alternatives>
<graphic xlink:href="pone.0211714.e049.jpg" id="pone.0211714.e049g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M49">
<mml:mrow>
<mml:mo>λ</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo form="prefix" movablelimits="true">min</mml:mo>
<mml:mrow>
<mml:mi>X</mml:mi>
<mml:mo></mml:mo>
<mml:mi mathvariant="script">C</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mo></mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo></mml:mo>
<mml:mi>X</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mi>w</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
(again, using notation from the Materials and methods section).</p>
<p>The upper bound
<italic></italic>
on the length of the λ-superstring was varied from 10 to 200 in increments of 10. Increasing the upper bound on the string length from 100 to anywhere up to 200 did not result in any increase in the covering value λ. We therefore only display in
<xref rid="pone.0211714.t002" ref-type="table">Table 2</xref>
the results for the values of the upper bounds up to 100. Since in this second model the length of the obtained solution was only constrained by an upper bound and not taken into account in the objective function, it should not be surprising that the corresponding solutions found for upper bounds between 100 and 200 were of different lengths, despite the fact of being equally good in terms of their covering values. A similar phenomenon occurred also for values of the upper bound
<italic></italic>
displayed in the table: the optimal covering values of the solutions corresponding to the upper bounds in each of the ranges 10–20 and 70–90 were the same.</p>
<p>We are interested in high covering values while keeping the length of the λ-superstring small. It is therefore interesting to analyze which of the solutions found by the above analysis have the best (that is, highest) ratio between the covering value and the length. In this respect, the best solution found by the above analysis is the λ-superstring of length 47 achieving a covering value of 2.51 (see
<xref rid="pone.0211714.t001" ref-type="table">Table 1</xref>
). The same covering value is also achieved by the string of length 47 shown in
<xref rid="pone.0211714.t002" ref-type="table">Table 2</xref>
. Only slightly worse ratios were achieved by the solutions from the above tables corresponing to the following (length, covering value) pairs: (44, 1.94), (60, 2.68), (65, 2.94) (all from
<xref rid="pone.0211714.t001" ref-type="table">Table 1</xref>
).</p>
<p>Another aspect of such analysis that might be potentially interesting for vaccine design applications would be to identify the maximum possible covering value that can be achieved for a given set of host and target strings (without any restriction on the length of the λ-superstring), and then find a shortest substring realizing this covering value. In the instance analyzed above, this maximum covering value is equal to 3.25, and the shortest length of aλ-superstring achieving this covering value is 100.</p>
</sec>
<sec id="sec010">
<title>Applying the multiobjective genetic algorithm</title>
<p>We used the NSGA-II multiobjective genetic algorithm described in the Materials and methods section for the same set of 169 host strings used in the previous subsection whose GenBank IDs appear in
<xref ref-type="supplementary-material" rid="pone.0211714.s004">S1 Table</xref>
. The set of target strings was taken to be the set of all 9-mers present in the host strings. Unlike in the previous subsection, immunogenicities were not obtained experimentally, because of the technical difficulty and the high cost of estimating empirically the immunogenicity of a large number of sequences. In this case, the immunogenicity associated to each of the target strings (that is, the value of the weight function
<italic>w</italic>
(
<bold>t</bold>
)) was computationally assessed.</p>
<p>Several algorithms to estimate numerically the immunogenicity of epitopes have been proposed in the literature, see, for instance, [
<xref rid="pone.0211714.ref030" ref-type="bibr">30</xref>
<xref rid="pone.0211714.ref039" ref-type="bibr">39</xref>
]. We selected in our analysis the algorithm proposed in [
<xref rid="pone.0211714.ref034" ref-type="bibr">34</xref>
], where a tool was also given in the “T-cell” epitopes-Immunogenicity Prediction” of the “IEDB Analysis Resource” [
<xref rid="pone.0211714.ref040" ref-type="bibr">40</xref>
].</p>
<p>We ran the genetic algorithm by using the program Mathematica [
<xref rid="pone.0211714.ref041" ref-type="bibr">41</xref>
] with the following set of parameters:
<disp-formula id="pone.0211714.e050">
<alternatives>
<graphic xlink:href="pone.0211714.e050.jpg" id="pone.0211714.e050g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M50">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>r</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>500</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>169</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>01</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1352</mml:mn>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>and</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mi>s</mml:mi>
<mml:mi>d</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>We used the Mathematica command
<monospace>NeedlemanWunschSimilarity</monospace>
, which gives the number of one-element matches in the alignment, for calculating the scorings of the global alignments that are averaged to obtain the values of
<italic>al</italic>
(
<italic>u</italic>
) described in the Materials and methods section.</p>
<p>We run 20 times the NSGA-II algorithm and collected the non-dominated solutions obtained in each of the runs. We eliminated the dominated solutions to obtain a final estimation of the Pareto front. The values are shown in
<xref ref-type="fig" rid="pone.0211714.g002">Fig 2</xref>
and in
<xref rid="pone.0211714.t003" ref-type="table">Table 3</xref>
. The resultant estimation of the Pareto front gave a set of non-dominated sequences with a maximum λ of 5.71 and a minimum value of 1.2 (average ± SD of 4.32±1.14). The alignments ranged between -88.47 and 163.33 (average± SD of 87.73±67.43). The distributions of λ and the alignment values are represented in
<xref ref-type="supplementary-material" rid="pone.0211714.s003">S1 Fig</xref>
panel (a) and (b), respectively.</p>
<fig id="pone.0211714.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.g002</object-id>
<label>Fig 2</label>
<caption>
<title>Estimation of the Pareto front in the genetic algorithm.</title>
<p>The line represents the non-dominated solutions found with the genetic algorithm. The X axis indicates the λ value, while the Y axis indicates the alignment score.</p>
</caption>
<graphic xlink:href="pone.0211714.g002"></graphic>
</fig>
<table-wrap id="pone.0211714.t003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.t003</object-id>
<label>Table 3</label>
<caption>
<title>Numerical values for the estimation of the Pareto front in the genetic algorithm.</title>
</caption>
<alternatives>
<graphic id="pone.0211714.t003g" xlink:href="pone.0211714.t003"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="center" rowspan="1" colspan="1">(1.2017,163.33)</td>
<td align="center" rowspan="1" colspan="1">(3.7513,149.3)</td>
<td align="center" rowspan="1" colspan="1">(4.6507,92.16)</td>
<td align="center" rowspan="1" colspan="1">(5.3242,44.29)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(1.293,162.95)</td>
<td align="center" rowspan="1" colspan="1">(3.7547,145.87)</td>
<td align="center" rowspan="1" colspan="1">(4.6771,91.94)</td>
<td align="center" rowspan="1" colspan="1">(5.3374,36.46)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(1.7287,162.79)</td>
<td align="center" rowspan="1" colspan="1">(3.8153,144.98)</td>
<td align="center" rowspan="1" colspan="1">(4.7089,88.77)</td>
<td align="center" rowspan="1" colspan="1">(5.3391,36.19)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(1.8909,162.49)</td>
<td align="center" rowspan="1" colspan="1">(3.855,142.96)</td>
<td align="center" rowspan="1" colspan="1">(4.7308,88.71)</td>
<td align="center" rowspan="1" colspan="1">(5.3413,27.89)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(1.8977,162.44)</td>
<td align="center" rowspan="1" colspan="1">(3.9156,142.07)</td>
<td align="center" rowspan="1" colspan="1">(4.7392,87.72)</td>
<td align="center" rowspan="1" colspan="1">(5.3492,27.59)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.0255,161.96)</td>
<td align="center" rowspan="1" colspan="1">(3.9461,138.79)</td>
<td align="center" rowspan="1" colspan="1">(4.7815,85.76)</td>
<td align="center" rowspan="1" colspan="1">(5.3959,26.6)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.1794,161.93)</td>
<td align="center" rowspan="1" colspan="1">(3.9526,136.87)</td>
<td align="center" rowspan="1" colspan="1">(4.787,82.31)</td>
<td align="center" rowspan="1" colspan="1">(5.3974,17.13)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.5507,161.58)</td>
<td align="center" rowspan="1" colspan="1">(3.9797,135.85)</td>
<td align="center" rowspan="1" colspan="1">(4.8195,82.26)</td>
<td align="center" rowspan="1" colspan="1">(5.4096,16.14)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.7806,160.63)</td>
<td align="center" rowspan="1" colspan="1">(4.0195,134.82)</td>
<td align="center" rowspan="1" colspan="1">(4.8925,81.32)</td>
<td align="center" rowspan="1" colspan="1">(5.4404,15.24)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.8411,159.48)</td>
<td align="center" rowspan="1" colspan="1">(4.0502,134.62)</td>
<td align="center" rowspan="1" colspan="1">(4.9253,76.17)</td>
<td align="center" rowspan="1" colspan="1">(5.4582,6.15)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.9918,159.25)</td>
<td align="center" rowspan="1" colspan="1">(4.08,133.92)</td>
<td align="center" rowspan="1" colspan="1">(4.9355,75.31)</td>
<td align="center" rowspan="1" colspan="1">(5.5322,-0.18)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(2.9923,157.56)</td>
<td align="center" rowspan="1" colspan="1">(4.0977,129.8)</td>
<td align="center" rowspan="1" colspan="1">(5.0094,75.04)</td>
<td align="center" rowspan="1" colspan="1">(5.57,-9.33)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.0863,156.8)</td>
<td align="center" rowspan="1" colspan="1">(4.1161,128.93)</td>
<td align="center" rowspan="1" colspan="1">(5.0147,69.88)</td>
<td align="center" rowspan="1" colspan="1">(5.585,-16.12)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.1024,153.34)</td>
<td align="center" rowspan="1" colspan="1">(4.1377,128.84)</td>
<td align="center" rowspan="1" colspan="1">(5.0249,68.67)</td>
<td align="center" rowspan="1" colspan="1">(5.5929,-18.14)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.1083,153.31)</td>
<td align="center" rowspan="1" colspan="1">(4.1458,125.92)</td>
<td align="center" rowspan="1" colspan="1">(5.078,64.63)</td>
<td align="center" rowspan="1" colspan="1">(5.5988,-19.13)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.2322,153.22)</td>
<td align="center" rowspan="1" colspan="1">(4.2259,125.13)</td>
<td align="center" rowspan="1" colspan="1">(5.0834,64.36)</td>
<td align="center" rowspan="1" colspan="1">(5.6403,-21.67)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.2357,152.79)</td>
<td align="center" rowspan="1" colspan="1">(4.2286,124.14)</td>
<td align="center" rowspan="1" colspan="1">(5.1092,62.62)</td>
<td align="center" rowspan="1" colspan="1">(5.6454,-30.74)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.2449,152.69)</td>
<td align="center" rowspan="1" colspan="1">(4.3204,120)</td>
<td align="center" rowspan="1" colspan="1">(5.1528,62.41)</td>
<td align="center" rowspan="1" colspan="1">(5.6671,-43.62)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.4688,152.4)</td>
<td align="center" rowspan="1" colspan="1">(4.4694,116.65)</td>
<td align="center" rowspan="1" colspan="1">(5.1562,48.33)</td>
<td align="center" rowspan="1" colspan="1">(5.6859,-47.61)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.4855,150.26)</td>
<td align="center" rowspan="1" colspan="1">(4.47,114.67)</td>
<td align="center" rowspan="1" colspan="1">(5.2502,47.24)</td>
<td align="center" rowspan="1" colspan="1">(5.6998,-72.72)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.5152,149.62)</td>
<td align="center" rowspan="1" colspan="1">(4.602,108.9)</td>
<td align="center" rowspan="1" colspan="1">(5.2719,46.02)</td>
<td align="center" rowspan="1" colspan="1">(5.7141,-74.6)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(3.546,149.37)</td>
<td align="center" rowspan="1" colspan="1">(4.6296,98.75)</td>
<td align="center" rowspan="1" colspan="1">(5.2798,45.08)</td>
<td align="center" rowspan="1" colspan="1">(5.717,-88.47)</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>We selected and analyzed in the estimation of the Pareto front the solution with scoring value 161.93 and λ value 2.1794. We have chosen this sequence due to several reasons. First, theλ and the scoring are greater than the ones of all the members in the initial population of 169 strings, for which the mean of the λ values was -1.70395, the maximum λ value was 1.59422, the mean of the scores was 143.34 and the maximum score was 157.66. Second, there is a remarkable level of maintenance of the highly conserved regions of the protein for this solution. Nonetheless, other solutions in the estimation of the Pareto front with greater values of λ and lower scorings could, of course, be useful in practice.</p>
<p>The sequence of amino acids of the selected solution is</p>
<list list-type="simple">
<list-item>
<p>MGGKWSKRSGVGWPTVRERMRRAEPAADGVGAV</p>
</list-item>
<list-item>
<p>SRDLEKHGAITSSNTAATNADCAWLEAQEEEEVGF</p>
</list-item>
<list-item>
<p>PVRPQVPLRPMTYKAAVDLSHFLKEKGGLEGLIYS</p>
</list-item>
<list-item>
<p>QKRQDILDLWIYHTQGYFPDWQNYTPGPGIRYPLT</p>
</list-item>
<list-item>
<p>FGWCFKLVPVEPEKVEEANEGENNSLLHPMSLHG</p>
</list-item>
<list-item>
<p>
<italic>MEDPEKEVLEWKFDSRLAFHHMARELHPEYYKDC</italic>
.</p>
</list-item>
</list>
<p>Since the main goal in this section is to study the structure and functionality of a protein modelled by a sequence with given fixed values of λ and of the average score, we performed several bioinformatics analyses to the string showed in the previous paragraph.</p>
<p>The average value of the lengths of the 169 sequences whose GenBank IDs appear in
<xref ref-type="supplementary-material" rid="pone.0211714.s004">S1 Table</xref>
is 207.11, and the length of our sequence is 206, which is very close to that average. In fact, 206 is the length established for Nef in [
<xref rid="pone.0211714.ref042" ref-type="bibr">42</xref>
], where the distribution of the amino acids of 1643 Nef sequences was analyzed. This does not imply, of course, that the protein has a well-defined length (there are deletions and insertions in certain positions for some of the sequences) and there is not the same amino acid residue for each position in all sequences. Given the high variability of the protein, it is more appropriate to see the protein as a non-deterministic distribution of residues conserving to some extent the secondary and tertiary structures and the functionality.</p>
<p>In order to study to what extent our solution captures the well conserved regions of the protein, we considered the sequences of residues conserved at 90% and their starting positions searching in the table of O’Neill et al. [
<xref rid="pone.0211714.ref042" ref-type="bibr">42</xref>
, Fig 1]. The sequences and positions are shown in
<xref ref-type="supplementary-material" rid="pone.0211714.s006">S3 Table</xref>
.</p>
<p>In our solution, all the sequences appear at the same positions as in the table, so all the oligopeptides conserved at 90% are kept.</p>
<p>In order to analyze the structure of the candidate sequence, we have used the bioinformatics tool I-TASSER [
<xref rid="pone.0211714.ref043" ref-type="bibr">43</xref>
], [
<xref rid="pone.0211714.ref044" ref-type="bibr">44</xref>
], which is an open source software implemented by Zhang Lab—University of Michigan.</p>
<p>Among the available software, we chose I-TASSER because it has ranked several years as the top method in Critical Assessment of protein Structure Prediction (CASP) experiment, a worldwide test which every two years evaluates the protein structure prediction methods proposed by research groups. More precisely, I-TASSER ranked n°1 in CASP7 (2006), CASP8 (2008), CASP9 (2010), CASP10 (2012), CASP11 (2014), and CASP12 (2016).</p>
<p>In short, the method first compares the proposed sequence with the ones in protein databases to identify similar structural templates and align its amino acid sequences. Next, the unaligned sequences are built by
<italic>ab initio</italic>
folding and a simulation of different assemblies with the aligned and unaligned sequences is made by Monte Carlo simulations, creating a set of possible candidates. Then, a selection of the lowest free-energy conformations is made and, starting from this model, a second round of assembly simulation is performed in order to refine the global topology [
<xref rid="pone.0211714.ref045" ref-type="bibr">45</xref>
].</p>
<p>To evaluate the goodness of the predictions, in addition to the TM-Score and the residual RMSD present in the literature, Zhang Lab—University of Michigan has defined a parameter called C-Score. When it is used to evaluate the structural properties, C is typically in the range of [-5,2], where a higher value implies higher confidence in the structure prediction, and models with a C-Score greater than -1.5 are considered reliable predictions.</p>
<p>We analyzed the secondary structure of the candidate sequence. In
<xref ref-type="fig" rid="pone.0211714.g003">Fig 3</xref>
, the secondary structure predicted with I-TASSER for the candidate sequence is displayed. To show the plausibility of the predicted secondary structure, we emphasize that in sequence 2XI1 of Protein Data Bank, which is based in the work by Singh et al. [
<xref rid="pone.0211714.ref046" ref-type="bibr">46</xref>
], a secondary structure for most part of the C-terminal highly conserved domain of HIV1-Nef is showed, in which there is a high level of agreement with our prediction. Residues 149-178 are disordered in the crystal structure obtained in [
<xref rid="pone.0211714.ref046" ref-type="bibr">46</xref>
], and hence in that region the sequence is recorded but no coordinates are determined. In 2XI1 the following substructures appear:</p>
<list list-type="simple">
<list-item>
<p>alpha helix: 83–95</p>
</list-item>
<list-item>
<p>alpha helix: 106–120</p>
</list-item>
<list-item>
<p>beta strand: 145–149</p>
</list-item>
<list-item>
<p>beta strand: 183–187</p>
</list-item>
<list-item>
<p>3/10 helix: 189–192</p>
</list-item>
<list-item>
<p>alpha helix: 196–200,</p>
</list-item>
</list>
<p>which are in good agreement with the structure predicted by I-TASSER.</p>
<fig id="pone.0211714.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Prediction of the secondary structure of the candidate sequence by I-TASSER.</title>
<p>In this graph we represent the amino acid sequence of our candidate, and below, the secondary structure associated to each AA predicted with I-TASSER. Here, H indicates Helix, S Strand, and C Coil.</p>
</caption>
<graphic xlink:href="pone.0211714.g003"></graphic>
</fig>
<p>We also analyzed the tertiary structure of the candidate sequence, which is represented graphically in
<xref ref-type="fig" rid="pone.0211714.g004">Fig 4</xref>
, panel (a). The prediction obtained for our candidate is highly reliable, since the C-Score of the model is 1.42, and the cutoff value to consider a good prediction is -1.5. Besides, the predicted structure is very similar to the one observed in the Nef protein 3TB8 of Protein Data Bank. This similarity achieved a TM-score of 0.896 in I-TASSER. The TM-score scales the structural similarity between two protein structures. The TM-score ranges on a scale from 0 to 1, with 1 denoting a perfect match and where a scoring greater than 0.5 means that it assumes generally the same fold [
<xref rid="pone.0211714.ref047" ref-type="bibr">47</xref>
].</p>
<fig id="pone.0211714.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Tertiary structures by I-tasser and Phyre-2.</title>
<p>In this molecular graph we illustrate the resemblance between the tertiary structure of (a) the candidate with I-Tasser, (b) the candidate with Phyre-2, and (c) the sequence 2XI1 with Phyre-2.</p>
</caption>
<graphic xlink:href="pone.0211714.g004"></graphic>
</fig>
<p>In addition to the analysis done with I-TASSER, we have used Phyre2 [
<xref rid="pone.0211714.ref048" ref-type="bibr">48</xref>
] web portal for protein folding to estimate the structure of the candidate. In
<xref ref-type="fig" rid="pone.0211714.g004">Fig 4</xref>
, panel (b) we illustrate the tertiary structure obtained by Phyre2. It can be observed that the folding is very similar to the one obtained by I-TASSER, depicted in
<xref ref-type="fig" rid="pone.0211714.g004">Fig 4</xref>
, panel (a). Moreover, results of Phyre2 indicate that 98% of the residues were modeled with a confidence >90%, using as model the 3TB8 protein, which is the same that I-TASSER used as model for its predictions. Therefore, in this case, both predictions coincide, which reinforces the likelihood that the candidate will fold as predicted.</p>
<p>Additionally, we have studied the sequence 2XI1 with Phyre2. As in the prediction of the candidate with Phyre2, the main model to estimate the tertiary structure of 2XI1 is the protein 3TB8, with 94% of the residues modeled with a confidence >90% using this template. In
<xref ref-type="fig" rid="pone.0211714.g004">Fig 4</xref>
, panel (c) we illustrate the predicted folding of the sequence 2XI1 by Phyre2, which is very similar to the one obtained with the candidate by using Phyre2, and even more similar to the folding obtained by I-TASSER.</p>
<p>Finally, we can see that the folding predictions done with I-TASSER and Phyre2 were based in the same protein (3TB8) and were very similar (see
<xref ref-type="fig" rid="pone.0211714.g004">Fig 4</xref>
).</p>
<p>We did also a BLAST [
<xref rid="pone.0211714.ref049" ref-type="bibr">49</xref>
] search of the candidate sequence, and we obtained that the five most similar sequences to the candidate sequence were the following ones:</p>
<list list-type="order">
<list-item>
<p>
<bold>AAX86040.1</bold>
, with a total score of 420 and an identity of 97%. It corresponds to a synthetic construct of a HIV-1 Clade B consensus Nef protein presented in [
<xref rid="pone.0211714.ref050" ref-type="bibr">50</xref>
], where Kavanagh et al. transfected antigen-presenting cells (APCs) with mRNA encoding Gag-p24 and cytoplasmic, lysosomal, and secreted forms of Nef. They found that transfection of APCs with a Nef construct bearing lysosomal targeting sygnals produced rapid and prolongued antigen presentation to CD4
<sup>+</sup>
and CD8
<sup>+</sup>
T cells [
<xref rid="pone.0211714.ref050" ref-type="bibr">50</xref>
].</p>
</list-item>
<list-item>
<p>
<bold>AAX39503.1</bold>
, with a total score of 418 and an identity of 97%. It corresponds to a synthetic construct of a consensus Nef protein, which was used in [
<xref rid="pone.0211714.ref051" ref-type="bibr">51</xref>
], along with other sequences, to validate the FATT-CTL assay, which is a novel method for the measurement of CTL-mediated cytotoxicity.</p>
</list-item>
<list-item>
<p>
<bold>AAA87523.1</bold>
, with a total score of 416 and an identity of 95% and
<bold>AAA87527.1</bold>
, with a total score of 415 and an identity of 94%. They corresponds to 2 of the 88 sequences of Nef protein of HIV-I, analyzed by Michael et al. in [
<xref rid="pone.0211714.ref052" ref-type="bibr">52</xref>
].</p>
</list-item>
<list-item>
<p>
<bold>AAA63871.1</bold>
, with a total score of 414 and an identity of 94%. It corresponds to 1 of the 90 sequences of a Nef protein of HIV-I, analyzed by Huang, Zhang, and Ho in [
<xref rid="pone.0211714.ref053" ref-type="bibr">53</xref>
].</p>
</list-item>
</list>
<p>In
<xref ref-type="fig" rid="pone.0211714.g005">Fig 5</xref>
, we depict the alignments of the candidate with the five sequences for the BLAST analysis. When the residues were identical, they were shaded in black; if they were not identical but at least similar, they were colored in grey; finally, when there were no similarities among residuals, they were shaded in white.</p>
<fig id="pone.0211714.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.g005</object-id>
<label>Fig 5</label>
<caption>
<title>BLAST alignment of the candidate with AAX86040.1, AAX39503.1, AAA87523.1, AAA87527.1, and AAA63871.1.</title>
<p>In this graph we depict the alignments of the candidate with the five sequences for the BLAST analysis (AAX86040.1, AAX39503.1, AAA87523.1, AAA87527.1, and AAA63871.1). When the residues were identical, they were shaded in black; if they were not identical but at least similar, they were colored in grey; finally, when there were no similarities among residuals, they were shaded in white.</p>
</caption>
<graphic xlink:href="pone.0211714.g005"></graphic>
</fig>
<p>In addition, we used VaxiJen [
<xref rid="pone.0211714.ref054" ref-type="bibr">54</xref>
], which is a server for alignment-independent prediction of protective antigens. It uses bacterial, viral and tumour protein datasets to derive models for prediction of whole protein antigenicity. With our candidate sequence the overall prediction for the antigen obtained with VaxiJen selecting “Virus” as target organism was 0.6895 (Probable antigen). The threshold value to be considered probable antigen was 0.4. For more information about VaxiJen, we refer the reader to [
<xref rid="pone.0211714.ref055" ref-type="bibr">55</xref>
].</p>
<p>The overall predictions obtained in VaxiJen for the 5 strings closer to our candidate sequence in the BLAST search were:</p>
<list list-type="simple">
<list-item>
<p>0.6380 for AAX86040.1</p>
</list-item>
<list-item>
<p>0.6409 for AAX39503.1</p>
</list-item>
<list-item>
<p>0.6688 for AAA87523.1</p>
</list-item>
<list-item>
<p>0.6747 for AAA87527.1</p>
</list-item>
<list-item>
<p>0.6599 for AAA63871.1</p>
</list-item>
</list>
<p>Next, we have compared our candidate with other sequences obtained by three different algorithms. The first is one of the candidates obtained with the unweighted algorithm in our previous work [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] (we selected among our solutions the candidate with the number of amino acids closest to 206, i.e., closest to the length established for Nef [
<xref rid="pone.0211714.ref042" ref-type="bibr">42</xref>
], but without exceeding this number); the second was obtained by using LANL’s Epigraph [
<xref rid="pone.0211714.ref056" ref-type="bibr">56</xref>
]; and the third was a consensus sequence obtained by LANL’s Consensus [
<xref rid="pone.0211714.ref057" ref-type="bibr">57</xref>
].</p>
<p>In
<xref rid="pone.0211714.t004" ref-type="table">Table 4</xref>
, the resultant estimated Class I immunogenicity [
<xref rid="pone.0211714.ref040" ref-type="bibr">40</xref>
] and mismatch proportion for the four strings can be found. As expected, the estimated immunogenicity value of our weighted candidate was better than the ones of the other three, suggesting that it would generate a more immunogenic response. The mismatch proportion was very similar (near to 0.511) between the weighted, epigraph, and consensus candidates. This result was expected, since we chose a candidate with high alignment, which implies a smaller number of mismatches, and both epigraph and consensus methods are expected to resemble the natural proteins [
<xref rid="pone.0211714.ref056" ref-type="bibr">56</xref>
,
<xref rid="pone.0211714.ref057" ref-type="bibr">57</xref>
]. Finally, since the unweighted candidate did not take into account the alignment, it scored a very high mismatch ratio (equal to 1).</p>
<table-wrap id="pone.0211714.t004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.t004</object-id>
<label>Table 4</label>
<caption>
<title>Comparison between the weighted, unweighted, epigraph, and consensus candidates.</title>
</caption>
<alternatives>
<graphic id="pone.0211714.t004g" xlink:href="pone.0211714.t004"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">Class I immunogenicity</th>
<th align="center" rowspan="1" colspan="1">mismatch average</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="1" colspan="1">Weighted</td>
<td align="char" char="." rowspan="1" colspan="1">1.8685</td>
<td align="center" rowspan="1" colspan="1">0.5115</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Unweighted</td>
<td align="char" char="." rowspan="1" colspan="1">1.8409</td>
<td align="center" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Epigraph</td>
<td align="char" char="." rowspan="1" colspan="1">1.2307</td>
<td align="center" rowspan="1" colspan="1">0.5114</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">Consensus</td>
<td align="char" char="." rowspan="1" colspan="1">1.4103</td>
<td align="center" rowspan="1" colspan="1">0.5109</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>For the purpose of comparison, we have used also a hill-climbing algorithm, as we did in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
]. In this case we used a multi-objective hill climbing algorithm analogous to the one described in [
<xref rid="pone.0211714.ref058" ref-type="bibr">58</xref>
]. As we did in the Materials and methods section, we considered sequences
<italic>u</italic>
of target strings and the corresponding phenotypes
<italic>o</italic>
(
<italic>u</italic>
) obtained by taking the overlapping sum of the strings in
<italic>u</italic>
. We selected randomly 10 sequences
<inline-formula id="pone.0211714.e051">
<alternatives>
<graphic xlink:href="pone.0211714.e051.jpg" id="pone.0211714.e051g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M51">
<mml:mrow>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mn>10</mml:mn>
</mml:msub>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
from the set of host strings and the corresponding sequences
<inline-formula id="pone.0211714.e052">
<alternatives>
<graphic xlink:href="pone.0211714.e052.jpg" id="pone.0211714.e052g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M52">
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mn>10</mml:mn>
</mml:msub>
</mml:msub>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
of target strings, and for each sequence
<inline-formula id="pone.0211714.e053">
<alternatives>
<graphic xlink:href="pone.0211714.e053.jpg" id="pone.0211714.e053g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M53">
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:msub>
</mml:math>
</alternatives>
</inline-formula>
we performed the following procedure:</p>
<p>First, we initialized to
<inline-formula id="pone.0211714.e054">
<alternatives>
<graphic xlink:href="pone.0211714.e054.jpg" id="pone.0211714.e054g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M54">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:msub>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:msub>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</alternatives>
</inline-formula>
the set
<italic>ND</italic>
<sub>
<italic>i</italic>
</sub>
of non-dominated solutions. Then, we tried to simulate mutations sequentially in positions of the sequences in
<italic>ND</italic>
<sub>
<italic>i</italic>
</sub>
, by replacing a target string by another target string at the same position in some of the host strings
<italic>h</italic>
<sub>1</sub>
, …,
<italic>h</italic>
<sub>169</sub>
. If at some point we get a sequence
<italic>u</italic>
′ non-dominated for some sequence in
<italic>ND</italic>
<sub>
<italic>i</italic>
</sub>
, then we add the sequence
<italic>u</italic>
′ to the set
<italic>ND</italic>
<sub>
<italic>i</italic>
</sub>
and we repeat the process from the beginning. Instead of repeating this process until no new non-dominated sequence is found, due to the excessive time to required to attain this, we simulated a total of 10
<sup>6</sup>
mutations.</p>
<p>We took the union of the non-dominated sets
<italic>ND</italic>
<sub>1</sub>
, …,
<italic>ND</italic>
<sub>10</sub>
and selected the phenotypes of the non-dominated elements in this union as an approximation to the true Pareto front, which is shown in
<xref ref-type="fig" rid="pone.0211714.g006">Fig 6</xref>
and in
<xref rid="pone.0211714.t005" ref-type="table">Table 5</xref>
. The approximation to the Pareto front is worse than the one shown in
<xref ref-type="fig" rid="pone.0211714.g002">Fig 2</xref>
, obtained by using the genetic algorithm, in the sense that every solution shown in
<xref ref-type="fig" rid="pone.0211714.g006">Fig 6</xref>
is dominated by at least one solution in
<xref ref-type="fig" rid="pone.0211714.g002">Fig 2</xref>
.</p>
<fig id="pone.0211714.g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Estimation of the Pareto front in the hill-climbing algorithm.</title>
<p>The line represents the non-dominated solutions found with the multi-objective hill climbing algorithm. The X axis indicates the λ value, while the Y axis indicates the alignment score.</p>
</caption>
<graphic xlink:href="pone.0211714.g006"></graphic>
</fig>
<table-wrap id="pone.0211714.t005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0211714.t005</object-id>
<label>Table 5</label>
<caption>
<title>Numerical values for the estimation of the Pareto front in the hill-climbing algorithm.</title>
</caption>
<alternatives>
<graphic id="pone.0211714.t005g" xlink:href="pone.0211714.t005"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="center" rowspan="1" colspan="1">(-1.2852,146.68)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.10136,143.982)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.10365,139.213)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.12965,138.432)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.18779,138.349)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.21379,137.811)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.25488,137.55)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.26566,136.87)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.30675,135.87)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.32028,134.876)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.63875,131.544)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.76386,131.183)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.78531,127.781)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.80699,125.817)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.82376,122.917)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.84713,120.538)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(0.959,119.663)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(1.02748,117.893)</td>
</tr>
<tr>
<td align="center" rowspan="1" colspan="1">(1.03195,107.686)</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</sec>
</sec>
<sec sec-type="conclusions" id="sec011">
<title>Discussion</title>
<p>In this paper, we generalized the notion of λ-superstrings from [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] to the weighted case. We developed an exact algorithm for a corresponding combinatorial optimization problem based on integer programming, extending the model from the previous paper by introducing a weight function on the target strings (which can take both positive and negative values). We consider that weighted λ-superstring criterion could be useful to fight the high mutability and escape mutations of viruses like HIV, HCV, or Influenza, since it gives a balanced protection against all the variants considered, by ensuring that at the overall the immunogenicity of the epitopes in each variant is at least λ. We also described a model taking into account good pairwise alignments of the obtained superstring with the host strings, and presented a heuristic approach based on a multi-objective genetic algorithm. By considering the alignment as a target to optimize by our algorithm, the weighted λ-superstrings obtained by using the genetic algorithm correspond to pseudoproteins structurally similar to the original ones taken from the patients, instead of being just epitope aggregates, opening the doors to possible improvements in the current methodology of epitope vaccine design.</p>
<p>In order to evaluate the performance of our algorithm, first, we analyzed the estimation of the Pareto front obtained with a multi-objective hill-climbing algorithm, which gave worse solutions than the one obtained by the genetic algorithm. Then, we selected a vaccine candidate from the Pareto front and studied its effectiveness
<italic>in silico</italic>
. Due to the weighted λ-superstring condition, and the positive λ value, this pseudo-protein would likely protect against all virus variants considered. Besides, VaxiJen analysis corroborated that the vaccine would be a probable antigen. Next, the structure and resemblance to the native protein were evaluated by several bioinformatic tools (such as Blast, Phyre 2 or I-Tasser), which indicated that our candidate was very similar to HIV-1 2XI1 and 3TB8 sequences. Then, we performed a comparison among our weighted candidate, one of the candidates obtained with the unweighted algorithm in our previous work [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
], a candidate obtained by using LANL’s Epigraph, and a consensus sequence. In this analysis, we observed that the mismatch proportion was worse in the unweighted candidate, which was expected, since the algorithm in [
<xref rid="pone.0211714.ref010" ref-type="bibr">10</xref>
] did not optimize the alignment. Besides, the estimated Class I immunogenicity [
<xref rid="pone.0211714.ref040" ref-type="bibr">40</xref>
] of the weighted candidate was bigger than the estimated immunogenicity for the candidates found with other methods, suggesting that it would generate a more immunogenic response.</p>
<p>Additionally, in order to study the sensitivity of the method, we have also analyzed D and G HIV subtypes, and they yielded similar results, indicating that the method is robust. These analyses can be found in
<xref ref-type="supplementary-material" rid="pone.0211714.s002">S2 Appendix</xref>
.</p>
<p>An important point of future work on weighted λ-superstrings is to determine the extent of practical applicability of the presented models and algorithms to vaccine design, in particular to assess the immunological value of the resulting candidate vaccines. In this regard, we have recently described a functional method to decipher T-cell epitopes of the bacterial and human pathogen
<italic>Listeria monocytogenes (Listeria)</italic>
based on combination of bioinformatics predictions of epitopes binding to MHC molecules and functional assays [
<xref rid="pone.0211714.ref059" ref-type="bibr">59</xref>
]. Our hypothesis was based in the use of two
<italic>Listeria</italic>
antigens, the listeriolysin O (LLO) and the glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) that elicits strong CD4+ and CD8+ T cell responses [
<xref rid="pone.0211714.ref060" ref-type="bibr">60</xref>
], [
<xref rid="pone.0211714.ref061" ref-type="bibr">61</xref>
]. Our method to test vaccine candidates was based in the use of predicted peptides from the bioinformatics analysis to activate dendritic cells
<italic>in vitro</italic>
and elicit high delayed T hypersensitivity (DTH) responses
<italic>in vivo</italic>
, combined to measurements of IL-12 production as the cytokine that best correlates with immune protection.</p>
<p>In order to adapt the methodology just described to the framework of vaccine design using weighted λ-superstrings, we will use in future work the full-length sequence of LLO for the thirteen recognized serotypes of
<italic>Listeria Monocytogenes</italic>
to design B and T-cell epitope vaccines applying weigthed λ-superstrings that gather the genetic diversity of the pathogen by means of the consideration of the different serotypes, and we will compare the epitopes obtained with those of previous studies. Next, we will use the weighted λ-superstrings obtained with the selected epitopes in our functional method of vaccine candidates testing. Finally, our success in predicting efficient LLO epitopes for vaccination and the construction of the subsequent λ-superstrings will be relevant for other intracellular bacteria for which we currently lack available vaccines, such as
<italic>Mycobacterium tuberculosis, Salmonella enteritidis</italic>
, or
<italic>Chlamydia trachomatis</italic>
, among others.</p>
<p>One of the lines considered as future work is to evaluate if the algorithm gives better results when we consider near-matches of the epitopes instead of exact matches, by changing the fitness function. By this approach, we would obtain vaccine candidates that induce cross-reactive T-Cells, which could be activated during the infection of an unrelated heterologous virus. Cross-reaction and its benefits have been widely observed in several infections [
<xref rid="pone.0211714.ref062" ref-type="bibr">62</xref>
,
<xref rid="pone.0211714.ref063" ref-type="bibr">63</xref>
], and since their positive effects in vaccination is promising [
<xref rid="pone.0211714.ref064" ref-type="bibr">64</xref>
,
<xref rid="pone.0211714.ref065" ref-type="bibr">65</xref>
], we consider that this approach might enhance the effectiveness of our method.</p>
<p>Moreover, we would like to consider, besides the weights corresponding to immunogenicities, other kinds of weights at the same time, addressing different biologically motivated goals with different weights. For example, one could consider weights associated to the relative frequencies of the epitopes.</p>
<p>In summary, here we have presented two algorithms for computational vaccine design. Our results indicate that with this methodology, we can obtain weighted λ-superstrings that resemble native protein structures, and protect well-balancedly against the whole group of considered virus variants, minimizing the chances of perpetuating the infection due to escape mutations.</p>
</sec>
<sec sec-type="supplementary-material" id="sec012">
<title>Supporting information</title>
<supplementary-material content-type="local-data" id="pone.0211714.s001">
<label>S1 Appendix</label>
<caption>
<title>Mathematical proof.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0211714.s001.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0211714.s002">
<label>S2 Appendix</label>
<caption>
<title>Additional sub-analyses.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0211714.s002.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0211714.s003">
<label>S1 Fig</label>
<caption>
<title>Distribution of the values in the Pareto front.</title>
<p>Histograms representing the frequencies of the λ (a) and alignment (b) values of the estimation of the Pareto front obtained with the genetic algorithm. The Y axis represents the frequency, while the X axis indicates the λ value (a) and the alignment score (b).</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0211714.s003.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0211714.s004">
<label>S1 Table</label>
<caption>
<title>GenBank IDs of the sequences for the Nef protein.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0211714.s004.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0211714.s005">
<label>S2 Table</label>
<caption>
<title>Experimental values of the immunogenicities of the epitopes.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0211714.s005.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0211714.s006">
<label>S3 Table</label>
<caption>
<title>Positions and sequences of the conserved regions for the Nef protein.</title>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0211714.s006.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>Supported in part by the Basque Government, grants IT753-13 and IT974-16 and by the UPV/EHU and Basque Center of Applied Mathematics, grant US18/21. Supported in part by the Slovenian Research Agency (I0-0035, research program P1-0285, and research projects N1-0032, J1-7051, and J1-9110). Technical and human support provided by IZO-SGI, SGIker (UPV/EHU, MICINN, GV/EJ, ERDF and ESF) is gratefully acknowledged. We would like to thank the referees for their helpful suggestions.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0211714.ref001">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nielsen</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Lund</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Buus</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Lundegaard</surname>
<given-names>C</given-names>
</name>
.
<article-title>MHC class II epitope predictive algorithms</article-title>
.
<source>Immunology</source>
.
<year>2010</year>
;
<volume>130</volume>
:
<fpage>319</fpage>
<lpage>328</lpage>
.
<pub-id pub-id-type="doi">10.1111/j.1365-2567.2010.03268.x</pub-id>
<pub-id pub-id-type="pmid">20408898</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref002">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Khan</surname>
<given-names>AM</given-names>
</name>
,
<name>
<surname>Miotto</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Heiny</surname>
<given-names>AT</given-names>
</name>
,
<name>
<surname>Salmon</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Srinivasan</surname>
<given-names>KN</given-names>
</name>
,
<name>
<surname>Nascimento</surname>
<given-names>EJ</given-names>
</name>
,
<etal>et al</etal>
<article-title>A systematic bioinformatics approach for selection of epitope-based vaccine targets</article-title>
.
<source>Cell Immunol</source>
.
<year>2006</year>
;
<volume>244</volume>
:
<fpage>141</fpage>
<lpage>147</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.cellimm.2007.02.005</pub-id>
<pub-id pub-id-type="pmid">17434154</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref003">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>HW</given-names>
</name>
,
<name>
<surname>Lin</surname>
<given-names>YC</given-names>
</name>
,
<name>
<surname>Pai</surname>
<given-names>TW</given-names>
</name>
,
<name>
<surname>Chang</surname>
<given-names>HT</given-names>
</name>
.
<article-title>Prediction of B-cell linear epitopes with a combination of support vector machine classification and amino acid propensity identification</article-title>
.
<source>Biomed Res Int</source>
.
<year>2011</year>
;</mixed-citation>
</ref>
<ref id="pone.0211714.ref004">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hemmer</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Kondo</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Gran</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Pinilla</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Cortese</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Pascal</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
<article-title>Minimal peptide length requirements for CD4+ T cell clones—implications for molecular mimicry and T cell survival</article-title>
.
<source>Int Immunol</source>
.
<year>2000</year>
;
<volume>12</volume>
:
<fpage>375</fpage>
<lpage>383</lpage>
.
<pub-id pub-id-type="doi">10.1093/intimm/12.3.375</pub-id>
<pub-id pub-id-type="pmid">10700472</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref005">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sharon</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Rynkiewicz</surname>
<given-names>MJ</given-names>
</name>
,
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Yang</surname>
<given-names>CY</given-names>
</name>
.
<article-title>Discovery of protective B-cell epitopes for development of antimicrobial vaccines and antibody therapeutics</article-title>
.
<source>Immunology</source>
.
<year>2014</year>
;
<volume>142</volume>
:
<fpage>1</fpage>
<lpage>23</lpage>
.
<pub-id pub-id-type="doi">10.1111/imm.12213</pub-id>
<pub-id pub-id-type="pmid">24219801</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref006">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>El-manzalawy</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Dobbs</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Honavar</surname>
<given-names>V</given-names>
</name>
.
<article-title>Predicting flexible length linear B-cell epitopes</article-title>
.
<source>Computational Systems Bioinformatics</source>
.
<year>2008</year>
;
<volume>7</volume>
:
<fpage>121</fpage>
<lpage>132</lpage>
.
<pub-id pub-id-type="doi">10.1142/9781848162648_0011</pub-id>
<pub-id pub-id-type="pmid">19642274</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref007">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Geginat</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Schenk</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Skoberne</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Goebel</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Hof</surname>
<given-names>H</given-names>
</name>
.
<article-title>A novel approach of direct ex vivo epitope mapping identifies dominant and subdominant CD4 and CD8 T cell epitopes from Listeria monocytogenes</article-title>
.
<source>The Journal of Immunology</source>
.
<year>2001</year>
;
<volume>166</volume>
:
<fpage>1877</fpage>
<lpage>1884</lpage>
.
<pub-id pub-id-type="doi">10.4049/jimmunol.166.3.1877</pub-id>
<pub-id pub-id-type="pmid">11160235</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref008">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Skoberne</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Geginat</surname>
<given-names>G</given-names>
</name>
.
<article-title>Efficient in vivo presentation of Listeria monocytogenes-derived CD4 and CD8 T cell epitopes in the absence of IFN-
<italic>γ</italic>
</article-title>
.
<source>The Journal of Immunology</source>
.
<year>2002</year>
;
<volume>168</volume>
:
<fpage>1854</fpage>
<lpage>1860</lpage>
.
<pub-id pub-id-type="doi">10.4049/jimmunol.168.4.1854</pub-id>
<pub-id pub-id-type="pmid">11823519</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref009">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Ponomarenko</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Zhu</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Tamang</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Greenbaum</surname>
<given-names>J</given-names>
</name>
,
<etal>et al</etal>
<article-title>Immune epitope database analysis resource</article-title>
.
<source>Nucleic Acids Res</source>
.
<year>2012</year>
;
<volume>40</volume>
:
<fpage>525</fpage>
<lpage>530</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gks438</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Martinez</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Milanic</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Legarreta</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Medvedev</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Malaina</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>De la Fuente</surname>
<given-names>IM</given-names>
</name>
.
<article-title>A combinatorial approach to the design of vaccines</article-title>
.
<source>J Math Biol</source>
.
<year>2015</year>
;
<volume>70</volume>
:
<fpage>1327</fpage>
<lpage>1358</lpage>
.
<pub-id pub-id-type="doi">10.1007/s00285-014-0797-4</pub-id>
<pub-id pub-id-type="pmid">24859149</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref011">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Deb</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Pratap</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Meyarivan</surname>
<given-names>T</given-names>
</name>
.
<article-title>A fast and elitist multiobjective genetic algorithm: NSGA-II</article-title>
.
<source>IEEE T Evol Comput</source>
.
<year>2002</year>
;
<volume>6</volume>
:
<fpage>182</fpage>
<lpage>197</lpage>
.
<pub-id pub-id-type="doi">10.1109/4235.996017</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Manzourolajdad</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Gonzalez</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Spouge</surname>
<given-names>JL</given-names>
</name>
.
<article-title>Changes in the Plasticity of HIV-1 Nef RNA during the Evolution of the North American Epidemic</article-title>
.
<source>PloS One</source>
.
<year>2016</year>
;
<volume>11</volume>
:
<fpage>e0163688</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0163688</pub-id>
<pub-id pub-id-type="pmid">27685447</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sharma</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Bhattacharya</surname>
<given-names>J</given-names>
</name>
.
<article-title>Cellular & molecular basis of HIV-associated neuropathogenesis</article-title>
.
<source>Indian J Med Res</source>
.
<year>2009</year>
;
<volume>129</volume>
:
<fpage>637</fpage>
<pub-id pub-id-type="pmid">19692743</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref014">
<label>14</label>
<mixed-citation publication-type="other">Dinur I, Steurer D. Analytical approach to parallel repetition. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing; 2014; 624-633.</mixed-citation>
</ref>
<ref id="pone.0211714.ref015">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Alon</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Moshkovitz</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Safra</surname>
<given-names>S</given-names>
</name>
.
<article-title>Algorithmic construction of sets for k-restrictions</article-title>
.
<source>ACM Transactions on Algorithms (TALG)</source>
.
<year>2006</year>
;
<volume>2</volume>
:
<fpage>153</fpage>
<lpage>177</lpage>
.
<pub-id pub-id-type="doi">10.1145/1150334.1150336</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref016">
<label>16</label>
<mixed-citation publication-type="book">
<name>
<surname>Schrijver</surname>
<given-names>A</given-names>
</name>
.
<source>Theory of Linear and Integer Programming</source>
.
<publisher-loc>Amsterdam</publisher-loc>
:
<publisher-name>John Wiley & Sons</publisher-name>
;
<year>1998</year>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref017">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pataki</surname>
<given-names>G</given-names>
</name>
.
<article-title>Teaching integer programming formulations using the traveling salesman problem</article-title>
.
<source>SIAM Rev</source>
.
<year>2003</year>
;
<volume>45</volume>
:
<fpage>116</fpage>
<lpage>123</lpage>
.
<pub-id pub-id-type="doi">10.1137/S00361445023685</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref018">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Miller</surname>
<given-names>CE</given-names>
</name>
,
<name>
<surname>Tucker</surname>
<given-names>AW</given-names>
</name>
,
<name>
<surname>Zemlin</surname>
<given-names>RA</given-names>
</name>
.
<article-title>Integer programming formulation of traveling salesman problems</article-title>
.
<source>J ACM</source>
.
<year>1960</year>
;
<volume>7</volume>
:
<fpage>326</fpage>
<lpage>329</lpage>
.
<pub-id pub-id-type="doi">10.1145/321043.321046</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref019">
<label>19</label>
<mixed-citation publication-type="book">
<name>
<surname>Haupt</surname>
<given-names>RL</given-names>
</name>
,
<name>
<surname>Haupt</surname>
<given-names>SE</given-names>
</name>
.
<source>Practical genetic algorithms</source>
.
<publisher-loc>New Jersey</publisher-loc>
:
<publisher-name>John Wiley & Sons</publisher-name>
;
<year>2004</year>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref020">
<label>20</label>
<mixed-citation publication-type="book">
<name>
<surname>Gusfield</surname>
<given-names>D</given-names>
</name>
.
<source>Algorithms on strings, trees and sequences: computer science and computational biology</source>
.
<publisher-loc>Cambridge</publisher-loc>
:
<publisher-name>Cambridge university press</publisher-name>
;
<year>1997</year>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref021">
<label>21</label>
<mixed-citation publication-type="other">GenBank website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/genbank/">www.ncbi.nlm.nih.gov/genbank/</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref022">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Nickle</surname>
<given-names>DC</given-names>
</name>
,
<name>
<surname>Rolland</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Jensen</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Pond</surname>
<given-names>SLK</given-names>
</name>
,
<name>
<surname>Deng</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Seligman</surname>
<given-names>M</given-names>
</name>
,
<etal>et al</etal>
<article-title>Coping with viral diversity in HIV vaccine design</article-title>
.
<source>PLoS Comput Biol</source>
.
<year>2007</year>
;
<volume>3</volume>
:
<fpage>e75</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.0030075</pub-id>
<pub-id pub-id-type="pmid">17465674</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref023">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Vita</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Zarebski</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Greenbaum</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Emami</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Hoof</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Salimi</surname>
<given-names>N</given-names>
</name>
,
<etal>et al</etal>
<article-title>The immune epitope database 2.0</article-title>
.
<source>Nucleic Acids Res</source>
.
<year>2009</year>
;
<volume>38</volume>
:
<fpage>854</fpage>
<lpage>862</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkp1004</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref024">
<label>24</label>
<mixed-citation publication-type="other">HIV Molecular Immunology Database website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://www.hiv.lanl.gov/content/immunology">www.hiv.lanl.gov/content/immunology</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref025">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rasmussen</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Fenoy</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Harndahl</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Kristensen</surname>
<given-names>AB</given-names>
</name>
,
<name>
<surname>Nielsen</surname>
<given-names>IK</given-names>
</name>
,
<name>
<surname>Nielsen</surname>
<given-names>M</given-names>
</name>
,
<etal>et al</etal>
<article-title>Pan-Specific Prediction of Peptide–MHC Class I Complex Stability, a Correlate of T Cell Immunogenicity</article-title>
.
<source>The Journal of Immunology</source>
.
<year>2016</year>
;
<volume>1600582</volume>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref026">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sette</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Vitiello</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Reherman</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Fowler</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Nayersina</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Kast</surname>
<given-names>WM</given-names>
</name>
,
<etal>et al</etal>
<article-title>The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes</article-title>
.
<source>The Journal of Immunology</source>
.
<year>1994</year>
;
<volume>153</volume>
:
<fpage>5586</fpage>
<lpage>5592</lpage>
.
<pub-id pub-id-type="pmid">7527444</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref027">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Paul</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Weiskopf</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Angelo</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Sidney</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Peters</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Sette</surname>
<given-names>A</given-names>
</name>
.
<article-title>HLA class I alleles are associated with peptide-binding repertoires of different size, affinity, and immunogenicity</article-title>
.
<source>The Journal of Immunology</source>
.
<year>2013</year>
;
<volume>1302101</volume>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref028">
<label>28</label>
<mixed-citation publication-type="other">Java website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://www.java.com">www.java.com</ext-link>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref029">
<label>29</label>
<mixed-citation publication-type="other">IBM ILOG CPLEX Optimization Studio website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://www-03.ibm.com/software/products/us/en/ibmilogcpleoptistud/">www-03.ibm.com/software/products/us/en/ibmilogcpleoptistud/</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref030">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bergmann-Leitner</surname>
<given-names>ES</given-names>
</name>
,
<name>
<surname>Chaudhury</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Steers</surname>
<given-names>NJ</given-names>
</name>
,
<name>
<surname>Sabato</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Delvecchio</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Wallqvist</surname>
<given-names>AS</given-names>
</name>
,
<etal>et al</etal>
<article-title>Computational and experimental validation of B and T-cell epitopes of the in vivo immune response to a novel malarial antigen</article-title>
.
<source>PloS One</source>
.
<year>2013</year>
;
<volume>8</volume>
:
<fpage>e71610</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0071610</pub-id>
<pub-id pub-id-type="pmid">23977087</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref031">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bryson</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Jones</surname>
<given-names>TD</given-names>
</name>
,
<name>
<surname>Baker</surname>
<given-names>MP</given-names>
</name>
.
<article-title>Prediction of immunogenicity of therapeutic proteins</article-title>
.
<source>BioDrugs</source>
.
<year>2010</year>
;
<volume>24</volume>
:
<fpage>1</fpage>
<lpage>8</lpage>
.
<pub-id pub-id-type="doi">10.2165/11318560-000000000-00000</pub-id>
<pub-id pub-id-type="pmid">20055528</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref032">
<label>32</label>
<mixed-citation publication-type="journal">
<name>
<surname>Khan</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Kumar</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Ranganathan</surname>
<given-names>S</given-names>
</name>
.
<article-title>In silico prediction of immunogenic T cell epitopes for HLA-DQ8</article-title>
.
<source>Immunome Research</source>
.
<year>2012</year>
;
<volume>8</volume>
:
<fpage>1</fpage>
<lpage>9</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref033">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Moreau</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Fleury</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Piquer</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Nguyen</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Novali</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Villard</surname>
<given-names>S</given-names>
</name>
,
<etal>et al</etal>
<article-title>PEPOP: computational design of immunogenic peptides</article-title>
.
<source>BMC Bioinformatics</source>
.
<year>2008</year>
;
<volume>9</volume>
:
<fpage>71</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-71</pub-id>
<pub-id pub-id-type="pmid">18234071</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref034">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Calis</surname>
<given-names>JJ</given-names>
</name>
,
<name>
<surname>Maybeno</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Greenbaum</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Weiskopf</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>De Silva</surname>
<given-names>AD</given-names>
</name>
,
<name>
<surname>Sette</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
<article-title>Properties of MHC class I presented peptides that enhance immunogenicity</article-title>
.
<source>PLoS Comput Biol</source>
.
<year>2013</year>
;
<volume>9</volume>
:
<fpage>e1003266</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1003266</pub-id>
<pub-id pub-id-type="pmid">24204222</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref035">
<label>35</label>
<mixed-citation publication-type="journal">
<name>
<surname>Paul</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Kolla</surname>
<given-names>RV</given-names>
</name>
,
<name>
<surname>Sidney</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Weiskopf</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Fleri</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>Y</given-names>
</name>
,
<etal>et al</etal>
<article-title>Evaluating the immunogenicity of protein drugs by applying in vitro MHC binding data and the immune epitope database and analysis resource</article-title>
.
<source>Clinical and Developmental Immunology</source>
.
<year>2013</year>
;
<volume>2013</volume>
<pub-id pub-id-type="doi">10.1155/2013/467852</pub-id>
<pub-id pub-id-type="pmid">24222776</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref036">
<label>36</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ponomarenko</surname>
<given-names>JV</given-names>
</name>
,
<name>
<surname>Van Regenmortel</surname>
<given-names>MH</given-names>
</name>
.
<article-title>B cell epitope prediction</article-title>
.
<source>Structural bioinformatics</source>
.
<year>2009</year>
;
<fpage>849</fpage>
<lpage>879</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref037">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Shmelkov</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Krachmarov</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Grigoryan</surname>
<given-names>AV</given-names>
</name>
,
<name>
<surname>Pinter</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Statnikov</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Cardozo</surname>
<given-names>T</given-names>
</name>
.
<article-title>Computational prediction of neutralization epitopes targeted by human anti-V3 HIV monoclonal antibodies</article-title>
.
<source>PloS One</source>
.
<year>2014</year>
;
<volume>9</volume>
:
<fpage>e89987</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0089987</pub-id>
<pub-id pub-id-type="pmid">24587168</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tong</surname>
<given-names>JC</given-names>
</name>
,
<name>
<surname>Tan</surname>
<given-names>TW</given-names>
</name>
,
<name>
<surname>Ranganathan</surname>
<given-names>S</given-names>
</name>
.
<article-title>Methods and protocols for prediction of immunogenic epitopes</article-title>
.
<source>Brief Bioinform</source>
.
<year>2006</year>
;
<volume>8</volume>
:
<fpage>96</fpage>
<lpage>108</lpage>
.
<pub-id pub-id-type="doi">10.1093/bib/bbl038</pub-id>
<pub-id pub-id-type="pmid">17077136</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref039">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tung</surname>
<given-names>CW</given-names>
</name>
,
<name>
<surname>Ho</surname>
<given-names>SY</given-names>
</name>
.
<article-title>POPI: predicting immunogenicity of MHC class I binding peptides by mining informative physicochemical properties</article-title>
.
<source>Bioinformatics</source>
.
<year>2007</year>
;
<volume>23</volume>
:
<fpage>942</fpage>
<lpage>949</lpage>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btm061</pub-id>
<pub-id pub-id-type="pmid">17384427</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref040">
<label>40</label>
<mixed-citation publication-type="other">IEDB, T cell class I pMHC immunogenicity predictor website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://tools.immuneepitope.org/immunogenicity/">http://tools.immuneepitope.org/immunogenicity/</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref041">
<label>41</label>
<mixed-citation publication-type="other">Mathematica website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="https://www.wolfram.com/mathematica/">https://www.wolfram.com/mathematica/</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref042">
<label>42</label>
<mixed-citation publication-type="journal">
<name>
<surname>O’Neill</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Kuo</surname>
<given-names>LS</given-names>
</name>
,
<name>
<surname>Krisko</surname>
<given-names>JF</given-names>
</name>
,
<name>
<surname>Tomchick</surname>
<given-names>DR</given-names>
</name>
,
<name>
<surname>Garcia</surname>
<given-names>JV</given-names>
</name>
,
<name>
<surname>Foster</surname>
<given-names>JL</given-names>
</name>
.
<article-title>Dynamic evolution of the human immunodeficiency virus type 1 pathogenic factor, Nef</article-title>
.
<source>J Virol</source>
.
<year>2006</year>
;
<volume>80</volume>
:
<fpage>1311</fpage>
<lpage>1320</lpage>
.
<pub-id pub-id-type="doi">10.1128/JVI.80.3.1311-1320.2006</pub-id>
<pub-id pub-id-type="pmid">16415008</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref043">
<label>43</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
.
<article-title>I-TASSER server for protein 3D structure prediction</article-title>
.
<source>BMC Bioinformatics</source>
.
<year>2008</year>
;
<volume>9</volume>
:
<fpage>40</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-40</pub-id>
<pub-id pub-id-type="pmid">18215316</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref044">
<label>44</label>
<mixed-citation publication-type="journal">
<name>
<surname>Roy</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Kucukural</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
.
<article-title>I-TASSER: a unified platform for automated protein structure and function prediction</article-title>
.
<source>Nat Protoc</source>
.
<year>2010</year>
;
<volume>5</volume>
:
<fpage>725</fpage>
<pub-id pub-id-type="doi">10.1038/nprot.2010.5</pub-id>
<pub-id pub-id-type="pmid">20360767</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref045">
<label>45</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yang</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Yan</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Roy</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Xu</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Poisson</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
.
<article-title>The I-TASSER Suite: protein structure and function prediction</article-title>
.
<source>Nat Methods</source>
.
<year>2015</year>
;
<volume>12</volume>
:
<fpage>7</fpage>
<pub-id pub-id-type="doi">10.1038/nmeth.3213</pub-id>
<pub-id pub-id-type="pmid">25549265</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref046">
<label>46</label>
<mixed-citation publication-type="journal">
<name>
<surname>Singh</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Yadav</surname>
<given-names>GP</given-names>
</name>
,
<name>
<surname>Gupta</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Tripathi</surname>
<given-names>AK</given-names>
</name>
,
<name>
<surname>Ramachandran</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Tripathi</surname>
<given-names>RK</given-names>
</name>
.
<article-title>A novel dimer-tetramer transition captured by the crystal structure of the HIV-1 Nef</article-title>
.
<source>PloS One</source>
.
<year>2011</year>
;
<volume>6</volume>
:
<fpage>e26629</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0026629</pub-id>
<pub-id pub-id-type="pmid">22073177</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref047">
<label>47</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Skolnick</surname>
<given-names>J</given-names>
</name>
.
<article-title>TM-align: a protein structure alignment algorithm based on the TM-score</article-title>
.
<source>Nucleic Acids Res</source>
.
<year>2005</year>
;
<volume>33</volume>
:
<fpage>2302</fpage>
<lpage>2309</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gki524</pub-id>
<pub-id pub-id-type="pmid">15849316</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref048">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kelley</surname>
<given-names>LA</given-names>
</name>
,
<name>
<surname>Mezulis</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Yates</surname>
<given-names>CM</given-names>
</name>
,
<name>
<surname>Wass</surname>
<given-names>MN</given-names>
</name>
,
<name>
<surname>Sternberg</surname>
<given-names>MJ</given-names>
</name>
.
<article-title>The Phyre2 web portal for protein modeling, prediction and analysis</article-title>
.
<source>Nat Protoc</source>
.
<year>2015</year>
;
<volume>10</volume>
:
<fpage>845</fpage>
<pub-id pub-id-type="doi">10.1038/nprot.2015.053</pub-id>
<pub-id pub-id-type="pmid">25950237</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref049">
<label>49</label>
<mixed-citation publication-type="other">BLAST website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="https://blast.ncbi.nlm.nih.gov/Blast.cgi">https://blast.ncbi.nlm.nih.gov/Blast.cgi</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref050">
<label>50</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kavanagh</surname>
<given-names>DG</given-names>
</name>
,
<name>
<surname>Kaufmann</surname>
<given-names>DE</given-names>
</name>
,
<name>
<surname>Sunderji</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Frahm</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Le Gall</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Boczkowski</surname>
<given-names>D</given-names>
</name>
,
<etal>et al</etal>
<article-title>Expansion of HIV-specific CD4+ and CD8+ T cells by dendritic cells transfected with mRNA encoding cytoplasm-or lysosome-targeted Nef</article-title>
.
<source>Blood</source>
.
<year>2006</year>
;
<volume>107</volume>
:
<fpage>1963</fpage>
<lpage>1969</lpage>
.
<pub-id pub-id-type="doi">10.1182/blood-2005-04-1513</pub-id>
<pub-id pub-id-type="pmid">16249391</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref051">
<label>51</label>
<mixed-citation publication-type="journal">
<name>
<surname>van Baalen</surname>
<given-names>CA</given-names>
</name>
,
<name>
<surname>Kwa</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Verschuren</surname>
<given-names>EJ</given-names>
</name>
,
<name>
<surname>Reedijk</surname>
<given-names>ML</given-names>
</name>
,
<name>
<surname>Boon</surname>
<given-names>AC</given-names>
</name>
,
<name>
<surname>de Mutsert</surname>
<given-names>G</given-names>
</name>
,
<etal>et al</etal>
<article-title>Fluorescent Antigen–Transfected Target Cell Cytotoxic T Lymphocyte Assay for Ex Vivo Detection of Antigen-Specific Cell-Mediated Cytotoxicity</article-title>
.
<source>The Journal of infectious diseases</source>
.
<year>2005</year>
;
<volume>192</volume>
:
<fpage>1183</fpage>
<lpage>1190</lpage>
.
<pub-id pub-id-type="doi">10.1086/444546</pub-id>
<pub-id pub-id-type="pmid">16136460</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref052">
<label>52</label>
<mixed-citation publication-type="journal">
<name>
<surname>Michael</surname>
<given-names>NL</given-names>
</name>
,
<name>
<surname>Chang</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>d’Arcy</surname>
<given-names>LA</given-names>
</name>
,
<name>
<surname>Tseng</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Birx</surname>
<given-names>DL</given-names>
</name>
,
<name>
<surname>Sheppard</surname>
<given-names>HW</given-names>
</name>
.
<article-title>Functional characterization of human immunodeficiency virus type 1 nef genes in patients with divergent rates of disease progression</article-title>
.
<source>J Virol</source>
.
<year>1995</year>
;
<volume>69</volume>
:
<fpage>6758</fpage>
<lpage>6769</lpage>
.
<pub-id pub-id-type="pmid">7474087</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref053">
<label>53</label>
<mixed-citation publication-type="journal">
<name>
<surname>Huang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Ho</surname>
<given-names>DD</given-names>
</name>
.
<article-title>Characterization of nef sequences in long-term survivors of human immunodeficiency virus type 1 infection</article-title>
.
<source>J Virol</source>
.
<year>1995</year>
;
<volume>69</volume>
:
<fpage>93</fpage>
<lpage>100</lpage>
.
<pub-id pub-id-type="pmid">7983771</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref054">
<label>54</label>
<mixed-citation publication-type="other">VaxiJen website. [cited 06 August 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html">www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref055">
<label>55</label>
<mixed-citation publication-type="journal">
<name>
<surname>Doytchinova</surname>
<given-names>IA</given-names>
</name>
,
<name>
<surname>Flower</surname>
<given-names>DR</given-names>
</name>
.
<article-title>VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines</article-title>
.
<source>BMC Bioinformatics</source>
.
<year>2007</year>
;
<volume>8</volume>
:
<fpage>4</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-4</pub-id>
<pub-id pub-id-type="pmid">17207271</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref056">
<label>56</label>
<mixed-citation publication-type="other">LANL’s Epigraph website [cited 02 November 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="https://www.hiv.lanl.gov/content/sequence/EPIGRAPH/epigraph.html">https://www.hiv.lanl.gov/content/sequence/EPIGRAPH/epigraph.html</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref057">
<label>57</label>
<mixed-citation publication-type="other">LANL’s Consensus website [cited 02 November 2018]. Available from:
<ext-link ext-link-type="uri" xlink:href="https://www.hiv.lanl.gov/content/sequence/CONSENSUS/SimpCon.html">https://www.hiv.lanl.gov/content/sequence/CONSENSUS/SimpCon.html</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref058">
<label>58</label>
<mixed-citation publication-type="other">Diaz R, Suarez AR. A study of the capacity of the stochastic Hill Climbing to solve multi-objective problems. In Proceedings of the Third International Symposium on Adaptive Systems-Evolutionary Computation and Probabilistic Graphical Models, La Habana: Institute of Cybernetics, Mathematics and Physics. 2001; 37-40.</mixed-citation>
</ref>
<ref id="pone.0211714.ref059">
<label>59</label>
<mixed-citation publication-type="journal">
<name>
<surname>Calderon-Gonzalez</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Tobes</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Pareja</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Frande-Cabanes</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Petrovsky</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Alvarez-Dominguez</surname>
<given-names>C</given-names>
</name>
.
<article-title>Identification and characterisation of T-cell epitopes for incorporation into dendritic cell-delivered Listeria vaccines</article-title>
.
<source>J Immunol Methods</source>
.
<year>2015</year>
;
<volume>424</volume>
:
<fpage>111</fpage>
<lpage>119</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.jim.2015.05.009</pub-id>
<pub-id pub-id-type="pmid">26031451</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref060">
<label>60</label>
<mixed-citation publication-type="journal">
<name>
<surname>Alvarez-Dominguez</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Madrazo-Toca</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Fernandez-Prieto</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Vandekerckhove</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Pareja</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Tobes</surname>
<given-names>R</given-names>
</name>
,
<etal>et al</etal>
<article-title>Characterization of a Listeria monocytogenes protein interfering with Rab5a</article-title>
.
<source>Traffic</source>
.
<year>2008</year>
;
<volume>9</volume>
:
<fpage>325</fpage>
<lpage>337</lpage>
.
<pub-id pub-id-type="doi">10.1111/j.1600-0854.2007.00683.x</pub-id>
<pub-id pub-id-type="pmid">18088303</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref061">
<label>61</label>
<mixed-citation publication-type="journal">
<name>
<surname>Calderón-González</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Frande-Cabanes</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Bronchalo-Vicente</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Lecea-Cuello</surname>
<given-names>MJ</given-names>
</name>
,
<name>
<surname>Pareja</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Bosch-Martínez</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
<article-title>Cellular vaccines in listeriosis: role of the Listeria antigen GAPDH</article-title>
.
<source>Front Cell Infect Mi</source>
.
<year>2014</year>
;
<volume>4</volume>
:
<fpage>22</fpage>
<lpage>33</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0211714.ref062">
<label>62</label>
<mixed-citation publication-type="journal">
<name>
<surname>Welsh</surname>
<given-names>RM</given-names>
</name>
,
<name>
<surname>Selin</surname>
<given-names>LK</given-names>
</name>
.
<article-title>No one is naive: the significance of heterologous T-cell immunity</article-title>
.
<source>Nature Reviews Immunology</source>
.
<year>2002</year>
;
<volume>2</volume>
:
<fpage>417</fpage>
<pub-id pub-id-type="doi">10.1038/nri820</pub-id>
<pub-id pub-id-type="pmid">12093008</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref063">
<label>63</label>
<mixed-citation publication-type="journal">
<name>
<surname>Rehermann</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Shin</surname>
<given-names>EC</given-names>
</name>
.
<article-title>Private aspects of heterologous immunity</article-title>
.
<source>Journal of Experimental Medicine</source>
.
<year>2005</year>
;
<volume>201</volume>
:
<fpage>667</fpage>
<lpage>670</lpage>
.
<pub-id pub-id-type="doi">10.1084/jem.20050220</pub-id>
<pub-id pub-id-type="pmid">15753200</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref064">
<label>64</label>
<mixed-citation publication-type="journal">
<name>
<surname>de Boer</surname>
<given-names>RJ</given-names>
</name>
,
<name>
<surname>Perelson</surname>
<given-names>AS</given-names>
</name>
.
<article-title>How germinal centers evolve broadly neutralizing antibodies: the breadth of follicular helper T cell response</article-title>
.
<source>Journal of Virology</source>
.
<year>2017</year>
; JVI-00983.
<pub-id pub-id-type="doi">10.1128/JVI.00983-17</pub-id>
<pub-id pub-id-type="pmid">28878083</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0211714.ref065">
<label>65</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
.
<article-title>Optimal sequential immunization can focus antibody responses against diversity loss and distraction</article-title>
.
<source>PLoS Computational Biology</source>
.
<year>2017</year>
;
<volume>13</volume>
:
<fpage>e1005336</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1005336</pub-id>
<pub-id pub-id-type="pmid">28135270</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001047 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001047 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:6368308
   |texte=   Weighted lambda superstrings applied to vaccine design
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:30735507" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021