MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

String kernels for protein sequence comparisons: improved fold recognition

Internal identifier: 000264 (Pmc/Curation); previous: 000263; next: 000265

Authors: Saghi Nojoomi [United States]; Patrice Koehl [United States]

Source:

RBID: PMC:5331664

Abstract

Background

The amino acid sequence of a protein is the blueprint from which its structure and ultimately its function can be derived. Sequence comparison methods therefore remain essential for determining the similarity between proteins. Traditional approaches for comparing two protein sequences start from the strings of letters (amino acids) that represent the sequences, generate textual alignments between these strings, and assign a score to each alignment. When the similarity between the two protein sequences is low, however, the resulting alignment is usually poor, which degrades the recognition of similarity.

Results

In this study, we develop an alignment-free alternative to these methods that is based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al., Found. Comput. Math., 2013, 14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters: a coefficient that tunes the substitution matrix, and the maximum length of the k-mers that it includes. We provide an exhaustive analysis of the impact of these two parameters on the performance of SeqKernel for fold recognition. We show that, with the right choice of parameters, the SeqKernel similarity measure improves fold recognition compared to traditional alignment-based methods. We illustrate the application of SeqKernel to inferring phylogeny on RNA polymerases and show that it performs as well as methods based on multiple sequence alignments.
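
As an illustration of how two such parameters can interact, the following Python sketch builds a normalized k-mer string kernel. It is not the authors' SeqKernel implementation: the function names, the toy residue score (1.0 for a match, 0.5 for a mismatch, standing in for a real substitution matrix), the product and sum construction, and the default values of `beta` and `k_max` are all assumptions made for this example.

```python
from math import sqrt


def base_similarity(a, b, beta):
    # Toy stand-in for a substitution-matrix entry (assumption, not BLOSUM):
    # 1.0 for identical residues, 0.5 otherwise, raised to the exponent beta.
    score = 1.0 if a == b else 0.5
    return score ** beta


def kmer_similarity(u, v, beta):
    # Similarity of two equal-length k-mers: product of per-position scores.
    result = 1.0
    for a, b in zip(u, v):
        result *= base_similarity(a, b, beta)
    return result


def raw_kernel(s, t, beta, k_max):
    # Sum the k-mer similarity over every pair of k-mers (one from s, one
    # from t) for all lengths k = 1..k_max. No alignment is computed.
    total = 0.0
    for k in range(1, k_max + 1):
        for i in range(len(s) - k + 1):
            for j in range(len(t) - k + 1):
                total += kmer_similarity(s[i:i + k], t[j:j + k], beta)
    return total


def normalized_kernel(s, t, beta=1.0, k_max=3):
    # Normalize by the self-similarities so that a sequence compared with
    # itself scores 1. Both sequences are assumed to be non-empty.
    norm = sqrt(raw_kernel(s, s, beta, k_max) * raw_kernel(t, t, beta, k_max))
    return raw_kernel(s, t, beta, k_max) / norm
```

With these toy scores, `normalized_kernel("ACDE", "ACDF")` is larger than `normalized_kernel("ACDE", "WWWW")`, and any non-empty sequence compared with itself scores 1 up to rounding. Increasing `beta` sharpens the penalty for mismatches, while increasing `k_max` gives more weight to longer shared segments.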

Conclusion

We have presented and characterized a new alignment-free method, based on a mathematical kernel, for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.


Url:
DOI: 10.1186/s12859-017-1560-9
PubMed: 28245816
PubMed Central: 5331664

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">String kernels for protein sequence comparisons: improved fold recognition</title>
<author>
<name sortKey="Nojoomi, Saghi" sort="Nojoomi, Saghi" uniqKey="Nojoomi S" first="Saghi" last="Nojoomi">Saghi Nojoomi</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA, 95616 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Koehl, Patrice" sort="Koehl, Patrice" uniqKey="Koehl P" first="Patrice" last="Koehl">Patrice Koehl</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA, 95616 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28245816</idno>
<idno type="pmc">5331664</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5331664</idno>
<idno type="RBID">PMC:5331664</idno>
<idno type="doi">10.1186/s12859-017-1560-9</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000264</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000264</idno>
<idno type="wicri:Area/Pmc/Curation">000264</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000264</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">String kernels for protein sequence comparisons: improved fold recognition</title>
<author>
<name sortKey="Nojoomi, Saghi" sort="Nojoomi, Saghi" uniqKey="Nojoomi S" first="Saghi" last="Nojoomi">Saghi Nojoomi</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA, 95616 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Koehl, Patrice" sort="Koehl, Patrice" uniqKey="Koehl P" first="Patrice" last="Koehl">Patrice Koehl</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA, 95616 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The amino acid sequence of a protein is the blueprint from which its structure and ultimately function can be derived. Therefore, sequence comparison methods remain essential for the determination of similarity between proteins. Traditional approaches for comparing two protein sequences begin with strings of letters (amino acids) that represent the sequences, before generating textual alignments between these strings and providing scores for each alignment. When the similitude between the two protein sequences to be compared is low however, the quality of the corresponding sequence alignment is usually poor, leading to poor performance for the recognition of similarity.</p>
</sec>
<sec>
<title>Results</title>
<p>In this study, we develop an alignment free alternative to these methods that is based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al,
<italic>Found. Comput. Math.</italic>
, 2013,14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters, a coefficient that tunes the substitution matrix and the maximum length of
<italic>k-mers</italic>
that it includes. We provide an exhaustive analysis of the impacts of these two parameters on the performance of SeqKernel for fold recognition. We show that with the right choice of parameters, use of the SeqKernel similarity measure improves fold recognition compared to the use of traditional alignment-based methods. We illustrate the application of SeqKernel to inferring phylogeny on RNA polymerases and show that it performs as well as methods based on multiple sequence alignments.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have presented and characterized a new alignment free method based on a mathematical kernel for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Oldfield, Cj" uniqKey="Oldfield C">CJ Oldfield</name>
</author>
<author>
<name sortKey="Dunker, Ak" uniqKey="Dunker A">AK Dunker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dill, Ka" uniqKey="Dill K">KA Dill</name>
</author>
<author>
<name sortKey="Ozkan, Sb" uniqKey="Ozkan S">SB Ozkan</name>
</author>
<author>
<name sortKey="Weikl, Tr" uniqKey="Weikl T">TR Weikl</name>
</author>
<author>
<name sortKey="Chodera, Jd" uniqKey="Chodera J">JD Chodera</name>
</author>
<author>
<name sortKey="Voelz, Va" uniqKey="Voelz V">VA Voelz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Das, R" uniqKey="Das R">R Das</name>
</author>
<author>
<name sortKey="Baker, D" uniqKey="Baker D">D Baker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bairoch, A" uniqKey="Bairoch A">A Bairoch</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berman, Hm" uniqKey="Berman H">HM Berman</name>
</author>
<author>
<name sortKey="Westbrook, J" uniqKey="Westbrook J">J Westbrook</name>
</author>
<author>
<name sortKey="Feng, Z" uniqKey="Feng Z">Z Feng</name>
</author>
<author>
<name sortKey="Gilliland, G" uniqKey="Gilliland G">G Gilliland</name>
</author>
<author>
<name sortKey="Bhat, Tn" uniqKey="Bhat T">TN Bhat</name>
</author>
<author>
<name sortKey="Weissig, H" uniqKey="Weissig H">H Weissig</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
<author>
<name sortKey="Krogh, A" uniqKey="Krogh A">A Krogh</name>
</author>
<author>
<name sortKey="Mitchison, G" uniqKey="Mitchison G">G Mitchison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gusfield, D" uniqKey="Gusfield D">D Gusfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwartz, Rm" uniqKey="Schwartz R">RM Schwartz</name>
</author>
<author>
<name sortKey="Dayhoff, Mo" uniqKey="Dayhoff M">MO Dayhoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Henikoff, S" uniqKey="Henikoff S">S Henikoff</name>
</author>
<author>
<name sortKey="Henikoff, Jg" uniqKey="Henikoff J">JG Henikoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ladunga, I" uniqKey="Ladunga I">I Ladunga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Tf" uniqKey="Smith T">TF Smith</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Needleman, Sb" uniqKey="Needleman S">SB Needleman</name>
</author>
<author>
<name sortKey="Wunsch, Cd" uniqKey="Wunsch C">CD Wunsch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Storey, Jd" uniqKey="Storey J">JD Storey</name>
</author>
<author>
<name sortKey="Sigmund, D" uniqKey="Sigmund D">D Sigmund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rost, B" uniqKey="Rost B">B Rost</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wallace, Im" uniqKey="Wallace I">IM Wallace</name>
</author>
<author>
<name sortKey="Blackshields, G" uniqKey="Blackshields G">G Blackshields</name>
</author>
<author>
<name sortKey="Higgins, Dg" uniqKey="Higgins D">DG Higgins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vinga, S" uniqKey="Vinga S">S Vinga</name>
</author>
<author>
<name sortKey="Almeida, J" uniqKey="Almeida J">J Almeida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bonham Carter, O" uniqKey="Bonham Carter O">O Bonham-Carter</name>
</author>
<author>
<name sortKey="Steele, J" uniqKey="Steele J">J Steele</name>
</author>
<author>
<name sortKey="Bastola, D" uniqKey="Bastola D">D Bastola</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vinga, S" uniqKey="Vinga S">S Vinga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwende, I" uniqKey="Schwende I">I Schwende</name>
</author>
<author>
<name sortKey="Pham, Td" uniqKey="Pham T">TD Pham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ulitsky, I" uniqKey="Ulitsky I">I Ulitsky</name>
</author>
<author>
<name sortKey="Burstein, D" uniqKey="Burstein D">D Burstein</name>
</author>
<author>
<name sortKey="Tuller, T" uniqKey="Tuller T">T Tuller</name>
</author>
<author>
<name sortKey="Chor, B" uniqKey="Chor B">B Chor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Didier, G" uniqKey="Didier G">G Didier</name>
</author>
<author>
<name sortKey="Corel, E" uniqKey="Corel E">E Corel</name>
</author>
<author>
<name sortKey="Laprevotte, I" uniqKey="Laprevotte I">I Laprevotte</name>
</author>
<author>
<name sortKey="Grossmann, A" uniqKey="Grossmann A">A Grossmann</name>
</author>
<author>
<name sortKey="Landes Devauchelle, C" uniqKey="Landes Devauchelle C">C Landes-Devauchelle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ma, B" uniqKey="Ma B">B Ma</name>
</author>
<author>
<name sortKey="Tromp, J" uniqKey="Tromp J">J Tromp</name>
</author>
<author>
<name sortKey="Li, M" uniqKey="Li M">M Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burkhardt, S" uniqKey="Burkhardt S">S Burkhardt</name>
</author>
<author>
<name sortKey="K Rkk Inen, J" uniqKey="K Rkk Inen J">J Kärkkäinen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keich, U" uniqKey="Keich U">U Keich</name>
</author>
<author>
<name sortKey="Li, M" uniqKey="Li M">M Li</name>
</author>
<author>
<name sortKey="Ma, B" uniqKey="Ma B">B Ma</name>
</author>
<author>
<name sortKey="Tromp, J" uniqKey="Tromp J">J Tromp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leimeister, Ca" uniqKey="Leimeister C">CA Leimeister</name>
</author>
<author>
<name sortKey="Boden, M" uniqKey="Boden M">M Boden</name>
</author>
<author>
<name sortKey="Horwege, S" uniqKey="Horwege S">S Horwege</name>
</author>
<author>
<name sortKey="Lindner, S" uniqKey="Lindner S">S Lindner</name>
</author>
<author>
<name sortKey="Morgenstern, B" uniqKey="Morgenstern B">B Morgenstern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lodhi, H" uniqKey="Lodhi H">H Lodhi</name>
</author>
<author>
<name sortKey="Saunders, C" uniqKey="Saunders C">C Saunders</name>
</author>
<author>
<name sortKey="Shawe Taylor, J" uniqKey="Shawe Taylor J">J Shawe-Taylor</name>
</author>
<author>
<name sortKey="Cristianini, N" uniqKey="Cristianini N">N Cristianini</name>
</author>
<author>
<name sortKey="Watkins, C" uniqKey="Watkins C">C Watkins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diekhans, Tjm" uniqKey="Diekhans T">TJM Diekhans</name>
</author>
<author>
<name sortKey="Haussler, D" uniqKey="Haussler D">D Haussler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liao, L" uniqKey="Liao L">L Liao</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leslie, Cs" uniqKey="Leslie C">CS Leslie</name>
</author>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E Eskin</name>
</author>
<author>
<name sortKey="Cohen, A" uniqKey="Cohen A">A Cohen</name>
</author>
<author>
<name sortKey="Weston, J" uniqKey="Weston J">J Weston</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="R Tsch, G" uniqKey="R Tsch G">G Rätsch</name>
</author>
<author>
<name sortKey="Sonnenburg, S" uniqKey="Sonnenburg S">S Sonnenburg</name>
</author>
<author>
<name sortKey="Scholkopf, B" uniqKey="Scholkopf B">B Schölkopf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ben Hur, A" uniqKey="Ben Hur A">A Ben-Hur</name>
</author>
<author>
<name sortKey="Ong, Cs" uniqKey="Ong C">CS Ong</name>
</author>
<author>
<name sortKey="Sonnenburg, S" uniqKey="Sonnenburg S">S Sonnenburg</name>
</author>
<author>
<name sortKey="Scholkopf, B" uniqKey="Scholkopf B">B Schölkopf</name>
</author>
<author>
<name sortKey="R Tsch, M" uniqKey="R Tsch M">M Rätsch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saigo, H" uniqKey="Saigo H">H Saigo</name>
</author>
<author>
<name sortKey="Vert, Jp" uniqKey="Vert J">JP Vert</name>
</author>
<author>
<name sortKey="Ueda, N" uniqKey="Ueda N">N Ueda</name>
</author>
<author>
<name sortKey="Akutsu, T" uniqKey="Akutsu T">T Akutsu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shen, Wj" uniqKey="Shen W">WJ Shen</name>
</author>
<author>
<name sortKey="Wong, Hs" uniqKey="Wong H">HS Wong</name>
</author>
<author>
<name sortKey="Xiao, Qw" uniqKey="Xiao Q">QW Xiao</name>
</author>
<author>
<name sortKey="Guo, X" uniqKey="Guo X">X Guo</name>
</author>
<author>
<name sortKey="Smale, S" uniqKey="Smale S">S Smale</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sillitoe, I" uniqKey="Sillitoe I">I Sillitoe</name>
</author>
<author>
<name sortKey="Lewis, Te" uniqKey="Lewis T">TE Lewis</name>
</author>
<author>
<name sortKey="Cuff, Al" uniqKey="Cuff A">AL Cuff</name>
</author>
<author>
<name sortKey="Das, S" uniqKey="Das S">S Das</name>
</author>
<author>
<name sortKey="Ashford, P" uniqKey="Ashford P">P Ashford</name>
</author>
<author>
<name sortKey="Dawson, Nl" uniqKey="Dawson N">NL Dawson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thorne, Jl" uniqKey="Thorne J">JL Thorne</name>
</author>
<author>
<name sortKey="Kishino, H" uniqKey="Kishino H">H Kishino</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chan, Cx" uniqKey="Chan C">CX Chan</name>
</author>
<author>
<name sortKey="Bernard, G" uniqKey="Bernard G">G Bernard</name>
</author>
<author>
<name sortKey="Poirion, O" uniqKey="Poirion O">O Poirion</name>
</author>
<author>
<name sortKey="Hogan, Jm" uniqKey="Hogan J">JM Hogan</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Henikoff, S" uniqKey="Henikoff S">S Henikoff</name>
</author>
<author>
<name sortKey="Henikoff, J" uniqKey="Henikoff J">J Henikoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pearson, W" uniqKey="Pearson W">W Pearson</name>
</author>
<author>
<name sortKey="Lipman, D" uniqKey="Lipman D">D Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le, Q" uniqKey="Le Q">Q Le</name>
</author>
<author>
<name sortKey="Pollastri, G" uniqKey="Pollastri G">G Pollastri</name>
</author>
<author>
<name sortKey="Koehl, P" uniqKey="Koehl P">P Koehl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Koehl, P" uniqKey="Koehl P">P Koehl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gribskov, M" uniqKey="Gribskov M">M Gribskov</name>
</author>
<author>
<name sortKey="Robinson, Nl" uniqKey="Robinson N">NL Robinson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Subbiah, S" uniqKey="Subbiah S">S Subbiah</name>
</author>
<author>
<name sortKey="Laurents, Dv" uniqKey="Laurents D">DV Laurents</name>
</author>
<author>
<name sortKey="Levitt, M" uniqKey="Levitt M">M Levitt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kolodny, R" uniqKey="Kolodny R">R Kolodny</name>
</author>
<author>
<name sortKey="Koehl, P" uniqKey="Koehl P">P Koehl</name>
</author>
<author>
<name sortKey="Levitt, M" uniqKey="Levitt M">M Levitt</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rieck, K" uniqKey="Rieck K">K Rieck</name>
</author>
<author>
<name sortKey="Wresnegger, C" uniqKey="Wresnegger C">C Wressnegger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chou, Kc" uniqKey="Chou K">KC Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xiao, X" uniqKey="Xiao X">X Xiao</name>
</author>
<author>
<name sortKey="Lin, Wz" uniqKey="Lin W">WZ Lin</name>
</author>
<author>
<name sortKey="Chou, Kc" uniqKey="Chou K">KC Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chatzou, M" uniqKey="Chatzou M">M Chatzou</name>
</author>
<author>
<name sortKey="Magis, C" uniqKey="Magis C">C Magis</name>
</author>
<author>
<name sortKey="Chang, Jm" uniqKey="Chang J">JM Chang</name>
</author>
<author>
<name sortKey="Kemena, C" uniqKey="Kemena C">C Kemena</name>
</author>
<author>
<name sortKey="Bussotti, G" uniqKey="Bussotti G">G Bussotti</name>
</author>
<author>
<name sortKey="Erb, I" uniqKey="Erb I">I Erb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hohl, M" uniqKey="Hohl M">M Höhl</name>
</author>
<author>
<name sortKey="Ragan, Ma" uniqKey="Ragan M">MA Ragan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wong, Km" uniqKey="Wong K">KM Wong</name>
</author>
<author>
<name sortKey="Suchard, Ma" uniqKey="Suchard M">MA Suchard</name>
</author>
<author>
<name sortKey="Huelsenbeck, Jp" uniqKey="Huelsenbeck J">JP Huelsenbeck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Mt" uniqKey="Wu M">MT Wu</name>
</author>
<author>
<name sortKey="Chatterji, S" uniqKey="Chatterji S">S Chatterji</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haubold, B" uniqKey="Haubold B">B Haubold</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sievers, F" uniqKey="Sievers F">F Sievers</name>
</author>
<author>
<name sortKey="Wilm, A" uniqKey="Wilm A">A Wilm</name>
</author>
<author>
<name sortKey="Dineen, D" uniqKey="Dineen D">D Dineen</name>
</author>
<author>
<name sortKey="Gibson, Tj" uniqKey="Gibson T">TJ Gibson</name>
</author>
<author>
<name sortKey="Karplus, K" uniqKey="Karplus K">K Karplus</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Felsenstein, J" uniqKey="Felsenstein J">J Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, Dt" uniqKey="Jones D">DT Jones</name>
</author>
<author>
<name sortKey="Taylor, Wr" uniqKey="Taylor W">WR Taylor</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fitch, Wm" uniqKey="Fitch W">WM Fitch</name>
</author>
<author>
<name sortKey="Margoliash, E" uniqKey="Margoliash E">E Margoliash</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuhner, Mk" uniqKey="Kuhner M">MK Kuhner</name>
</author>
<author>
<name sortKey="Felsentein, J" uniqKey="Felsentein J">J Felsenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kann, M" uniqKey="Kann M">M Kann</name>
</author>
<author>
<name sortKey="Qian, B" uniqKey="Qian B">B Qian</name>
</author>
<author>
<name sortKey="Goldstein, Ra" uniqKey="Goldstein R">RA Goldstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saigo, H" uniqKey="Saigo H">H Saigo</name>
</author>
<author>
<name sortKey="Vert, Jp" uniqKey="Vert J">JP Vert</name>
</author>
<author>
<name sortKey="Akutsu, T" uniqKey="Akutsu T">T Akutsu</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28245816</article-id>
<article-id pub-id-type="pmc">5331664</article-id>
<article-id pub-id-type="publisher-id">1560</article-id>
<article-id pub-id-type="doi">10.1186/s12859-017-1560-9</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>String kernels for protein sequence comparisons: improved fold recognition</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Nojoomi</surname>
<given-names>Saghi</given-names>
</name>
<address>
<email>sjnojoomi@ucdavis.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-0908-068X</contrib-id>
<name>
<surname>Koehl</surname>
<given-names>Patrice</given-names>
</name>
<address>
<email>pakoehl@ucdavis.edu</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA, 95616 USA</aff>
<aff id="Aff2">
<label>2</label>
Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA, 95616 USA</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>28</day>
<month>2</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>28</day>
<month>2</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>18</volume>
<elocation-id>137</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>10</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>2</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2017</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>The amino acid sequence of a protein is the blueprint from which its structure and ultimately function can be derived. Therefore, sequence comparison methods remain essential for the determination of similarity between proteins. Traditional approaches for comparing two protein sequences begin with strings of letters (amino acids) that represent the sequences, before generating textual alignments between these strings and providing scores for each alignment. When the similitude between the two protein sequences to be compared is low however, the quality of the corresponding sequence alignment is usually poor, leading to poor performance for the recognition of similarity.</p>
</sec>
<sec>
<title>Results</title>
<p>In this study, we develop an alignment free alternative to these methods that is based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al,
<italic>Found. Comput. Math.</italic>
, 2013,14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters, a coefficient that tunes the substitution matrix and the maximum length of
<italic>k-mers</italic>
that it includes. We provide an exhaustive analysis of the impacts of these two parameters on the performance of SeqKernel for fold recognition. We show that with the right choice of parameters, use of the SeqKernel similarity measure improves fold recognition compared to the use of traditional alignment-based methods. We illustrate the application of SeqKernel to inferring phylogeny on RNA polymerases and show that it performs as well as methods based on multiple sequence alignments.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have presented and characterized a new alignment free method based on a mathematical kernel for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Protein sequence</kwd>
<kwd>Kernel</kwd>
<kwd>Alignment free methods</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source>
<institution>Ministry of Education - Singapore</institution>
</funding-source>
<award-id>MOE2012-T3-1-008</award-id>
</award-group>
</funding-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2017</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000264 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000264 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:5331664
   |texte=   String kernels for protein sequence comparisons: improved fold recognition
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:28245816" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021