Exploration server on the orange tree

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins

Internal identifier: 000A86 (Pmc/Curation); previous: 000A85; next: 000A87

Authors: Ruchi Verma [United States]; Ulrich Melcher [United States]

RBID : PMC:3439722

Abstract

Background

Members of the phylum Proteobacteria are the most prominent among the bacteria that cause plant diseases, which reduce the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at an early stage. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides using a machine learning algorithm, a Support Vector Machine (SVM).

Result

The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish proteobacterial proteins from plant proteins. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets, not used in training, to assess their validity.
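
As a rough illustration of the AAC, DC and hybrid feature vectors mentioned in this paragraph, the following Python sketch computes them for a single sequence. It assumes the common percent-frequency definitions (20 values for AAC, 400 for DC, 420 for the concatenated hybrid); the abstract does not give the exact formulas, and the function names `aac`, `dc` and `hybrid` are hypothetical, not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: percent frequency of each residue (20 values).

    Assumption: non-standard residues (e.g. X, B, Z) are simply ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [100.0 * residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: percent frequency of each adjacent pair (400 values).

    Assumption: a pair is counted only if both residues are standard.
    """
    s = seq.upper()
    counts = {}
    total = 0
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair[0] in AMINO_ACIDS and pair[1] in AMINO_ACIDS:
            counts[pair] = counts.get(pair, 0) + 1
            total += 1
    return [
        100.0 * counts.get(a + b, 0) / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def hybrid(seq):
    """Hybrid features: AAC followed by DC (20 + 400 = 420 values)."""
    return aac(seq) + dc(seq)
```

Vectors of this kind, one per protein, could then be passed with class labels (proteobacterial or plant) to any SVM implementation; the choice of SVM package, kernel and parameters is not stated in the abstract and is left open here.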

Conclusion

The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
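
The accuracy and MCC values quoted in the abstract are standard summaries of a two-class confusion matrix. A minimal Python sketch of both measures follows, assuming counts of true positives (`tp`), true negatives (`tn`), false positives (`fp`) and false negatives (`fn`); the function names are hypothetical, and the counts used in the usage note are invented for illustration, not results from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Percentage of predictions that are correct."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, with made-up counts `tp=45, tn=45, fp=5, fn=5`, `accuracy` gives 90.0 and `mcc` gives 0.8. Unlike accuracy, MCC stays informative when the two classes are of unequal size, which is one reason both are usually reported together.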


DOI: 10.1186/1471-2105-13-S15-S9
PubMed: 23046503
PubMed Central: 3439722

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins</title>
<author>
<name sortKey="Verma, Ruchi" sort="Verma, Ruchi" uniqKey="Verma R" first="Ruchi" last="Verma">Ruchi Verma</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Oklahoma</region>
</placeName>
<wicri:cityArea>Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Melcher, Ulrich" sort="Melcher, Ulrich" uniqKey="Melcher U" first="Ulrich" last="Melcher">Ulrich Melcher</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Oklahoma</region>
</placeName>
<wicri:cityArea>Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23046503</idno>
<idno type="pmc">3439722</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439722</idno>
<idno type="RBID">PMC:3439722</idno>
<idno type="doi">10.1186/1471-2105-13-S15-S9</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000A87</idno>
<idno type="wicri:Area/Pmc/Curation">000A86</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins</title>
<author>
<name sortKey="Verma, Ruchi" sort="Verma, Ruchi" uniqKey="Verma R" first="Ruchi" last="Verma">Ruchi Verma</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Oklahoma</region>
</placeName>
<wicri:cityArea>Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Melcher, Ulrich" sort="Melcher, Ulrich" uniqKey="Melcher U" first="Ulrich" last="Melcher">Ulrich Melcher</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Oklahoma</region>
</placeName>
<wicri:cityArea>Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Members of the phylum Proteobacteria are the most prominent among the bacteria that cause plant diseases, which reduce the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at an early stage. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides using a machine learning algorithm, a Support Vector Machine (SVM).</p>
</sec>
<sec>
<title>Result</title>
<p>The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish proteobacterial proteins from plant proteins. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets, not used in training, to assess their validity.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23046503</article-id>
<article-id pub-id-type="pmc">3439722</article-id>
<article-id pub-id-type="publisher-id">1471-2105-13-S15-S9</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-13-S15-S9</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Proceedings</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Verma</surname>
<given-names>Ruchi</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>ruchi.verma@okstate.edu</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A2">
<name>
<surname>Melcher</surname>
<given-names>Ulrich</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>ulrich.melcher@okstate.edu</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74078 USA</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>11</day>
<month>9</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<issue>Suppl 15</issue>
<supplement>
<named-content content-type="supplement-title">Ninth Annual MCBIOS Conference. Dealing with the Omics Data Deluge</named-content>
<named-content content-type="supplement-editor">Jonathan Wren, Susan Bridges, Doris Kupfer, Dennis Burian, Mikhail Dozmorov and Rakesh Kaundal</named-content>
</supplement>
<fpage>S9</fpage>
<lpage>S9</lpage>
<permissions>
<copyright-statement>Copyright ©2012 Verma and Melcher; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Verma and Melcher; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/13/S15/S9"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>Members of the phylum Proteobacteria are the most prominent among the bacteria that cause plant diseases, which reduce the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at an early stage. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides using a machine learning algorithm, a Support Vector Machine (SVM).</p>
</sec>
<sec>
<title>Result</title>
<p>The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish proteobacterial proteins from plant proteins. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets, not used in training, to assess their validity.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.</p>
</sec>
</abstract>
<kwd-group>
<kwd>proteobacteria</kwd>
<kwd>plant proteins</kwd>
<kwd>SVM</kwd>
<kwd>machine learning</kwd>
<kwd>amino acid composition</kwd>
<kwd>dipeptide composition</kwd>
</kwd-group>
<conference>
<conf-date>17-18 February 2012</conf-date>
<conf-name>Proceedings of the Ninth Annual MCBIOS Conference. Dealing with the Omics Data Deluge</conf-name>
<conf-loc>Oxford, MS, USA</conf-loc>
</conference>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Bois/explor/OrangerV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A86 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000A86 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Bois
   |area=    OrangerV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:3439722
   |texte=   A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:23046503" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a OrangerV1 

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 17:11:04 2016. Site generation: Wed Mar 6 18:18:32 2024