PTPD: predicting therapeutic peptides by deep learning and word2vec

Internal identifier: 000284 (Pmc/Corpus)

Authors: Chuanyan Wu; Rui Gao; Yusen Zhang; Yang De Marinis

Source:

BMC Bioinformatics [1471-2105]; 2019.

RBID: PMC:6728961

Abstract

Background: In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases. In this paper, we propose an effective computational model that uses deep learning and word2vec to predict therapeutic peptides (PTPD).

Results: Representation vectors of all k-mers were obtained through word2vec based on k-mer co-existence information. The original peptide sequences were then divided into k-mers using the windowing method. The peptide sequences were mapped to the input layer using the embedding vectors obtained from word2vec. Three types of filters in the convolutional layers, as well as dropout and max-pooling operations, were applied to construct feature maps. These feature maps were concatenated into a fully connected dense layer, and rectified linear units (ReLU) and dropout operations were included to avoid over-fitting of PTPD. The classification probabilities were generated by a sigmoid function. PTPD was then validated using two datasets: an independent anticancer peptide dataset and a virulent protein dataset, on which it achieved accuracies of 96% and 94%, respectively.
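The Results paragraph above describes a pipeline of k-mer windowing, word2vec embedding, convolution with three filter types, max-pooling, a dense ReLU layer and a sigmoid output. The pure-Python sketch below illustrates only the forward pass, under assumptions that are not taken from the paper: k = 3, 8-dimensional embeddings, filter widths of 2, 3 and 4, ReLU after the convolutions, and random weights standing in for both the trained word2vec vectors and the trained network. Dropout is left out because it only acts during training. The class name `TinyPTPDSketch` and all of its parameters are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
import random


def kmers(seq, k=3):
    """Split a peptide into overlapping k-mers with a stride-1 sliding window (assumed windowing)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def random_vector(dim, rng):
    return [rng.uniform(-0.5, 0.5) for _ in range(dim)]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyPTPDSketch:
    """Hypothetical, untrained forward pass in the spirit of PTPD:
    k-mer embedding -> convolutions of several widths -> max-pooling
    -> concatenation -> dense ReLU layer -> sigmoid probability."""

    def __init__(self, k=3, dim=8, widths=(2, 3, 4), n_filters=4, hidden=6, seed=0):
        self.rng = random.Random(seed)
        self.k, self.dim, self.widths = k, dim, widths
        # Stand-in for the word2vec lookup table (filled lazily with random vectors).
        self.embeddings = {}
        # For each filter width: n_filters filters, each a (width x dim) weight matrix plus a bias.
        self.filters = {
            w: [([random_vector(dim, self.rng) for _ in range(w)], 0.0) for _ in range(n_filters)]
            for w in widths
        }
        n_features = len(widths) * n_filters
        self.hidden_w = [random_vector(n_features, self.rng) for _ in range(hidden)]
        self.out_w = random_vector(hidden, self.rng)

    def embed(self, kmer):
        # A trained model would look up the vector learned by word2vec here.
        if kmer not in self.embeddings:
            self.embeddings[kmer] = random_vector(self.dim, self.rng)
        return self.embeddings[kmer]

    def conv_maxpool(self, matrix, width):
        """Slide each filter of one width over the k-mer matrix, apply ReLU
        (assumed activation), and keep the maximum response per filter."""
        pooled = []
        for weights, bias in self.filters[width]:
            responses = []
            for start in range(len(matrix) - width + 1):
                s = bias
                for offset in range(width):
                    s += sum(a * b for a, b in zip(matrix[start + offset], weights[offset]))
                responses.append(relu(s))
            # Sequences shorter than the filter produce no windows; pool to 0.0.
            pooled.append(max(responses) if responses else 0.0)
        return pooled

    def predict(self, seq):
        matrix = [self.embed(km) for km in kmers(seq, self.k)]
        features = []
        for w in self.widths:
            features.extend(self.conv_maxpool(matrix, w))  # concatenate the feature maps
        # Dropout is omitted: it is only active during training.
        hidden = [relu(sum(a * b for a, b in zip(features, row))) for row in self.hidden_w]
        logit = sum(a * b for a, b in zip(hidden, self.out_w))
        return sigmoid(logit)


model = TinyPTPDSketch(seed=1)
p = model.predict("KLAKLAKKLAKLAK")  # arbitrary example string
print(f"P(therapeutic) = {p:.3f}")  # weights are untrained, so the value itself is meaningless
```

With real data, the random embedding table would typically be replaced by vectors trained with a word2vec implementation on k-mer "sentences", and the weights would be learned by minimising a binary classification loss; the shape of the forward pass stays the same.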

Conclusions: PTPD identified novel therapeutic peptides efficiently and is suitable for use as a tool in therapeutic peptide design.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6728961

DOI: 10.1186/s12859-019-3006-z
PubMed: 31492094
PubMed Central: 6728961

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">PTPD: predicting therapeutic peptides by deep learning and word2vec</title>
<author><name sortKey="Wu, Chuanyan" sort="Wu, Chuanyan" uniqKey="Wu C" first="Chuanyan" last="Wu">Chuanyan Wu</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Control Science and Engineering, Shandong University,</institution>
</institution-wrap>
Jingshi Road, Jinan, 250061 China</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0930 2361</institution-id>
<institution-id institution-id-type="GRID">grid.4514.4</institution-id>
<institution>Diabetes and Endocrinology, Lund University,</institution>
</institution-wrap>
Malmo, 20502 Sweden</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Gao, Rui" sort="Gao, Rui" uniqKey="Gao R" first="Rui" last="Gao">Rui Gao</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Control Science and Engineering, Shandong University,</institution>
</institution-wrap>
Jingshi Road, Jinan, 250061 China</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zhang, Yusen" sort="Zhang, Yusen" uniqKey="Zhang Y" first="Yusen" last="Zhang">Yusen Zhang</name>
<affiliation><nlm:aff id="Aff3"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Mathematics and Statistics, Shandong University at Weihai,</institution>
</institution-wrap>
Weihai, 264209 China</nlm:aff>
</affiliation>
</author>
<author><name sortKey="De Marinis, Yang" sort="De Marinis, Yang" uniqKey="De Marinis Y" first="Yang" last="De Marinis">Yang De Marinis</name>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0930 2361</institution-id>
<institution-id institution-id-type="GRID">grid.4514.4</institution-id>
<institution>Diabetes and Endocrinology, Lund University,</institution>
</institution-wrap>
Malmo, 20502 Sweden</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">31492094</idno>
<idno type="pmc">6728961</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6728961</idno>
<idno type="RBID">PMC:6728961</idno>
<idno type="doi">10.1186/s12859-019-3006-z</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000284</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000284</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">PTPD: predicting therapeutic peptides by deep learning and word2vec</title>
<author><name sortKey="Wu, Chuanyan" sort="Wu, Chuanyan" uniqKey="Wu C" first="Chuanyan" last="Wu">Chuanyan Wu</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Control Science and Engineering, Shandong University,</institution>
</institution-wrap>
Jingshi Road, Jinan, 250061 China</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0930 2361</institution-id>
<institution-id institution-id-type="GRID">grid.4514.4</institution-id>
<institution>Diabetes and Endocrinology, Lund University,</institution>
</institution-wrap>
Malmo, 20502 Sweden</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Gao, Rui" sort="Gao, Rui" uniqKey="Gao R" first="Rui" last="Gao">Rui Gao</name>
<affiliation><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Control Science and Engineering, Shandong University,</institution>
</institution-wrap>
Jingshi Road, Jinan, 250061 China</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zhang, Yusen" sort="Zhang, Yusen" uniqKey="Zhang Y" first="Yusen" last="Zhang">Yusen Zhang</name>
<affiliation><nlm:aff id="Aff3"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Mathematics and Statistics, Shandong University at Weihai,</institution>
</institution-wrap>
Weihai, 264209 China</nlm:aff>
</affiliation>
</author>
<author><name sortKey="De Marinis, Yang" sort="De Marinis, Yang" uniqKey="De Marinis Y" first="Yang" last="De Marinis">Yang De Marinis</name>
<affiliation><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0930 2361</institution-id>
<institution-id institution-id-type="GRID">grid.4514.4</institution-id>
<institution>Diabetes and Endocrinology, Lund University,</institution>
</institution-wrap>
Malmo, 20502 Sweden</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases. In this paper, we propose an effective computational model that uses deep learning and word2vec to predict therapeutic peptides (PTPD).</p>
</sec>
<sec><title>Results</title>
<p>Representation vectors of all <italic>k</italic>-mers were obtained through word2vec based on <italic>k</italic>-mer co-existence information. The original peptide sequences were then divided into <italic>k</italic>-mers using the windowing method. The peptide sequences were mapped to the input layer using the embedding vectors obtained from word2vec. Three types of filters in the convolutional layers, as well as dropout and max-pooling operations, were applied to construct feature maps. These feature maps were concatenated into a fully connected dense layer, and rectified linear units (ReLU) and dropout operations were included to avoid over-fitting of PTPD. The classification probabilities were generated by a sigmoid function. PTPD was then validated using two datasets: an independent anticancer peptide dataset and a virulent protein dataset, on which it achieved accuracies of 96% and 94%, respectively.</p>
</sec>
<sec><title>Conclusions</title>
<p>PTPD identified novel therapeutic peptides efficiently and is suitable for use as a tool in therapeutic peptide design.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Torre, La" uniqKey="Torre L">LA Torre</name>
</author>
<author><name sortKey="Bray, F" uniqKey="Bray F">F Bray</name>
</author>
<author><name sortKey="Siegel, Rl" uniqKey="Siegel R">RL Siegel</name>
</author>
<author><name sortKey="Ferlay, J" uniqKey="Ferlay J">J Ferlay</name>
</author>
<author><name sortKey="Lortet Tieulent, J" uniqKey="Lortet Tieulent J">J Lortet-Tieulent</name>
</author>
<author><name sortKey="Jemal, A" uniqKey="Jemal A">A Jemal</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Al Benna, S" uniqKey="Al Benna S">S Al-Benna</name>
</author>
<author><name sortKey="Shai, Y" uniqKey="Shai Y">Y Shai</name>
</author>
<author><name sortKey="Jacobsen, F" uniqKey="Jacobsen F">F Jacobsen</name>
</author>
<author><name sortKey="Steinstraesser, L" uniqKey="Steinstraesser L">L Steinstraesser</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kalyanaraman, B" uniqKey="Kalyanaraman B">B Kalyanaraman</name>
</author>
<author><name sortKey="Joseph, J" uniqKey="Joseph J">J Joseph</name>
</author>
<author><name sortKey="Kalivendi, S" uniqKey="Kalivendi S">S Kalivendi</name>
</author>
<author><name sortKey="Wang, S" uniqKey="Wang S">S Wang</name>
</author>
<author><name sortKey="Konorev, E" uniqKey="Konorev E">E Konorev</name>
</author>
<author><name sortKey="Kotamraju, S" uniqKey="Kotamraju S">S Kotamraju</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Huang, Y" uniqKey="Huang Y">Y Huang</name>
</author>
<author><name sortKey="Feng, Q" uniqKey="Feng Q">Q Feng</name>
</author>
<author><name sortKey="Yan, Q" uniqKey="Yan Q">Q Yan</name>
</author>
<author><name sortKey="Hao, X" uniqKey="Hao X">X Hao</name>
</author>
<author><name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chen, W" uniqKey="Chen W">W Chen</name>
</author>
<author><name sortKey="Ding, H" uniqKey="Ding H">H Ding</name>
</author>
<author><name sortKey="Feng, P" uniqKey="Feng P">P Feng</name>
</author>
<author><name sortKey="Lin, H" uniqKey="Lin H">H Lin</name>
</author>
<author><name sortKey="Chou, Kc" uniqKey="Chou K">KC Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, Fm" uniqKey="Li F">FM Li</name>
</author>
<author><name sortKey="Wang, Xq" uniqKey="Wang X">XQ Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Xu, L" uniqKey="Xu L">L Xu</name>
</author>
<author><name sortKey="Liang, G" uniqKey="Liang G">G Liang</name>
</author>
<author><name sortKey="Wang, L" uniqKey="Wang L">L Wang</name>
</author>
<author><name sortKey="Liao, C" uniqKey="Liao C">C Liao</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hajisharifi, Z" uniqKey="Hajisharifi Z">Z Hajisharifi</name>
</author>
<author><name sortKey="Piryaiee, M" uniqKey="Piryaiee M">M Piryaiee</name>
</author>
<author><name sortKey="Mohammad Beigi, M" uniqKey="Mohammad Beigi M">M Mohammad Beigi</name>
</author>
<author><name sortKey="Behbahani, M" uniqKey="Behbahani M">M Behbahani</name>
</author>
<author><name sortKey="Mohabatkar, H" uniqKey="Mohabatkar H">H Mohabatkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Akbar, S" uniqKey="Akbar S">S Akbar</name>
</author>
<author><name sortKey="Hayat, M" uniqKey="Hayat M">M Hayat</name>
</author>
<author><name sortKey="Iqbal, M" uniqKey="Iqbal M">M Iqbal</name>
</author>
<author><name sortKey="Jan, Ma" uniqKey="Jan M">MA Jan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Xu, C" uniqKey="Xu C">C Xu</name>
</author>
<author><name sortKey="Ge, L" uniqKey="Ge L">L Ge</name>
</author>
<author><name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author><name sortKey="Dehmer, M" uniqKey="Dehmer M">M Dehmer</name>
</author>
<author><name sortKey="Gutman, I" uniqKey="Gutman I">I Gutman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Manavalan, B" uniqKey="Manavalan B">B Manavalan</name>
</author>
<author><name sortKey="Basith, S" uniqKey="Basith S">S Basith</name>
</author>
<author><name sortKey="Shin, Th" uniqKey="Shin T">TH Shin</name>
</author>
<author><name sortKey="Choi, S" uniqKey="Choi S">S Choi</name>
</author>
<author><name sortKey="Kim, Mo" uniqKey="Kim M">MO Kim</name>
</author>
<author><name sortKey="Lee, G" uniqKey="Lee G">G Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Manavalan, B" uniqKey="Manavalan B">B Manavalan</name>
</author>
<author><name sortKey="Basith, S" uniqKey="Basith S">S Basith</name>
</author>
<author><name sortKey="Shin, Th" uniqKey="Shin T">TH Shin</name>
</author>
<author><name sortKey="Choi, S" uniqKey="Choi S">S Choi</name>
</author>
<author><name sortKey="Kim, Mo" uniqKey="Kim M">MO Kim</name>
</author>
<author><name sortKey="Lee, G" uniqKey="Lee G">G Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wei, L" uniqKey="Wei L">L Wei</name>
</author>
<author><name sortKey="Zhou, C" uniqKey="Zhou C">C Zhou</name>
</author>
<author><name sortKey="Chen, H" uniqKey="Chen H">H Chen</name>
</author>
<author><name sortKey="Song, J" uniqKey="Song J">J Song</name>
</author>
<author><name sortKey="Su, R" uniqKey="Su R">R Su</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author><name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
<author><name sortKey="Sch Ffer, Aa" uniqKey="Sch Ffer A">AA Schäffer</name>
</author>
<author><name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author><name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Saha, S" uniqKey="Saha S">S Saha</name>
</author>
<author><name sortKey="Raghava, Gps" uniqKey="Raghava G">GPS Raghava</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nanni, L" uniqKey="Nanni L">L Nanni</name>
</author>
<author><name sortKey="Lumini, A" uniqKey="Lumini A">A Lumini</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Garg, A" uniqKey="Garg A">A Garg</name>
</author>
<author><name sortKey="Gupta, D" uniqKey="Gupta D">D Gupta</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nanni, L" uniqKey="Nanni L">L Nanni</name>
</author>
<author><name sortKey="Lumini, A" uniqKey="Lumini A">A Lumini</name>
</author>
<author><name sortKey="Gupta, D" uniqKey="Gupta D">D Gupta</name>
</author>
<author><name sortKey="Garg, A" uniqKey="Garg A">A Garg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Krizhevsky, A" uniqKey="Krizhevsky A">A Krizhevsky</name>
</author>
<author><name sortKey="Sutskever, I" uniqKey="Sutskever I">I Sutskever</name>
</author>
<author><name sortKey="Hinton, Ge" uniqKey="Hinton G">GE Hinton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Szegedy, C" uniqKey="Szegedy C">C Szegedy</name>
</author>
<author><name sortKey="Liu, W" uniqKey="Liu W">W Liu</name>
</author>
<author><name sortKey="Jia, Y" uniqKey="Jia Y">Y Jia</name>
</author>
<author><name sortKey="Sermanet, P" uniqKey="Sermanet P">P Sermanet</name>
</author>
<author><name sortKey="Reed, S" uniqKey="Reed S">S Reed</name>
</author>
<author><name sortKey="Anguelov, D" uniqKey="Anguelov D">D Anguelov</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="He, K" uniqKey="He K">K He</name>
</author>
<author><name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
<author><name sortKey="Ren, S" uniqKey="Ren S">S Ren</name>
</author>
<author><name sortKey="Sun, J" uniqKey="Sun J">J Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Girshick, R" uniqKey="Girshick R">R Girshick</name>
</author>
<author><name sortKey="Donahue, J" uniqKey="Donahue J">J Donahue</name>
</author>
<author><name sortKey="Darrell, T" uniqKey="Darrell T">T Darrell</name>
</author>
<author><name sortKey="Malik, J" uniqKey="Malik J">J Malik</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ren, S" uniqKey="Ren S">S Ren</name>
</author>
<author><name sortKey="He, K" uniqKey="He K">K He</name>
</author>
<author><name sortKey="Girshick, R" uniqKey="Girshick R">R Girshick</name>
</author>
<author><name sortKey="Sun, J" uniqKey="Sun J">J Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tang, P" uniqKey="Tang P">P Tang</name>
</author>
<author><name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author><name sortKey="Kwong, S" uniqKey="Kwong S">S Kwong</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zhong, Z" uniqKey="Zhong Z">Z Zhong</name>
</author>
<author><name sortKey="Jin, L" uniqKey="Jin L">L Jin</name>
</author>
<author><name sortKey="Xie, Z" uniqKey="Xie Z">Z Xie</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author><name sortKey="Roller, S" uniqKey="Roller S">S Roller</name>
</author>
<author><name sortKey="Wallace, Bc" uniqKey="Wallace B">BC Wallace</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Min, X" uniqKey="Min X">X Min</name>
</author>
<author><name sortKey="Zeng, W" uniqKey="Zeng W">W Zeng</name>
</author>
<author><name sortKey="Chen, N" uniqKey="Chen N">N Chen</name>
</author>
<author><name sortKey="Chen, T" uniqKey="Chen T">T Chen</name>
</author>
<author><name sortKey="Jiang, R" uniqKey="Jiang R">R Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Novkovi, M" uniqKey="Novkovi M">M Novković</name>
</author>
<author><name sortKey="Simuni, J" uniqKey="Simuni J">J Simunić</name>
</author>
<author><name sortKey="Bojovi, V" uniqKey="Bojovi V">V Bojović</name>
</author>
<author><name sortKey="Tossi, A" uniqKey="Tossi A">A Tossi</name>
</author>
<author><name sortKey="Jureti, D" uniqKey="Jureti D">D Juretić</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hajisharifi, Z" uniqKey="Hajisharifi Z">Z Hajisharifi</name>
</author>
<author><name sortKey="Piryaiee, M" uniqKey="Piryaiee M">M Piryaiee</name>
</author>
<author><name sortKey="Beigi, Mm" uniqKey="Beigi M">MM Beigi</name>
</author>
<author><name sortKey="Behbahani, M" uniqKey="Behbahani M">M Behbahani</name>
</author>
<author><name sortKey="Mohabatkar, H" uniqKey="Mohabatkar H">H Mohabatkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chen, W" uniqKey="Chen W">W Chen</name>
</author>
<author><name sortKey="Ding, H" uniqKey="Ding H">H Ding</name>
</author>
<author><name sortKey="Feng, P" uniqKey="Feng P">P Feng</name>
</author>
<author><name sortKey="Lin, H" uniqKey="Lin H">H Lin</name>
</author>
<author><name sortKey="Chou, Kc" uniqKey="Chou K">KC Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Compeau, Pec" uniqKey="Compeau P">PEC Compeau</name>
</author>
<author><name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author><name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Aggarwala, V" uniqKey="Aggarwala V">V Aggarwala</name>
</author>
<author><name sortKey="Voight, Bf" uniqKey="Voight B">BF Voight</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hinton, Ge" uniqKey="Hinton G">GE Hinton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hu, B" uniqKey="Hu B">B Hu</name>
</author>
<author><name sortKey="Tang, B" uniqKey="Tang B">B Tang</name>
</author>
<author><name sortKey="Chen, Q" uniqKey="Chen Q">Q Chen</name>
</author>
<author><name sortKey="Kang, L" uniqKey="Kang L">L Kang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mikolov, T" uniqKey="Mikolov T">T Mikolov</name>
</author>
<author><name sortKey="Sutskever, I" uniqKey="Sutskever I">I Sutskever</name>
</author>
<author><name sortKey="Chen, K" uniqKey="Chen K">K Chen</name>
</author>
<author><name sortKey="Corrado, G" uniqKey="Corrado G">G Corrado</name>
</author>
<author><name sortKey="Dean, J" uniqKey="Dean J">J Dean</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, D" uniqKey="Zhang D">D Zhang</name>
</author>
<author><name sortKey="Xu, H" uniqKey="Xu H">H Xu</name>
</author>
<author><name sortKey="Su, Z" uniqKey="Su Z">Z Su</name>
</author>
<author><name sortKey="Xu, Y" uniqKey="Xu Y">Y Xu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nair, V" uniqKey="Nair V">V Nair</name>
</author>
<author><name sortKey="Hinton, Ge" uniqKey="Hinton G">GE Hinton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Boopathi, V" uniqKey="Boopathi V">V Boopathi</name>
</author>
<author><name sortKey="Subramaniyam, S" uniqKey="Subramaniyam S">S Subramaniyam</name>
</author>
<author><name sortKey="Malik, A" uniqKey="Malik A">A Malik</name>
</author>
<author><name sortKey="Lee, G" uniqKey="Lee G">G Lee</name>
</author>
<author><name sortKey="Manavalan, B" uniqKey="Manavalan B">B Manavalan</name>
</author>
<author><name sortKey="Yang, Dc" uniqKey="Yang D">DC Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nanni, L" uniqKey="Nanni L">L Nanni</name>
</author>
<author><name sortKey="Lumini, A" uniqKey="Lumini A">A Lumini</name>
</author>
<author><name sortKey="Brahnam, S" uniqKey="Brahnam S">S Brahnam</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Win, Ts" uniqKey="Win T">TS Win</name>
</author>
<author><name sortKey="Schaduangrat, N" uniqKey="Schaduangrat N">N Schaduangrat</name>
</author>
<author><name sortKey="Prachayasittikul, V" uniqKey="Prachayasittikul V">V Prachayasittikul</name>
</author>
<author><name sortKey="Nantasenamat, C" uniqKey="Nantasenamat C">C Nantasenamat</name>
</author>
<author><name sortKey="Shoombuatong, W" uniqKey="Shoombuatong W">W Shoombuatong</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Manavalan, B" uniqKey="Manavalan B">B Manavalan</name>
</author>
<author><name sortKey="Shin, Th" uniqKey="Shin T">TH Shin</name>
</author>
<author><name sortKey="Kim, Mo" uniqKey="Kim M">MO Kim</name>
</author>
<author><name sortKey="Lee, G" uniqKey="Lee G">G Lee</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group><journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">31492094</article-id>
<article-id pub-id-type="pmc">6728961</article-id>
<article-id pub-id-type="publisher-id">3006</article-id>
<article-id pub-id-type="doi">10.1186/s12859-019-3006-z</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>PTPD: predicting therapeutic peptides by deep learning and word2vec</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Wu</surname>
<given-names>Chuanyan</given-names>
</name>
<address><email>chuanyan_wu@163.com</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Gao</surname>
<given-names>Rui</given-names>
</name>
<address><email>gaorui@sdu.edu.cn</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Zhang</surname>
<given-names>Yusen</given-names>
</name>
<address><email>zhangys@sdu.edu.cn</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author"><name><surname>De Marinis</surname>
<given-names>Yang</given-names>
</name>
<address><email>yang.de_marinis@med.lu.se</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1"><label>1</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Control Science and Engineering, Shandong University,</institution>
</institution-wrap>
Jingshi Road, Jinan, 250061 China</aff>
<aff id="Aff2"><label>2</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0930 2361</institution-id>
<institution-id institution-id-type="GRID">grid.4514.4</institution-id>
<institution>Diabetes and Endocrinology, Lund University,</institution>
</institution-wrap>
Malmo, 20502 Sweden</aff>
<aff id="Aff3"><label>3</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1761 1174</institution-id>
<institution-id institution-id-type="GRID">grid.27255.37</institution-id>
<institution>School of Mathematics and Statistics, Shandong University at Weihai,</institution>
</institution-wrap>
Weihai, 264209 China</aff>
</contrib-group>
<pub-date pub-type="epub"><day>6</day>
<month>9</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>6</day>
<month>9</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection"><year>2019</year>
</pub-date>
<volume>20</volume>
<elocation-id>456</elocation-id>
<history><date date-type="received"><day>28</day>
<month>4</month>
<year>2019</year>
</date>
<date date-type="accepted"><day>25</day>
<month>7</month>
<year>2019</year>
</date>
</history>
<permissions><copyright-statement>© The Author(s) 2019</copyright-statement>
<license license-type="OpenAccess"><license-p><bold>Open Access</bold>
 This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1"><sec><title>Background</title>
<p>In the search for therapeutic peptides for disease treatments, many efforts have been made to identify various functional peptides from large numbers of peptide sequence databases. In this paper, we propose an effective computational model that uses deep learning and word2vec to predict therapeutic peptides (PTPD).</p>
</sec>
<sec><title>Results</title>
<p>Representation vectors of all <italic>k</italic>-mers were obtained through word2vec based on <italic>k</italic>-mer co-existence information. The original peptide sequences were then divided into <italic>k</italic>-mers using the windowing method. The peptide sequences were mapped to the input layer using the embedding vectors obtained from word2vec. Three types of filters in the convolutional layers, as well as dropout and max-pooling operations, were applied to construct feature maps. These feature maps were concatenated into a fully connected dense layer, and rectified linear units (ReLU) and dropout operations were included to avoid over-fitting of PTPD. The classification probabilities were generated by a sigmoid function. PTPD was then validated using two datasets: an independent anticancer peptide dataset and a virulent protein dataset, on which it achieved accuracies of 96% and 94%, respectively.</p>
</sec>
<sec><title>Conclusions</title>
<p>PTPD identified novel therapeutic peptides efficiently and is suitable for use as a tool in therapeutic peptide design.</p>
</sec>
</abstract>
<kwd-group xml:lang="en"><title>Keywords</title>
<kwd>Therapeutic peptide</kwd>
<kwd>Deep learning</kwd>
<kwd>Word2vec</kwd>
</kwd-group>
<funding-group><award-group><funding-source><institution>U1806202</institution>
</funding-source>
<award-id>U1806202</award-id>
</award-group>
</funding-group>
<funding-group><award-group><funding-source><institution>61533011</institution>
</funding-source>
<award-id>61533011</award-id>
</award-group>
</funding-group>
<funding-group><award-group><funding-source><institution>61877064</institution>
</funding-source>
<award-id>61877064</award-id>
</award-group>
</funding-group>
<custom-meta-group><custom-meta><meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2019</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body><sec id="Sec1"><title>Background</title>
<p>Cancer continues to be a burden worldwide, and its incidence is expected to double in the coming decades [<xref ref-type="bibr" rid="CR1">1</xref>
]. Available treatment regimens include radiation therapy, targeted therapy, and chemotherapy, all of which are often accompanied by harmful side effects and result in high financial costs for both individuals and society [<xref ref-type="bibr" rid="CR2">2</xref>
, <xref ref-type="bibr" rid="CR3">3</xref>
]. Anticancer peptides (ACPs) offer a new, cost-efficient approach to cancer treatment with minimal side effects, and have shown promise in treating various tumours by targeting mitochondria or through membranolytic mechanisms [<xref ref-type="bibr" rid="CR4">4</xref>
]. Although progress has been made in preclinical applications of peptide-based methods against cancer cells, the mechanisms behind successful ACP treatments remain elusive. The ability to identify ACPs efficiently is therefore highly important for both cancer research and drug development. Because identifying ACPs experimentally is costly and time-consuming, various computational models have been developed to identify ACPs from peptide sequences. These advances include the development of iACP through optimization of g-gap dipeptide components (DPC) [<xref ref-type="bibr" rid="CR5">5</xref>
, <xref ref-type="bibr" rid="CR6">6</xref>
], and SAP peptide identification using 400-dimensional g-gap dipeptide features pruned by the maximum relevance-maximum distance method [<xref ref-type="bibr" rid="CR7">7</xref>
]. In addition, various types of amino acid compositions (AACs) of peptide sequences have been introduced to develop prediction models such as Chou’s pseudo amino acid composition (PseAAC) [<xref ref-type="bibr" rid="CR8">8</xref>
], combinations of AACs, average chemical shifts (acACS) and reduced AAC (RAAC) [<xref ref-type="bibr" rid="CR6">6</xref>
], pseudo g-Gap DPC, amphiphilic PseAAC, and reduced amino acid alphabet (RAAAC) [<xref ref-type="bibr" rid="CR9">9</xref>
]. Other methods include computational tools based on the q-Wiener graph indices for ACP prediction [<xref ref-type="bibr" rid="CR10">10</xref>
]. In addition, machine learning methods have been adopted to improve model performance [<xref ref-type="bibr" rid="CR6">6</xref>
, <xref ref-type="bibr" rid="CR9">9</xref>
, <xref ref-type="bibr" rid="CR11">11</xref>
]. Several models have utilized support vector machine (SVM) and random forest (RF) machine learning methods [<xref ref-type="bibr" rid="CR11">11</xref>
, <xref ref-type="bibr" rid="CR12">12</xref>
], combinations of the quantitative outcomes of individual classifiers (RF, K-nearest neighbor, SVM, generalized neural network and probabilistic neural network) [<xref ref-type="bibr" rid="CR9">9</xref>
], or a pool of SVM-based models trained by sequence-based features [<xref ref-type="bibr" rid="CR13">13</xref>
].</p>
<p>Novel machine learning-based computational models have also been applied to identify virulent proteins involved in the pathophysiology of infection. Virulent proteins form a diverse group that is important for host invasion and pathogenesis. Drug resistance in bacterial pathogens has created an urgent need to identify novel virulent proteins that may facilitate the development of drug targets and vaccines. Several computational models have been developed for this purpose. The earliest methods relied on similarity searches such as the Basic Local Alignment Search Tool (BLAST) [<xref ref-type="bibr" rid="CR14">14</xref>
] and Position-specific Iterated BLAST (PSI-BLAST) [<xref ref-type="bibr" rid="CR15">15</xref>
]. Machine learning algorithms for predicting virulent proteins have also been reported, including SVM-based models using AAC and DPC [<xref ref-type="bibr" rid="CR16">16</xref>
], an ensemble of SVM-based models trained with features extracted directly from amino acid sequences [<xref ref-type="bibr" rid="CR17">17</xref>
], a bi-layer cascade SVM model [<xref ref-type="bibr" rid="CR18">18</xref>
], and a model based on an SVM and a variant of input decimated ensembles and their random subspace [<xref ref-type="bibr" rid="CR19">19</xref>
]. Other studies have focused on feature extraction, for example building protein representations from amino acid sequence features and the evolutionary information of a given protein [<xref ref-type="bibr" rid="CR19">19</xref>
]. Moreover, a computational tool based on the q-Wiener graph indices was also proposed to effectively predict virulent proteins [<xref ref-type="bibr" rid="CR10">10</xref>
]. Despite substantial progress, identifying specific peptides from massive protein databases remains challenging.</p>
<p>To date, deep learning applications have been successful in numerous fields other than medicine, including image classification and recognition [<xref ref-type="bibr" rid="CR20">20</xref>
–<xref ref-type="bibr" rid="CR22">22</xref>
], object detection [<xref ref-type="bibr" rid="CR23">23</xref>
, <xref ref-type="bibr" rid="CR24">24</xref>
], scene recognition [<xref ref-type="bibr" rid="CR25">25</xref>
], character recognition [<xref ref-type="bibr" rid="CR26">26</xref>
], sentence classification [<xref ref-type="bibr" rid="CR27">27</xref>
], chromatin accessibility prediction [<xref ref-type="bibr" rid="CR28">28</xref>
] and so on. Inspired by these successful deep learning applications, we propose a novel computational model called PTPD, which is based on deep learning, to identify ACPs and virulent proteins from peptide sequences (Fig. <xref rid="Fig1" ref-type="fig">1</xref>
). To verify the efficiency of our approach, we also performed ACP and virulent protein prediction on publicly available datasets [<xref ref-type="bibr" rid="CR12">12</xref>
, <xref ref-type="bibr" rid="CR18">18</xref>
, <xref ref-type="bibr" rid="CR29">29</xref>]. Our results show that PTPD is able to identify ACPs and virulent proteins with high efficiency.
<fig id="Fig1"><label>Fig. 1</label>
<caption><p>Flowchart of PTPD</p>
</caption>
<graphic xlink:href="12859_2019_3006_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
</sec>
<sec id="Sec2"><title>Methods</title>
<sec id="Sec3"><title>Datasets</title>
<p>The ACP datasets were extracted from publicly available resources [<xref ref-type="bibr" rid="CR12">12</xref>
, <xref ref-type="bibr" rid="CR29">29</xref>
]. A total of 225 validated ACPs from the AMPs dataset and the database of Anuran defence peptides (DADP) [<xref ref-type="bibr" rid="CR30">30</xref>
] were used as positive samples, and 2,250 randomly selected proteins from the SwissProt protein database were used as negative samples. This dataset was used to build the model, and an alternative dataset and two balanced datasets were employed to evaluate it. To compare our method with other existing methods, we also obtained an independent dataset, the Hajisharifi-Chen (HC) dataset, from a previous study [<xref ref-type="bibr" rid="CR12">12</xref>
]. The HC dataset, which contains 138 ACPs and 206 non-ACPs, was also employed to develop prediction models in [<xref ref-type="bibr" rid="CR31">31</xref>
, <xref ref-type="bibr" rid="CR32">32</xref>
].</p>
<p>The virulent protein datasets were obtained from VirulentPred [<xref ref-type="bibr" rid="CR18">18</xref>
] and the NTX-pred method [<xref ref-type="bibr" rid="CR16">16</xref>
]. We adopted the SPAAN adhesins dataset, which contains 469 adhesins and 703 non-adhesins, to build the PTPD model for virulent protein prediction. The neurotoxin dataset was used as an independent dataset to evaluate the model; it contains 50 neurotoxins (positive samples) and 50 non-virulent proteins (negative samples) obtained from the NTX-pred method [<xref ref-type="bibr" rid="CR16">16</xref>
].</p>
</sec>
<sec id="Sec4"><title>Representation of <italic>k</italic>
-mers by word2vec</title>
<p>Each peptide sequence was divided into <italic>k</italic>-mers using the windowing method described previously [<xref ref-type="bibr" rid="CR33">33</xref>, <xref ref-type="bibr" rid="CR34">34</xref>]. To represent the <italic>k</italic>-mers, we used the publicly available word2vec tool, which learns high-quality word embedding vectors from a large collection of <italic>k</italic>-mers.</p>
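<p>The windowing step can be illustrated with a minimal Python sketch. It assumes a window of length <italic>k</italic> that slides along the sequence with stride <italic>st</italic> (the stride introduced in the Input layer section) and drops trailing residues that cannot fill a complete window; the helper name split_kmers and this edge handling are illustrative assumptions, not details taken from the PTPD implementation.</p>
<preformat preformat-type="code">
```python
def split_kmers(seq: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a peptide sequence into k-mers with a sliding window (hypothetical helper).

    The window advances `stride` residues at a time; residues at the end that
    cannot fill a complete window of length k are dropped.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# Toy peptide with k = 3 and stride 1
print(split_kmers("GLFDIVKK", k=3, stride=1))
# ['GLF', 'LFD', 'FDI', 'DIV', 'IVK', 'VKK']
```
</preformat>
<p>Each resulting list of <italic>k</italic>-mers plays the role of a "sentence" when the embeddings are trained with word2vec.</p>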
<p>The word2vec tool computes vector representations of words and has been widely applied in many natural language processing tasks as well as other research applications [<xref ref-type="bibr" rid="CR35">35</xref>
–<xref ref-type="bibr" rid="CR38">38</xref>
]. Two learning algorithms are available in word2vec: continuous bag-of-words and continuous skip-gram. Both learn word representations that are useful for predicting other words in a sentence. The skip-gram model trains a vector for each word in the given corpus: given a word <italic>W</italic>(<italic>t</italic>) in a sentence, it predicts the probabilities <italic>P</italic>(<italic>W</italic>(<italic>t</italic>+<italic>i</italic>)|<italic>W</italic>(<italic>t</italic>)) of the nearby words <italic>W</italic>(<italic>t</italic>+<italic>i</italic>) (−<italic>k</italic>≤<italic>i</italic>≤<italic>k</italic>, <italic>i</italic>≠0) from the current word <italic>W</italic>(<italic>t</italic>). Each word vector therefore reflects the words that occur near it, as illustrated in Fig. <xref rid="Fig2" ref-type="fig">2</xref>. The goal of the skip-gram model is to maximize the following value: 
<disp-formula id="Equ1"><label>1</label>
<alternatives><tex-math id="M1">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  E=\frac{1}{n}\sum\limits_{t=1}^{n}{\left(\sum\limits_{-k\le i\le k,i\ne 0}{{{log}_{2}}P(W(t+i)|W(t))} \right)},  $$ \end{document}</tex-math>
<mml:math id="M2"><mml:mi>E</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow><mml:mi>n</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accent="false" accentunder="false"><mml:mrow><mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow><mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mfenced close=")" open="(" separators=""><mml:mrow><mml:munder><mml:mrow><mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mo>−</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>≠</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:munder>
<mml:msub><mml:mrow><mml:mtext mathvariant="italic">log</mml:mtext>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mi>P</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>W</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>W</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ1.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
<fig id="Fig2"><label>Fig. 2</label>
<caption><p>Skip-gram model structure</p>
</caption>
<graphic xlink:href="12859_2019_3006_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
<p>where <italic>k</italic> denotes the window size, <italic>W</italic>(<italic>t</italic>+<italic>i</italic>) (−<italic>k</italic>≤<italic>i</italic>≤<italic>k</italic>, <italic>i</italic>≠0) denotes the 2<italic>k</italic> words surrounding the current word <italic>W</italic>(<italic>t</italic>), and <italic>n</italic> denotes the number of words in the corpus.</p>
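<p>To make Eq. (1) concrete, the sketch below evaluates the skip-gram objective <italic>E</italic> for a toy "sentence" of <italic>k</italic>-mers, computing <italic>P</italic>(<italic>W</italic>(<italic>t</italic>+<italic>i</italic>)|<italic>W</italic>(<italic>t</italic>)) with a full softmax over the vocabulary, as in the basic skip-gram formulation. It is an illustration under stated assumptions, not the word2vec tool itself: the vectors are random, context positions falling outside the sentence are skipped, and the function names are hypothetical. The word2vec tool replaces the full softmax with approximations (negative sampling or hierarchical softmax) and updates the vectors to increase <italic>E</italic>.</p>
<preformat preformat-type="code">
```python
import math
import random


def log2_prob(context, centre, in_vecs, out_vecs):
    """log2 P(context | centre) under a full softmax over the vocabulary."""
    def score(word):
        return sum(a * b for a, b in zip(in_vecs[centre], out_vecs[word]))

    log_norm = math.log(sum(math.exp(score(w)) for w in out_vecs))
    return (score(context) - log_norm) / math.log(2)


def skipgram_objective(words, in_vecs, out_vecs, window):
    """E of Eq. (1): average over positions t of the summed log2-probabilities
    of the context words W(t+i), with i from -window to window and i != 0."""
    n = len(words)
    total = 0.0
    for t in range(n):
        for i in range(-window, window + 1):
            if i != 0 and (t + i) in range(n):  # skip positions outside the sentence
                total += log2_prob(words[t + i], words[t], in_vecs, out_vecs)
    return total / n


# Toy corpus: one peptide split into 3-mers, with random 4-dimensional vectors
words = ["GLF", "LFD", "FDI", "DIV", "IVK", "VKK"]
rng = random.Random(0)
in_vecs = {w: [rng.gauss(0, 0.1) for _ in range(4)] for w in words}
out_vecs = {w: [rng.gauss(0, 0.1) for _ in range(4)] for w in words}
print(skipgram_objective(words, in_vecs, out_vecs, window=2))
```
</preformat>
<p>Because every probability is at most one, <italic>E</italic> is never positive; training moves it towards zero.</p>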
<p>Because word2vec reflects the positional relationships of words in a sequence and preserves structural information, we treated the <italic>k</italic>-mers as words. Using word2vec, a 100-dimensional embedding vector was obtained for each <italic>k</italic>-mer.</p>
</sec>
<sec id="Sec5"><title>Input layer</title>
<p>After constructing the word representations of all the <italic>k</italic>-mers, we mapped each peptide sequence to numeric vectors. First, a peptide sequence <italic>S</italic> of length <italic>L</italic><sub>0</sub> was divided into <italic>k</italic>-mers of length <italic>k</italic> using stride <italic>st</italic>. Because the peptide sequences had different original lengths <italic>L</italic><sub>0</sub>, the number of <italic>k</italic>-mers, and hence the number of vectors, varied between peptides. All peptides were therefore represented by the same number of vectors <italic>L</italic>, the number of <italic>k</italic>-mers in the longest peptide sequence; peptides with fewer than <italic>L</italic> vectors were zero-padded at the end, as is common in natural language processing. Finally, the peptide sequence was converted to a matrix <inline-formula id="IEq1"><alternatives><tex-math id="M3">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$\tilde {S}$\end{document}</tex-math>
<mml:math id="M4"><mml:mover accent="true"><mml:mrow><mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:math>
<inline-graphic xlink:href="12859_2019_3006_Article_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
 formed from the word vectors, with dimensions <italic>L</italic>×100. 
<disp-formula id="Equ2"><label>2</label>
<alternatives><tex-math id="M5">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  {{\tilde{S}}_{L\times 100}}=padding({{f}_{map}}(k\_mer({{S}_{{{L}_{0}}}}))).  $$ \end{document}</tex-math>
<mml:math id="M6"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>L</mml:mi>
<mml:mo>×</mml:mo>
<mml:mn>100</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mtext mathvariant="italic">padding</mml:mtext>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow><mml:mtext mathvariant="italic">map</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mtext>_mer</mml:mtext>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ2.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
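<p>A minimal Python sketch of Eq. (2) is given below. It assumes that the word2vec embeddings are available as a dictionary from <italic>k</italic>-mer to vector and maps <italic>k</italic>-mers missing from that dictionary to zero vectors; the names encode_peptide and longest_kmer_count are hypothetical, and toy 4-dimensional vectors stand in for the 100-dimensional embeddings.</p>
<preformat preformat-type="code">
```python
def encode_peptide(seq, embeddings, k, stride, max_len, dim=100):
    """Eq. (2): map a peptide to a max_len x dim matrix of k-mer vectors.

    embeddings: dict mapping each k-mer to a list of `dim` floats.
    k-mers without an embedding become zero vectors (an assumption), and rows
    after the last k-mer are zero-padded at the end up to max_len.
    """
    kmers = [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]
    rows = [list(embeddings.get(km, [0.0] * dim)) for km in kmers[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def longest_kmer_count(seqs, k, stride):
    """L: the number of k-mers in the longest peptide of the dataset."""
    return max(len(range(0, len(s) - k + 1, stride)) for s in seqs)


peptides = ["GLFD", "GLFDIV"]
emb = {"GLF": [0.1, 0.2, 0.3, 0.4], "LFD": [0.5, 0.6, 0.7, 0.8]}  # toy 4-d vectors
L = longest_kmer_count(peptides, k=3, stride=1)   # "GLFDIV" has 4 k-mers
matrix = encode_peptide(peptides[0], emb, k=3, stride=1, max_len=L, dim=4)
print(matrix)   # two embedded rows followed by two zero-padding rows
```
</preformat>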
<p>To prevent over-fitting and to improve model generalization, dropout was applied to a fraction of the inputs (i.e., a portion of the inputs was randomly set to zero).</p>
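<p>Input dropout can be sketched in a few lines of Python. The sketch assumes the "inverted" dropout convention used by common deep learning libraries, in which surviving entries are rescaled by 1/(1 − rate) during training so that no rescaling is needed at prediction time; the name input_dropout and the rate shown are illustrative, since the dropout rate used for PTPD is not stated here.</p>
<preformat preformat-type="code">
```python
import random


def input_dropout(matrix, rate, rng=None):
    """Training-time dropout: zero each entry with probability `rate`.

    Surviving entries are divided by (1 - rate) (inverted dropout), so the
    expected value of every entry is unchanged. At prediction time the input
    is used as is. Assumes rate is in [0, 1).
    """
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [[x / keep if rng.random() >= rate else 0.0 for x in row] for row in matrix]


x = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(input_dropout(x, rate=0.5, rng=random.Random(42)))
```
</preformat>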
</sec>
<sec id="Sec6"><title>Feature map</title>
<p>To extract features, a set of one-dimensional convolution filters was applied to the representation of each peptide sequence. Each convolution kernel had a size of <italic>c</italic>×100, so that it covered the full embedding dimension and slid only along the <italic>k</italic>-mer axis. We used three filter sizes, <italic>c</italic> = 3, 4, and 5, and every kernel was convolved over the entire representation. For example, using one convolution kernel of size <italic>c</italic>×100, the feature map was constructed as follows: 
<disp-formula id="Equ3"><label>3</label>
<alternatives><tex-math id="M7">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  {{F}_{c}}={{[f(m)]}_{(L-c+1)\times 1}},  $$ \end{document}</tex-math>
<mml:math id="M8"><mml:msub><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub><mml:mrow><mml:mo>[</mml:mo>
<mml:mi>f</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mrow><mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ3.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p><disp-formula id="Equ4"><label>4</label>
<alternatives><tex-math id="M9">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  \begin{aligned} &f(m)=g(W\otimes \tilde{S}_{m}+b)\\ &=ReLU\left(\sum\limits_{i=0}^{c-1}\sum\limits_{j=0}^{99}w(i,j)\times \tilde{s}(m+i,j)+b\right),\\ \end{aligned}  $$ \end{document}</tex-math>
<mml:math id="M10"><mml:mtable><mml:mtr><mml:mtd><mml:mi>f</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>g</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>W</mml:mi>
<mml:mo>⊗</mml:mo>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>)</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr><mml:mtd><mml:mo>=</mml:mo>
<mml:mtext mathvariant="italic">ReLU</mml:mtext>
<mml:mo>(</mml:mo>
<mml:munderover accent="false" accentunder="false"><mml:mrow><mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:munderover accent="false" accentunder="false"><mml:mrow><mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow><mml:mn>99</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mi>w</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mover accent="true"><mml:mrow><mml:mi>s</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr></mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ4.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where <italic>f</italic>
(<italic>m</italic>
) denotes the <italic>m</italic>
th element of the feature map, <italic>ReLU</italic>
 denotes the rectified linear unit (ReLU) activation function, <italic>w</italic>
(<italic>i, j</italic>
) denotes the weight of the convolution kernel learned during training, <italic>c</italic>
 denotes the filter size, and <inline-formula id="IEq2"><alternatives><tex-math id="M11">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$\tilde {S}_{m}$\end{document}</tex-math>
<mml:math id="M12"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>S</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2019_3006_Article_IEq2.gif"></inline-graphic>
</alternatives>
</inline-formula>
 denotes the <italic>m</italic>th block (rows <italic>m</italic> to <italic>m</italic>+<italic>c</italic>−1) of the representation matrix of the peptide sequence. ReLU [<xref ref-type="bibr" rid="CR39">39</xref>] sets negative convolution outputs to zero and is defined as follows: 
<disp-formula id="Equ5"><label>5</label>
<alternatives><tex-math id="M13">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  ReLU(a)=max(0,a)=\left\{ \begin{aligned} 0,&~\text{if }a \le \text{0},\\ a, &~\text{otherwise}. \end{aligned} \right.  $$ \end{document}</tex-math>
<mml:math id="M14"><mml:mtext mathvariant="italic">ReLU</mml:mtext>
<mml:mo>(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mtext mathvariant="italic">max</mml:mtext>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators=""><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd><mml:mspace width="1em"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mtext>0</mml:mtext>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr><mml:mtd><mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd><mml:mspace width="1em"></mml:mspace>
<mml:mtext>otherwise</mml:mtext>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ5.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
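<p>Eqs. (3) to (5) can be checked with the following plain-Python sketch of a single filter. It slides a <italic>c</italic>×<italic>d</italic> kernel (with <italic>d</italic> = 100 in PTPD) down an <italic>L</italic>×<italic>d</italic> input, producing <italic>L</italic>−<italic>c</italic>+1 ReLU-activated values. Small toy matrices with <italic>d</italic> = 2 are used for readability, and the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
def relu(a):
    """Eq. (5): ReLU(a) = max(0, a)."""
    return max(0.0, a)


def conv_feature_map(S, W, b):
    """Eqs. (3)-(4): feature map F_c of one c x d kernel W over the L x d input S.

    The kernel spans the full embedding width d, so it moves only along the
    k-mer axis and yields L - c + 1 values.
    """
    L, c, d = len(S), len(W), len(S[0])
    return [
        relu(sum(W[i][j] * S[m + i][j] for i in range(c) for j in range(d)) + b)
        for m in range(L - c + 1)
    ]


S = [[1, 0], [0, 1], [1, 1], [2, 0]]   # L = 4 k-mer vectors of width d = 2
W = [[1, 1], [1, -1]]                  # one kernel with c = 2
print(conv_feature_map(S, W, b=0.0))   # [0.0, 1.0, 4.0]
```
</preformat>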
<p>Multiple filters were used for each filter size. Letting <italic>nc</italic> denote the number of convolution filters of a given size, the resulting feature maps were stacked as 
<disp-formula id="Equ6"><label>6</label>
<alternatives><tex-math id="M15">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  {{\tilde{F}}_{c}}={{[F_{c}^{1},F_{c}^{2}\ldots,F_{c}^{nc}]}_{(L-c+1)\times nc}}.  $$ \end{document}</tex-math>
<mml:math id="M16"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub><mml:mrow><mml:mo>[</mml:mo>
<mml:msubsup><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow><mml:mtext mathvariant="italic">nc</mml:mtext>
</mml:mrow>
</mml:msubsup>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mrow><mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mtext mathvariant="italic">nc</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mi>.</mml:mi>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ6.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
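<p>Stacking the maps of several filters of the same size, as in Eq. (6), amounts to placing each filter's output in its own column. The sketch below repeats a compact single-filter convolution so that it runs on its own; the names and the toy filters are illustrative.</p>
<preformat preformat-type="code">
```python
def conv_feature_map(S, W, b):
    """One c x d filter over the L x d input S: L - c + 1 ReLU outputs (Eqs. 3-5)."""
    c, d = len(W), len(S[0])
    return [max(0.0, sum(W[i][j] * S[m + i][j] for i in range(c) for j in range(d)) + b)
            for m in range(len(S) - c + 1)]


def stacked_feature_maps(S, filters):
    """Eq. (6): apply nc filters of the same size c and stack their maps as columns.

    filters: list of (W, b) pairs. Returns an (L - c + 1) x nc matrix whose
    j-th column is the feature map of the j-th filter.
    """
    maps = [conv_feature_map(S, W, b) for W, b in filters]
    return [list(row) for row in zip(*maps)]


S = [[1, 0], [0, 1], [1, 1], [2, 0]]
filters = [([[1, 1], [1, -1]], 0.0), ([[1, 0], [0, 0]], 0.0)]   # nc = 2, c = 2
print(stacked_feature_maps(S, filters))   # [[0.0, 1.0], [1.0, 0.0], [4.0, 1.0]]
```
</preformat>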
<p>To reduce the spatial dimensions of the feature maps, max pooling was applied after each convolution operation. A max pooling layer with a pooling window of size 2×1 and a stride of 2 was defined by the function 
<disp-formula id="Equ7"><label>7</label>
<alternatives><tex-math id="M17">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  \begin{aligned} &{{Z}_{c}}=({{z}_{i,j}})=pool({{{\tilde{F}}}_{c}})\\ &=[\max {{{\tilde{F}}}_{c}}(:,1),\dots,\max {{{\tilde{F}}}_{c}}(:,j),\dots,\max {{{\tilde{F}}}_{c}}(:,nc)], \end{aligned}  $$ \end{document}</tex-math>
<mml:math id="M18"><mml:mtable><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mi>z</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mtext mathvariant="italic">pool</mml:mtext>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr><mml:mtd><mml:mo>=</mml:mo>
<mml:mo>[</mml:mo>
<mml:mo>max</mml:mo>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>max</mml:mo>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>max</mml:mo>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">nc</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>]</mml:mo>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ7.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where 
<disp-formula id="Equ8"><label>8</label>
<alternatives><tex-math id="M19">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  \max {{\tilde{F}}_{c}}(i,j)=\underset{i'\in [i,i+2]}{\mathop{\max }}\,{{\tilde{F}}_{c}}({{i}^{'}},j).  $$ \end{document}</tex-math>
<mml:math id="M20"><mml:mo>max</mml:mo>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munder><mml:mrow><mml:mo>max</mml:mo>
</mml:mrow>
<mml:mrow><mml:msup><mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>∈</mml:mo>
<mml:mo>[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:munder>
<mml:mspace width="0.3em"></mml:mspace>
<mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>F</mml:mi>
</mml:mrow>
<mml:mo>~</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msup><mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow><mml:msup><mml:mrow></mml:mrow>
<mml:mrow><mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ8.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
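<p>The pooling step can be sketched as follows, assuming the 2×1 window and stride of 2 stated in the text and dropping a final row that cannot fill a complete window (an assumption). Eq. (7), read literally, takes the maximum over each entire column of the stacked feature map, so a column-wise (global) variant is included as well; the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
def max_pool_rows(F, window=2, stride=2):
    """Max pooling along the k-mer axis of an (L - c + 1) x nc matrix F.

    Each output row holds, for every column j, the maximum of `window`
    consecutive rows of F; a trailing row that cannot fill a window is dropped.
    """
    n_rows, n_cols = len(F), len(F[0])
    return [
        [max(F[r][j] for r in range(start, start + window)) for j in range(n_cols)]
        for start in range(0, n_rows - window + 1, stride)
    ]


def column_max(F):
    """Global variant matching a literal reading of Eq. (7): one maximum per column."""
    return [max(col) for col in zip(*F)]


F = [[0.0, 1.0], [1.0, 0.0], [4.0, 1.0], [2.0, 3.0], [5.0, 5.0]]
print(max_pool_rows(F))   # [[1.0, 1.0], [4.0, 3.0]]
print(column_max(F))      # [5.0, 5.0]
```
</preformat>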
<p>The pooled results were finally concatenated as follows: 
<disp-formula id="Equ9"><label>9</label>
<alternatives><tex-math id="M21">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  F{{A}_{m}}=[Z_{c1},Z_{c2},Z_{c3}],  $$ \end{document}</tex-math>
<mml:math id="M22"><mml:mi>F</mml:mi>
<mml:msub><mml:mrow><mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>[</mml:mo>
<mml:msub><mml:mrow><mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mrow><mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mrow><mml:mi>Z</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>c</mml:mi>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>]</mml:mo>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ9.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where <italic>c</italic>
1=3, <italic>c</italic>
2=4, and <italic>c</italic>
3=5 denote the three filter sizes we used. Then <italic>FA</italic>
<sub><italic>m</italic>
</sub>
 was processed by a fully connected hidden layer to produce <italic>FM</italic>
=<italic>ReLU</italic>
(<italic>FA</italic>
<sub><italic>m</italic>
</sub>
<italic>W</italic>
<sub><italic>ft</italic>
</sub>
), where <italic>ReLU</italic>
 represents a rectified linear activation unit, and <italic>W</italic>
<sub><italic>ft</italic>
</sub>
 is the weight matrix of the fully-connected layer.</p>
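<p>Eq. (9) and the hidden layer can be sketched as follows. Because the pooled maps for different filter sizes generally have different numbers of rows, the sketch assumes that each <italic>Z</italic><sub><italic>c</italic></sub> is flattened before concatenation, and, following the formula above, the hidden layer has no bias term; both choices and all names are illustrative assumptions.</p>
<preformat preformat-type="code">
```python
def flatten(matrix):
    """Row-major flattening of a list of rows."""
    return [x for row in matrix for x in row]


def hidden_layer(pooled_maps, W_ft):
    """Concatenate the flattened Z_c1, Z_c2, Z_c3 into FA_m (Eq. 9), then FM = ReLU(FA_m W_ft).

    W_ft has one row per element of FA_m and one column per hidden unit.
    """
    FA = [x for Z in pooled_maps for x in flatten(Z)]
    assert len(FA) == len(W_ft), "W_ft must have one row per input feature"
    n_hidden = len(W_ft[0])
    return [max(0.0, sum(FA[p] * W_ft[p][q] for p in range(len(FA)))) for q in range(n_hidden)]


Z3, Z4, Z5 = [[1.0, 2.0]], [[3.0], [4.0]], [[5.0]]   # toy pooled maps
W_ft = [[1, 0], [0, 1], [1, 0], [0, 1], [-1, -1]]      # 5 inputs, 2 hidden units
print(hidden_layer([Z3, Z4, Z5], W_ft))                 # [0.0, 1.0]
```
</preformat>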
</sec>
<sec id="Sec7"><title>Classification</title>
<p>The last layer of PTPD was a fully connected layer with a single output unit. A sigmoid activation function mapped this output to a probability between zero and one; the sigmoid is defined as 
<disp-formula id="Equ10"><label>10</label>
<alternatives><tex-math id="M23">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  \operatorname{Sigmoid}(x)=\frac{1}{1+{{e}^{-x}}}.  $$ \end{document}</tex-math>
<mml:math id="M24"><mml:mo>Sigmoid</mml:mo>
<mml:mo>(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow><mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msup><mml:mrow><mml:mi>e</mml:mi>
</mml:mrow>
<mml:mrow><mml:mo>−</mml:mo>
<mml:mi>x</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfrac>
<mml:mi>.</mml:mi>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ10.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
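<p>A small Python sketch of the output layer follows. It computes the single output as a weighted sum of the hidden activations plus a bias (the bias is an assumption; the text only states that a fully connected layer produces one output) and applies Eq. (10) in a numerically stable form; the names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def sigmoid(x):
    """Eq. (10), arranged so that math.exp never receives a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def output_probability(fm, w_out, b_out):
    """Single-unit output layer: sigmoid(sum_j fm[j] * w_out[j] + b_out)."""
    return sigmoid(sum(a * w for a, w in zip(fm, w_out)) + b_out)


print(output_probability([0.0, 1.0], [0.3, -0.8], 0.5))   # about 0.426
```
</preformat>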
</sec>
<sec id="Sec8"><title>Loss function and optimizer</title>
<p>The model was trained with the RMSprop optimizer to minimize the binary cross-entropy loss between the predictions and the targets, which was defined as 
<disp-formula id="Equ11"><label>11</label>
<alternatives><tex-math id="M25">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  L({{y}_{i}},{{\hat{y}}_{i}})=-\left[{{y}_{i}}log({{\hat{y}}_{i}})+(1-{{y}_{i}})log(1-{{\hat{y}}_{i}})\right].  $$ \end{document}</tex-math>
<mml:math id="M26"><mml:mi>L</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mrow><mml:mi>ŷ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mo>[</mml:mo>
<mml:msub><mml:mrow><mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext mathvariant="italic">log</mml:mtext>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mi>ŷ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub><mml:mrow><mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mtext mathvariant="italic">log</mml:mtext>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub><mml:mrow><mml:mi>ŷ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>]</mml:mo>
<mml:mi>.</mml:mi>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ11.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>The total cost of the two classes was 
<disp-formula id="Equ12"><label>12</label>
<alternatives><tex-math id="M27">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  L=\sum\limits_{i=1}^{2}{L({{y}_{i}},}{{\hat{y}}_{i}}).  $$ \end{document}</tex-math>
<mml:math id="M28"><mml:mi>L</mml:mi>
<mml:mo>=</mml:mo>
<mml:munderover accent="false" accentunder="false"><mml:mrow><mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow><mml:mn>2</mml:mn>
</mml:mrow>
</mml:munderover>
<mml:mi>L</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub><mml:mrow><mml:mi>y</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub><mml:mrow><mml:mi>ŷ</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ12.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
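<p>The following is a minimal Python sketch of the binary cross-entropy in Eqs. (11) and (12). An optimizer minimizes the negative log-likelihood, so the sketch returns the negative of y log(ŷ) + (1 − y) log(1 − ŷ). This value is non-negative and falls toward zero as ŷ approaches y. The clipping constant eps is an assumption added to keep the logarithms finite, and the helper names bce and total_loss are hypothetical.</p>

```python
import math


def bce(y: float, y_hat: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a target y in {0, 1} and a predicted probability y_hat.

    Returns -[y*log(y_hat) + (1 - y)*log(1 - y_hat)], which is non-negative and
    shrinks toward 0 as y_hat approaches y.
    """
    # Clip away from exactly 0 and 1 so both logarithms stay finite (eps is assumed).
    p = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def total_loss(pairs: list[tuple[float, float]]) -> float:
    """Sum the per-term losses, mirroring the summation in Eq. (12)."""
    return sum(bce(y, y_hat) for y, y_hat in pairs)


if __name__ == "__main__":
    # A positive predicted at 0.9 and a negative predicted at 0.2.
    pairs = [(1.0, 0.9), (0.0, 0.2)]
    print(total_loss(pairs))  # equals -log(0.9) - log(0.8)
```

<p>Eq. (12) sums the terms. Deep learning frameworks commonly average over a mini-batch instead, which only rescales the loss and its gradient.</p>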
</sec>
<sec id="Sec9"><title>Model evaluation</title>
<p>The performance of PTPD was evaluated using five metrics: sensitivity (Sn), specificity (Sp), prediction accuracy (Acc), Matthews correlation coefficient (MCC), and the area under the receiver operating characteristic (ROC) curve (AUC). The first four metrics were defined as follows: 
<disp-formula id="Equ13"><label>13</label>
<alternatives><tex-math id="M29">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document} $$  \left\{ \begin{array}{l} Sn=\frac{TP}{TP+FN} \\ Sp=\frac{TN}{TN+FP} \\ Acc=\frac{TP+TN}{TP+TN+FP+FN} \\ MCC=\frac{(TP\times TN)-(FP\times FN)}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \\ \end{array} \right.,  $$ \end{document}</tex-math>
<mml:math id="M30"><mml:mfenced close="" open="{" separators=""><mml:mrow><mml:mtable><mml:mtr><mml:mtd><mml:mtext mathvariant="italic">Sn</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mtext mathvariant="italic">TP</mml:mtext>
</mml:mrow>
<mml:mrow><mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
</mml:mtr>
<mml:mtr><mml:mtd><mml:mtext mathvariant="italic">Sp</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mtext mathvariant="italic">TN</mml:mtext>
</mml:mrow>
<mml:mrow><mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
</mml:mtr>
<mml:mtr><mml:mtd><mml:mtext mathvariant="italic">Acc</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
</mml:mrow>
<mml:mrow><mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
</mml:mtr>
<mml:mtr><mml:mtd><mml:mtext mathvariant="italic">MCC</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>×</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>×</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow><mml:msqrt><mml:mrow><mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
</mml:mtr>
<mml:mtr></mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2019_3006_Article_Equ13.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where <italic>TP</italic>, <italic>TN</italic>, <italic>FP</italic>, and <italic>FN</italic> denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.</p>
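<p>The metrics in Eq. (13) can be computed from confusion-matrix counts, as in the following Python sketch. The function names are hypothetical, and labels are assumed to be encoded as 1 (positive) and 0 (negative). Returning an MCC of 0 when its denominator vanishes is a common convention; the article does not specify it.</p>

```python
import math


def counts_from_labels(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Tally (TP, TN, FP, FN) for binary labels, where 1 = positive and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sn, Sp, Acc and MCC as in Eq. (13); assumes both classes occur in the test set."""
    sn = tp / (tp + fn)                    # sensitivity: share of positives recovered
    sp = tn / (tn + fp)                    # specificity: share of negatives rejected
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Report MCC = 0 when a marginal total is zero (assumed convention).
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    counts = counts_from_labels(y_true, y_pred)
    print(counts)                      # (3, 3, 1, 1)
    print(confusion_metrics(*counts))  # Sn = Sp = Acc = 0.75, MCC = 0.5
```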
</sec>
</sec>
<sec id="Sec10" sec-type="results"><title>Results</title>
<sec id="Sec11"><title>Model performance</title>
<p>To validate PTPD, we applied it to the ACP and virulent protein datasets. Each dataset was randomly divided into three groups. 75% of the data were used to train the model, 15% served as a validation set to limit over-fitting, and the remaining 10% were held out to evaluate the trained PTPD model. For ACP identification, PTPD was first evaluated on the test data from the main dataset and then on an alternative dataset. We also evaluated PTPD on two balanced datasets (Table <xref rid="Tab1" ref-type="table">1</xref>).
<table-wrap id="Tab1"><label>Table 1</label>
<caption><p>Performance of PTPD on the ACP dataset</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Dataset</th>
<th align="left">Sn(%)</th>
<th align="left">Sp(%)</th>
<th align="left">Acc(%)</th>
<th align="left">MCC</th>
<th align="left">AUC</th>
</tr>
</thead>
<tbody><tr><td align="left">ACP main dataset</td>
<td align="left">99.90</td>
<td align="left">86.60</td>
<td align="left">98.50</td>
<td align="left">0.92</td>
<td align="left">0.99</td>
</tr>
<tr><td align="left">ACP alternative dataset</td>
<td align="left">96.20</td>
<td align="left">86.70</td>
<td align="left">94.80</td>
<td align="left">0.80</td>
<td align="left">0.97</td>
</tr>
<tr><td align="left">ACP balanced dataset 1</td>
<td align="left">100</td>
<td align="left">86.20</td>
<td align="left">93.10</td>
<td align="left">0.87</td>
<td align="left">0.99</td>
</tr>
<tr><td align="left">ACP balanced dataset 2</td>
<td align="left">94.20</td>
<td align="left">86.20</td>
<td align="left">90.20</td>
<td align="left">0.81</td>
<td align="left">0.97</td>
</tr>
<tr><td align="left">HC dataset</td>
<td align="left">100</td>
<td align="left">83.00</td>
<td align="left">94.00</td>
<td align="left">0.87</td>
<td align="left">0.99</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
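<p>The random 75%/15%/10% split described above can be sketched as follows. This version rests on several assumptions. The fixed seed is added only for reproducibility. Rounding determines how leftover samples are assigned. The article does not say whether the split was stratified by class, so the sketch uses a plain shuffle. The function name split_dataset is hypothetical.</p>

```python
import random


def split_dataset(items, train_frac=0.75, val_frac=0.15, seed=0):
    """Randomly split items into train / validation / test lists (75% / 15% / 10% by default)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded only so the split is reproducible
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]  # used to monitor over-fitting during training
    test = shuffled[n_train + n_val:]        # the remainder (about 10%) is held out
    return train, val, test


if __name__ == "__main__":
    peptides = [f"peptide_{i}" for i in range(1000)]  # placeholder identifiers
    train, val, test = split_dataset(peptides)
    print(len(train), len(val), len(test))  # 750 150 100
```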
<p>On ACP balanced dataset 2, PTPD achieved Sn = 94.2%, Sp = 86.2%, Acc = 90.2%, MCC = 0.81, and AUC = 0.97. To evaluate the generalizability and robustness of the prediction model, we also applied PTPD to the independent HC dataset (Table <xref rid="Tab1" ref-type="table">1</xref>). The AUCs on all five datasets were at least 0.97, so PTPD performs stably on both balanced and unbalanced datasets (Table <xref rid="Tab1" ref-type="table">1</xref>).</p>
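<p>The article does not state how AUC was computed. One standard definition, sketched below in Python, is the Mann-Whitney formulation. It gives the probability that a randomly chosen positive peptide receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name roc_auc is hypothetical, and the pairwise loop favours clarity over speed.</p>

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Counts, over all positive/negative pairs, how often the positive example
    scores higher; ties count as one half. Runs in O(P * N) time.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # e.g. sigmoid outputs
    labels = [1, 1, 1, 0, 0, 0]
    print(roc_auc(scores, labels))  # 8 of 9 pairs ordered correctly
```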
<p>To evaluate PTPD on virulent proteins, we tested it on the test data of the SPAAN adhesins dataset and on an independent neurotoxins dataset (Table <xref rid="Tab2" ref-type="table">2</xref>).
<table-wrap id="Tab2"><label>Table 2</label>
<caption><p>Performance of PTPD on the virulent protein dataset</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Dataset</th>
<th align="left">Sn(%)</th>
<th align="left">Sp(%)</th>
<th align="left">Acc(%)</th>
<th align="left">MCC</th>
<th align="left">AUC</th>
</tr>
</thead>
<tbody><tr><td align="left">SPAAN adhesins dataset</td>
<td align="left">95.60</td>
<td align="left">73.30</td>
<td align="left">88.20</td>
<td align="left">0.70</td>
<td align="left">0.94</td>
</tr>
<tr><td align="left">Neurotoxins dataset</td>
<td align="left">98.00</td>
<td align="left">94.00</td>
<td align="left">96.00</td>
<td align="left">0.92</td>
<td align="left">0.93</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>On the virulent protein datasets, PTPD achieved Sn, Sp, Acc, MCC, and AUC values of at least 95.6%, 73.3%, 88.2%, 0.70, and 0.93, respectively, which confirms its good performance. Sp was notably lower on the SPAAN adhesins dataset (73.3%) than on the neurotoxins dataset (94.0%; Table <xref rid="Tab2" ref-type="table">2</xref>).</p>
</sec>
<sec id="Sec12"><title>Comparison with the state-of-the-art methods</title>
<p>We compared PTPD with other state-of-the-art methods for identifying ACPs and virulent proteins on two independent datasets.</p>
<sec id="Sec13"><title>Comparison performed on an independent ACP dataset</title>
<p>To further evaluate PTPD on ACP prediction, we compared it with state-of-the-art methods (i.e., AntiCP [<xref ref-type="bibr" rid="CR29">29</xref>
], MLACP [<xref ref-type="bibr" rid="CR12">12</xref>
], and mACPpred [<xref ref-type="bibr" rid="CR40">40</xref>
]) on the independent HC dataset (Table <xref rid="Tab3" ref-type="table">3</xref>
 and Fig. <xref rid="Fig3" ref-type="fig">3</xref>). PTPD approached the performance of MLACP (RF), the best-performing method on the HC dataset. PTPD achieved the highest sensitivity (100%) and an intermediate specificity, and it was second only to MLACP (RF) in AUC, Acc, and MCC. Thus, PTPD generalized to this independent dataset better than most of the other tested state-of-the-art methods for identifying ACPs.
<fig id="Fig3"><label>Fig. 3</label>
<caption><p>Comparison of different methods on the HC dataset. <bold>a</bold>
 Sn, Sp and Acc of different methods. <bold>b</bold>
 MCC and AUC of different methods. Sn: the sensitivity; Sp: the specificity; Acc: the prediction accuracy; MCC: Matthews correlation coefficient; AUC: the area under the receiver operating characteristic curve</p>
</caption>
<graphic xlink:href="12859_2019_3006_Fig3_HTML" id="MO3"></graphic>
</fig>
<table-wrap id="Tab3"><label>Table 3</label>
<caption><p>Comparison of PTPD with state-of-the-art methods on the HC dataset</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Method</th>
<th align="left">Sn(%)</th>
<th align="left">Sp(%)</th>
<th align="left">Acc(%)</th>
<th align="left">MCC</th>
<th align="left">AUC</th>
</tr>
</thead>
<tbody><tr><td align="left">PTPD</td>
<td align="left">100</td>
<td align="left">83.00</td>
<td align="left">94.00</td>
<td align="left">0.87</td>
<td align="left">0.99</td>
</tr>
<tr><td align="left">mACPpred [<xref ref-type="bibr" rid="CR40">40</xref>
]</td>
<td align="left">97.00</td>
<td align="left">77.00</td>
<td align="left">85.00</td>
<td align="left">0.72</td>
<td align="left">0.96</td>
</tr>
<tr><td align="left">MLACP (SVM)[<xref ref-type="bibr" rid="CR12">12</xref>
]</td>
<td align="left">85.00</td>
<td align="left">91.00</td>
<td align="left">90.00</td>
<td align="left">0.73</td>
<td align="left">0.95</td>
</tr>
<tr><td align="left">MLACP (RF)[<xref ref-type="bibr" rid="CR12">12</xref>
]</td>
<td align="left">98.00</td>
<td align="left">98.00</td>
<td align="left">98.00</td>
<td align="left">0.95</td>
<td align="left">1.00</td>
</tr>
<tr><td align="left">AntiCP (Model 1)[<xref ref-type="bibr" rid="CR29">29</xref>
]</td>
<td align="left">98.00</td>
<td align="left">5.00</td>
<td align="left">40.00</td>
<td align="left">0.06</td>
<td align="left">0.75</td>
</tr>
<tr><td align="left">AntiCP (Model 2)[<xref ref-type="bibr" rid="CR29">29</xref>
]</td>
<td align="left">82.00</td>
<td align="left">90.00</td>
<td align="left">87.00</td>
<td align="left">0.72</td>
<td align="left">0.95</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec id="Sec14"><title>Comparison performed on an independent virulent protein dataset</title>
<p>We also compared the performance of PTPD with that of q-FP [<xref ref-type="bibr" rid="CR10">10</xref>
], AS and 2Gram [<xref ref-type="bibr" rid="CR41">41</xref>
], VirulentPred [<xref ref-type="bibr" rid="CR18">18</xref>
], and NTX-pred [<xref ref-type="bibr" rid="CR16">16</xref>
] on a bacterial neurotoxins dataset (Table <xref rid="Tab4" ref-type="table">4</xref>
 and Fig. <xref rid="Fig4" ref-type="fig">4</xref>).
<fig id="Fig4"><label>Fig. 4</label>
<caption><p>Comparison of different methods on the neurotoxin virulent proteins dataset. <bold>a</bold>
 Sn, Sp and Acc of different methods. <bold>b</bold>
 MCC and AUC of different methods. Sn: the sensitivity; Sp: the specificity; Acc: the prediction accuracy; MCC: Matthews correlation coefficient; AUC: the area under the receiver operating characteristic curve</p>
</caption>
<graphic xlink:href="12859_2019_3006_Fig4_HTML" id="MO4"></graphic>
</fig>
<table-wrap id="Tab4"><label>Table 4</label>
<caption><p>Comparison of PTPD with state-of-the-art methods on the Neurotoxins dataset</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Method</th>
<th align="left">Sn(%)</th>
<th align="left">Sp(%)</th>
<th align="left">Acc(%)</th>
<th align="left">MCC</th>
<th align="left">AUC</th>
</tr>
</thead>
<tbody><tr><td align="left">PTPD</td>
<td align="left">98.00</td>
<td align="left">94.00</td>
<td align="left">96.00</td>
<td align="left">0.92</td>
<td align="left">0.93</td>
</tr>
<tr><td align="left">q-FP [<xref ref-type="bibr" rid="CR10">10</xref>
]</td>
<td align="left">99.03</td>
<td align="left">98.00</td>
<td align="left">98.40</td>
<td align="left">0.94</td>
<td align="left">1.00</td>
</tr>
<tr><td align="left">VirulentPred [<xref ref-type="bibr" rid="CR18">18</xref>
]</td>
<td align="left">96.00</td>
<td align="left">16.00</td>
<td align="left">56.00</td>
<td align="left">-</td>
<td align="left">-</td>
</tr>
<tr><td align="left">NTX-pred(FNN) [<xref ref-type="bibr" rid="CR16">16</xref>
]</td>
<td align="left">89.65</td>
<td align="left">78.78</td>
<td align="left">84.19</td>
<td align="left">0.69</td>
<td align="left">-</td>
</tr>
<tr><td align="left">NTX-pred(RNN) [<xref ref-type="bibr" rid="CR16">16</xref>
]</td>
<td align="left">89.12</td>
<td align="left">96.35</td>
<td align="left">92.75</td>
<td align="left">0.86</td>
<td align="left">-</td>
</tr>
<tr><td align="left">NTX-pred(SVM) [<xref ref-type="bibr" rid="CR16">16</xref>
]</td>
<td align="left">96.32</td>
<td align="left">97.22</td>
<td align="left">97.72</td>
<td align="left">0.94</td>
<td align="left">-</td>
</tr>
<tr><td align="left">AS [<xref ref-type="bibr" rid="CR41">41</xref>
]</td>
<td align="left">92.00</td>
<td align="left">100.00</td>
<td align="left">96.00</td>
<td align="left">0.92</td>
<td align="left">0.99</td>
</tr>
<tr><td align="left">2Gram [<xref ref-type="bibr" rid="CR41">41</xref>
]</td>
<td align="left">100.00</td>
<td align="left">90.91</td>
<td align="left">95.00</td>
<td align="left">0.91</td>
<td align="left">1.00</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
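<p>PTPD again performed competitively with the other methods. Its accuracy (96.00%) matched that of AS and exceeded those of VirulentPred, NTX-pred (FNN), NTX-pred (RNN), and 2Gram, although q-FP and NTX-pred (SVM) achieved higher accuracy. Thus, PTPD can predict potential virulent proteins with high accuracy.</p>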
</sec>
</sec>
<sec id="Sec15"><title>Parameter settings</title>
<p>Because model convergence depends on the learning rate, we trained the ACP model with learning rates of 0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, and 0.00001. The accuracy and loss under each learning rate are shown in Fig. <xref rid="Fig5" ref-type="fig">5</xref>.
<fig id="Fig5"><label>Fig. 5</label>
<caption><p>Performances under different learning rates: <bold>a</bold>
 accuracy under different learning rates; <bold>b</bold>
 loss under different learning rates</p>
</caption>
<graphic xlink:href="12859_2019_3006_Fig5_HTML" id="MO5"></graphic>
</fig>
</p>
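<p>The following sketch shows what the learning rate controls in the RMSprop optimizer used for training. It implements the standard RMSprop update for scalar parameters. A running average of squared gradients is kept, and each step equals the learning rate times the gradient divided by the root of that average. The decay rate rho = 0.9 and eps = 1e-7 are assumed values, and the class is a hypothetical helper rather than a framework API. The toy objective and the larger demo learning rate are chosen only so the example converges quickly.</p>

```python
import math


class RMSprop:
    """Minimal RMSprop for a flat list of scalar parameters (illustrative helper)."""

    def __init__(self, lr=1e-4, rho=0.9, eps=1e-7):
        self.lr = lr        # learning rate, e.g. the 0.0001 selected for PTPD
        self.rho = rho      # decay rate of the running average (assumed value)
        self.eps = eps      # small constant that avoids division by zero (assumed value)
        self.avg_sq = None  # running average of squared gradients, one per parameter

    def step(self, params, grads):
        """Return updated parameters after one RMSprop step."""
        if self.avg_sq is None:
            self.avg_sq = [0.0] * len(params)
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Exponential moving average of the squared gradient.
            self.avg_sq[i] = self.rho * self.avg_sq[i] + (1.0 - self.rho) * g * g
            # Normalize the gradient by its recent root-mean-square before stepping.
            updated.append(p - self.lr * g / (math.sqrt(self.avg_sq[i]) + self.eps))
        return updated


if __name__ == "__main__":
    # Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
    opt = RMSprop(lr=0.01)  # larger than 0.0001 so this toy run converges in few steps
    theta = [0.0]
    for _ in range(2000):
        theta = opt.step(theta, [2.0 * (theta[0] - 3.0)])
    print(theta[0])  # close to 3
```

<p>The gradient is normalized by its recent root-mean-square, so each step is roughly on the order of the learning rate whatever the raw gradient scale. This makes the learning rate a direct control on how quickly, and how stably, training converges.</p>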
<p>The model achieved its highest accuracy (98.5%) and the lowest loss (0.03) when the learning rate was set to 0.0001, which was subsequently selected for model training. The detailed parameter settings are shown in Table <xref rid="Tab5" ref-type="table">5</xref>.
<table-wrap id="Tab5"><label>Table 5</label>
<caption><p>Parameter setting</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Parameters</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody><tr><td align="left">Number of kernels (per filter size)</td>
<td align="left">150, 150, 150</td>
</tr>
<tr><td align="left">Filter sizes</td>
<td align="left">3, 4, 5</td>
</tr>
<tr><td align="left"><italic>k</italic>
-mer embedding dimension</td>
<td align="left">100</td>
</tr>
<tr><td align="left">Batch size</td>
<td align="left">100</td>
</tr>
<tr><td align="left">Epoch</td>
<td align="left">20</td>
</tr>
<tr><td align="left">Learning rate</td>
<td align="left">0.0001</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
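<p>Table 5 implies a multi-width convolutional layout: three filter sizes (3, 4, 5), each with 150 kernels, applied to 100-dimensional <italic>k</italic>-mer vectors. The pure-Python sketch below shows how such a layout turns an embedded peptide into a fixed-length feature vector. Several choices are assumptions the article does not state. The convolution is valid (unpadded), and max-pooling is taken globally over positions. Kernel weights are random placeholders for learned ones, and dropout is omitted because it only acts during training. The helper names are hypothetical.</p>

```python
import random


def conv1d_valid(x, kernel, bias):
    """Slide one kernel (f rows x d columns) down x (n rows x d columns) without padding."""
    f = len(kernel)
    out = []
    for start in range(len(x) - f + 1):
        window = x[start:start + f]
        # Dot product of the kernel with f consecutive k-mer vectors, plus bias.
        out.append(bias + sum(w * v
                              for k_row, x_row in zip(kernel, window)
                              for w, v in zip(k_row, x_row)))
    return out


def extract_features(x, filter_sizes=(3, 4, 5), n_kernels=150, seed=0):
    """Multi-width convolution, global max-pooling, concatenation and ReLU.

    Kernel weights are random placeholders standing in for learned ones.
    """
    rng = random.Random(seed)
    dim = len(x[0])
    features = []
    for f in filter_sizes:
        if f > len(x):
            raise ValueError(f"sequence of {len(x)} k-mers is shorter than filter size {f}")
        for _ in range(n_kernels):
            kernel = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(f)]
            feature_map = conv1d_valid(x, kernel, bias=0.0)
            features.append(max(feature_map))  # keep the strongest response per kernel
    # ReLU on the fused features: negative values become zero.
    return [max(0.0, v) for v in features]


if __name__ == "__main__":
    rng = random.Random(1)
    seq_len, emb_dim = 12, 100  # 12 k-mers, each a 100-dimensional vector (Table 5)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(emb_dim)] for _ in range(seq_len)]
    feats = extract_features(x)  # filter sizes 3, 4, 5 with 150 kernels each
    print(len(feats))            # 450 = 3 filter sizes x 150 kernels
```

<p>ReLU is non-decreasing, so applying it after max-pooling, as here, gives the same result as applying it to every feature-map value first.</p>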
</sec>
</sec>
<sec id="Sec16" sec-type="discussion"><title>Discussion</title>
<p>The results presented in this study suggest that PTPD generalizes well and is robust. On the independent datasets, PTPD outperformed most of the other tested state-of-the-art methods.</p>
<p>The performance of PTPD benefits from three major factors. (1) word2vec was used to learn representation vectors of <italic>k</italic>-mers that capture the co-existence information of <italic>k</italic>-mers in peptide sequences. (2) A convolutional neural network (CNN) automatically extracted features to build the feature maps, without requiring domain expertise. (3) Dropout and max-pooling operations were adopted to avoid over-fitting.</p>
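<p>Factor (1) begins by splitting each peptide into <italic>k</italic>-mers with a sliding window. The sketch below assumes a stride of 1 (overlapping <italic>k</italic>-mers) and k = 3; this section fixes neither value. It also lists the (center, context) pairs that a skip-gram word2vec model would learn from, which is one way the co-existence of neighbouring <italic>k</italic>-mers enters the embeddings. The example sequence is arbitrary, and the window size and function names are hypothetical.</p>

```python
def to_kmers(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a peptide sequence into k-mers using a sliding window.

    Sequences shorter than k yield an empty list.
    """
    seq = sequence.strip().upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def context_pairs(kmers: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Skip-gram (center, context) pairs: each k-mer with neighbours up to `window` away."""
    pairs = []
    for i, center in enumerate(kmers):
        for j in range(max(0, i - window), min(len(kmers), i + window + 1)):
            if j != i:
                pairs.append((center, kmers[j]))
    return pairs


if __name__ == "__main__":
    kmers = to_kmers("GLFDIVKK", k=3)  # arbitrary example sequence
    print(kmers)  # ['GLF', 'LFD', 'FDI', 'DIV', 'IVK', 'VKK']
    print(context_pairs(kmers, window=1)[:3])  # [('GLF', 'LFD'), ('LFD', 'GLF'), ('LFD', 'FDI')]
```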
</sec>
<sec id="Sec17" sec-type="conclusion"><title>Conclusions</title>
<p>Identifying new ACPs and virulent proteins is an extremely labour-intensive and time-consuming process. In this paper, we proposed PTPD, a deep learning-based computational model that predicts therapeutic peptides efficiently and achieves better recognition performance than most of the other tested state-of-the-art methods. We first trained word2vec to learn feature vectors for all <italic>k</italic>-mers. Next, each peptide sequence was split into <italic>k</italic>-mers and represented by the corresponding word2vec vectors. A CNN then extracted features automatically, which reduces the reliance on domain experts for feature construction. The CNN was configured with three types of filters, and dropout and max-pooling operations were applied to avoid over-fitting. After the features were fused, ReLU activation replaced any negative values in the output of the CNN layer with zeros. Finally, a sigmoid function was used to classify each peptide.</p>
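<p>In the step where each peptide is represented by its word2vec vectors, the lookup can be sketched as building a fixed-size input matrix. The article does not specify how variable-length peptides were handled. This sketch therefore assumes zero vectors for <italic>k</italic>-mers missing from the vocabulary, and zero-padding or truncation to a hypothetical maximum length. The toy vectors are placeholders rather than trained embeddings.</p>

```python
def embed_sequence(
    kmers: list[str],
    vectors: dict[str, list[float]],
    max_len: int,
    dim: int,
) -> list[list[float]]:
    """Build a max_len x dim input matrix from per-k-mer word2vec vectors."""
    rows = []
    for kmer in kmers[:max_len]:  # truncate peptides longer than max_len k-mers
        # Unknown k-mers fall back to a zero vector (an assumption).
        rows.append(list(vectors.get(kmer, [0.0] * dim)))
    # Zero-pad shorter peptides at the end so every input has max_len rows.
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


if __name__ == "__main__":
    toy_vectors = {"GLF": [0.1, 0.2], "LFD": [0.3, -0.1]}  # placeholder 2-d vectors
    matrix = embed_sequence(["GLF", "LFD", "XYZ"], toy_vectors, max_len=4, dim=2)
    print(matrix)  # [[0.1, 0.2], [0.3, -0.1], [0.0, 0.0], [0.0, 0.0]]
```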
<p>The performance and generalizability of PTPD were verified on two independent datasets, the HC dataset and the neurotoxins dataset. On these, the trained model achieved AUCs of 0.99 and 0.93, respectively, which confirms that the proposed model can effectively identify ACPs and virulent proteins.</p>
<p>In summary, the PTPD model presented in this paper outperformed most of the other tested methods. Nevertheless, the approach cannot yet indicate which features are most important for identifying favourable bioactivity. In future work on model structures and feature selection methods, we may consider other network architectures, such as generative adversarial networks. New methods that have been applied successfully in natural language processing might also facilitate further research. Our study confirmed that PTPD is an effective means of identifying and designing novel therapeutic peptides. Our approach might also be extended to other sequence-based peptide predictions, including the identification of antihypertensive [<xref ref-type="bibr" rid="CR42">42</xref>
, <xref ref-type="bibr" rid="CR43">43</xref>
], cell-penetrating [<xref ref-type="bibr" rid="CR44">44</xref>
], and proinflammatory [<xref ref-type="bibr" rid="CR45">45</xref>
] peptides.</p>
</sec>
</body>
<back><glossary><title>Abbreviations</title>
<def-list><def-item><term>AAC</term>
<def><p>Amino acid composition</p>
</def>
</def-item>
<def-item><term>Acc</term>
<def><p>Accuracy</p>
</def>
</def-item>
<def-item><term>ACP</term>
<def><p>Anticancer peptide</p>
</def>
</def-item>
<def-item><term>AUC</term>
<def><p>Area under the ROC curve</p>
</def>
</def-item>
<def-item><term>CNN</term>
<def><p>Convolutional neural network</p>
</def>
</def-item>
<def-item><term>DADP</term>
<def><p>Defence peptide</p>
</def>
</def-item>
<def-item><term>FN</term>
<def><p>False negative</p>
</def>
</def-item>
<def-item><term>FP</term>
<def><p>False positive</p>
</def>
</def-item>
<def-item><term>MCC</term>
<def><p>Matthews correlation coefficient</p>
</def>
</def-item>
<def-item><term>Pse-g-Gap DPC</term>
<def><p>Pseudo g-Gap dipeptide composition</p>
</def>
</def-item>
<def-item><term>PseAAC</term>
<def><p>Pseudo amino acid composition</p>
</def>
</def-item>
<def-item><term>PSI-BLAST</term>
<def><p>Position specific iterated BLAST</p>
</def>
</def-item>
<def-item><term>PSSM</term>
<def><p>Position-specific scoring matrix</p>
</def>
</def-item>
<def-item><term>PTPD</term>
<def><p>Predicting therapeutic peptides by deep learning and word2vec</p>
</def>
</def-item>
<def-item><term>RAAAC</term>
<def><p>Reduced amino acid alphabet</p>
</def>
</def-item>
<def-item><term>RAAC</term>
<def><p>Reduced amino acid composition</p>
</def>
</def-item>
<def-item><term>ReLU</term>
<def><p>Rectified linear unit</p>
</def>
</def-item>
<def-item><term>RF</term>
<def><p>Random Forest</p>
</def>
</def-item>
<def-item><term>SAP</term>
<def><p>Sequence-based model</p>
</def>
</def-item>
<def-item><term>Sn</term>
<def><p>Sensitivity</p>
</def>
</def-item>
<def-item><term>Sp</term>
<def><p>Specificity</p>
</def>
</def-item>
<def-item><term>SVM</term>
<def><p>Support vector machine</p>
</def>
</def-item>
<def-item><term>TN</term>
<def><p>True negative</p>
</def>
</def-item>
<def-item><term>TP</term>
<def><p>True positive</p>
</def>
</def-item>
</def-list>
</glossary>
<fn-group><fn><p><bold>Publisher’s Note</bold>
</p>
<p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
</fn>
</fn-group>
<ack><p>The authors sincerely thank Dr. Zhi-Ping Liu (School of Control Science and Engineering at Shandong University) for his valuable suggestions.</p>
</ack>
<notes notes-type="author-contribution"><title>Authors’ contributions</title>
<p>CW developed the prediction method, designed and implemented the experiments, and wrote the paper. RG conceived and led the project, analysed the results and wrote the paper. YZ evaluated the methods, suggested improvements and analysed the results. YDM drafted the manuscript. All authors edited the manuscript and read and approved the final manuscript.</p>
</notes>
<notes notes-type="funding-information"><title>Funding</title>
<p>This research was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. U1806202, 61533011, and 61877064). This study was also supported by the Swedish Research Council, Strategic Research Area Exodiab, Dnr 2009-1039, and the Swedish Foundation for Strategic Research Dnr IRC15-0067. Furthermore, the research was supported with a project grant from the Swedish Research Council to LG (2015-02558), a European Foundation for the Study of Diabetes (EFSD) grant, and a Hjelt Foundation grant to YDM. No funding body played a role in the design of the study, analysis and interpretation of data, or in writing the manuscript.</p>
</notes>
<notes notes-type="data-availability"><title>Availability of data and materials</title>
<p>The ACP datasets supporting the conclusions of this article are available from [<xref ref-type="bibr" rid="CR12">12</xref>
, <xref ref-type="bibr" rid="CR29">29</xref>
], and the virulent protein datasets are available from [<xref ref-type="bibr" rid="CR16">16</xref>
, <xref ref-type="bibr" rid="CR18">18</xref>
].</p>
</notes>
<notes><title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</notes>
<notes><title>Consent for publication</title>
<p>Not applicable.</p>
</notes>
<notes notes-type="COI-statement"><title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</notes>
<ref-list id="Bib1"><title>References</title>
<ref id="CR1"><label>1</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Torre</surname>
<given-names>LA</given-names>
</name>
<name><surname>Bray</surname>
<given-names>F</given-names>
</name>
<name><surname>Siegel</surname>
<given-names>RL</given-names>
</name>
<name><surname>Ferlay</surname>
<given-names>J</given-names>
</name>
<name><surname>Lortet-Tieulent</surname>
<given-names>J</given-names>
</name>
<name><surname>Jemal</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Global cancer statistics, 2012</article-title>
<source>CA Cancer J Clin</source>
<year>2015</year>
<volume>65</volume>
<issue>2</issue>
<fpage>87</fpage>
<lpage>108</lpage>
<pub-id pub-id-type="doi">10.3322/caac.21262</pub-id>
<pub-id pub-id-type="pmid">25651787</pub-id>
</element-citation>
</ref>
<ref id="CR2"><label>2</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Al-Benna</surname>
<given-names>S</given-names>
</name>
<name><surname>Shai</surname>
<given-names>Y</given-names>
</name>
<name><surname>Jacobsen</surname>
<given-names>F</given-names>
</name>
<name><surname>Steinstraesser</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Oncolytic Activities of Host Defense Peptides</article-title>
<source>Int J Mol Sci</source>
<year>2011</year>
<volume>12</volume>
<issue>11</issue>
<fpage>8027</fpage>
<pub-id pub-id-type="doi">10.3390/ijms12118027</pub-id>
<pub-id pub-id-type="pmid">22174648</pub-id>
</element-citation>
</ref>
<ref id="CR3"><label>3</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kalyanaraman</surname>
<given-names>B</given-names>
</name>
<name><surname>Joseph</surname>
<given-names>J</given-names>
</name>
<name><surname>Kalivendi</surname>
<given-names>S</given-names>
</name>
<name><surname>Wang</surname>
<given-names>S</given-names>
</name>
<name><surname>Konorev</surname>
<given-names>E</given-names>
</name>
<name><surname>Kotamraju</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Doxorubicin-induced apoptosis: implications in cardiotoxicity</article-title>
<source>Mol Cell Biochem</source>
<year>2002</year>
<volume>234</volume>
<issue>1</issue>
<fpage>119</fpage>
<lpage>24</lpage>
<pub-id pub-id-type="doi">10.1023/A:1015976430790</pub-id>
<pub-id pub-id-type="pmid">12162424</pub-id>
</element-citation>
</ref>
<ref id="CR4"><label>4</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname>
<given-names>Y</given-names>
</name>
<name><surname>Feng</surname>
<given-names>Q</given-names>
</name>
<name><surname>Yan</surname>
<given-names>Q</given-names>
</name>
<name><surname>Hao</surname>
<given-names>X</given-names>
</name>
<name><surname>Chen</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Alpha-helical cationic anticancer peptides: a promising candidate for novel anticancer drugs</article-title>
<source>Mini-Rev Med Chem</source>
<year>2015</year>
<volume>15</volume>
<issue>1</issue>
<fpage>73</fpage>
<lpage>81</lpage>
<pub-id pub-id-type="doi">10.2174/1389557514666141107120954</pub-id>
<pub-id pub-id-type="pmid">25382016</pub-id>
</element-citation>
</ref>
<ref id="CR5"><label>5</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname>
<given-names>W</given-names>
</name>
<name><surname>Ding</surname>
<given-names>H</given-names>
</name>
<name><surname>Feng</surname>
<given-names>P</given-names>
</name>
<name><surname>Lin</surname>
<given-names>H</given-names>
</name>
<name><surname>Chou</surname>
<given-names>KC</given-names>
</name>
</person-group>
<article-title>iACP: a sequence-based tool for identifying anticancer peptides</article-title>
<source>Oncotarget</source>
<year>2016</year>
<volume>7</volume>
<issue>13</issue>
<fpage>16895</fpage>
<lpage>909</lpage>
<pub-id pub-id-type="pmid">26942877</pub-id>
</element-citation>
</ref>
<ref id="CR6"><label>6</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname>
<given-names>FM</given-names>
</name>
<name><surname>Wang</surname>
<given-names>XQ</given-names>
</name>
</person-group>
<article-title>Identifying anticancer peptides by using improved hybrid compositions</article-title>
<source>Sci Rep</source>
<year>2016</year>
<volume>6</volume>
<fpage>33910</fpage>
<pub-id pub-id-type="doi">10.1038/srep33910</pub-id>
<pub-id pub-id-type="pmid">27670968</pub-id>
</element-citation>
</ref>
<ref id="CR7"><label>7</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname>
<given-names>L</given-names>
</name>
<name><surname>Liang</surname>
<given-names>G</given-names>
</name>
<name><surname>Wang</surname>
<given-names>L</given-names>
</name>
<name><surname>Liao</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>A Novel Hybrid Sequence-Based Model for Identifying Anticancer Peptides</article-title>
<source>Genes</source>
<year>2018</year>
<volume>9</volume>
<issue>3</issue>
<fpage>158</fpage>
<pub-id pub-id-type="doi">10.3390/genes9030158</pub-id>
</element-citation>
</ref>
<ref id="CR8"><label>8</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hajisharifi</surname>
<given-names>Z</given-names>
</name>
<name><surname>Piryaiee</surname>
<given-names>M</given-names>
</name>
<name><surname>Mohammad Beigi</surname>
<given-names>M</given-names>
</name>
<name><surname>Behbahani</surname>
<given-names>M</given-names>
</name>
<name><surname>Mohabatkar</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Predicting anticancer peptides with Chou’s pseudo amino acid composition and investigating their mutagenicity via Ames test</article-title>
<source>J Theor Biol</source>
<year>2014</year>
<volume>341</volume>
<fpage>34</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="doi">10.1016/j.jtbi.2013.08.037</pub-id>
<pub-id pub-id-type="pmid">24035842</pub-id>
</element-citation>
</ref>
<ref id="CR9"><label>9</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Akbar</surname>
<given-names>S</given-names>
</name>
<name><surname>Hayat</surname>
<given-names>M</given-names>
</name>
<name><surname>Iqbal</surname>
<given-names>M</given-names>
</name>
<name><surname>Jan</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space</article-title>
<source>Artif Intell Med</source>
<year>2017</year>
<volume>79</volume>
<fpage>62</fpage>
<lpage>70</lpage>
<pub-id pub-id-type="doi">10.1016/j.artmed.2017.06.008</pub-id>
<pub-id pub-id-type="pmid">28655440</pub-id>
</element-citation>
</ref>
<ref id="CR10"><label>10</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xu</surname>
<given-names>C</given-names>
</name>
<name><surname>Ge</surname>
<given-names>L</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name><surname>Dehmer</surname>
<given-names>M</given-names>
</name>
<name><surname>Gutman</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>Computational prediction of therapeutic peptides based on graph index</article-title>
<source>J Biomed Inform</source>
<year>2017</year>
<volume>75</volume>
<fpage>63</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2017.09.011</pub-id>
</element-citation>
</ref>
<ref id="CR11"><label>11</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Manavalan</surname>
<given-names>B</given-names>
</name>
<name><surname>Basith</surname>
<given-names>S</given-names>
</name>
<name><surname>Shin</surname>
<given-names>TH</given-names>
</name>
<name><surname>Choi</surname>
<given-names>S</given-names>
</name>
<name><surname>Kim</surname>
<given-names>MO</given-names>
</name>
<name><surname>Lee</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>MLACP: machine-learning-based prediction of anticancer peptides</article-title>
<source>Oncotarget</source>
<year>2017</year>
<volume>8</volume>
<issue>44</issue>
<fpage>77121</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="doi">10.18632/oncotarget.20365</pub-id>
<pub-id pub-id-type="pmid">29100375</pub-id>
</element-citation>
</ref>
<ref id="CR12"><label>12</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Manavalan</surname>
<given-names>B</given-names>
</name>
<name><surname>Basith</surname>
<given-names>S</given-names>
</name>
<name><surname>Shin</surname>
<given-names>TH</given-names>
</name>
<name><surname>Choi</surname>
<given-names>S</given-names>
</name>
<name><surname>Kim</surname>
<given-names>MO</given-names>
</name>
<name><surname>Lee</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>MLACP: machine-learning-based prediction of anticancer peptides</article-title>
<source>Oncotarget</source>
<year>2017</year>
<volume>8</volume>
<issue>44</issue>
<fpage>77121</fpage>
<pub-id pub-id-type="doi">10.18632/oncotarget.20365</pub-id>
<pub-id pub-id-type="pmid">29100375</pub-id>
</element-citation>
</ref>
<ref id="CR13"><label>13</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname>
<given-names>L</given-names>
</name>
<name><surname>Zhou</surname>
<given-names>C</given-names>
</name>
<name><surname>Chen</surname>
<given-names>H</given-names>
</name>
<name><surname>Song</surname>
<given-names>J</given-names>
</name>
<name><surname>Su</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides</article-title>
<source>Bioinformatics</source>
<year>2018</year>
<volume>34</volume>
<issue>23</issue>
<fpage>4007</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="pmid">29868903</pub-id>
</element-citation>
</ref>
<ref id="CR14"><label>14</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name><surname>Gish</surname>
<given-names>W</given-names>
</name>
<name><surname>Miller</surname>
<given-names>W</given-names>
</name>
<name><surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name><surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Basic local alignment search tool</article-title>
<source>J Mol Biol</source>
<year>1990</year>
<volume>215</volume>
<issue>3</issue>
<fpage>403</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id>
<pub-id pub-id-type="pmid">2231712</pub-id>
</element-citation>
</ref>
<ref id="CR15"><label>15</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name><surname>Madden</surname>
<given-names>TL</given-names>
</name>
<name><surname>Schäffer</surname>
<given-names>AA</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name><surname>Miller</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>
<source>Nucleic Acids Res</source>
<year>1997</year>
<volume>25</volume>
<issue>17</issue>
<fpage>3389</fpage>
<lpage>402</lpage>
<pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id>
<pub-id pub-id-type="pmid">9254694</pub-id>
</element-citation>
</ref>
<ref id="CR16"><label>16</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Saha</surname>
<given-names>S</given-names>
</name>
<name><surname>Raghava</surname>
<given-names>GPS</given-names>
</name>
</person-group>
<article-title>Prediction of neurotoxins based on their function and source</article-title>
<source>In Silico Biol</source>
<year>2007</year>
<volume>7</volume>
<issue>4-5</issue>
<fpage>369</fpage>
<lpage>87</lpage>
<pub-id pub-id-type="pmid">18391230</pub-id>
</element-citation>
</ref>
<ref id="CR17"><label>17</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nanni</surname>
<given-names>L</given-names>
</name>
<name><surname>Lumini</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>An ensemble of support vector machines for predicting virulent proteins</article-title>
<source>Expert Syst Appl</source>
<year>2009</year>
<volume>36</volume>
<issue>4</issue>
<fpage>7458</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="doi">10.1016/j.eswa.2008.09.036</pub-id>
</element-citation>
</ref>
<ref id="CR18"><label>18</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Garg</surname>
<given-names>A</given-names>
</name>
<name><surname>Gupta</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<issue>1</issue>
<fpage>62</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-62</pub-id>
</element-citation>
</ref>
<ref id="CR19"><label>19</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nanni</surname>
<given-names>L</given-names>
</name>
<name><surname>Lumini</surname>
<given-names>A</given-names>
</name>
<name><surname>Gupta</surname>
<given-names>D</given-names>
</name>
<name><surname>Garg</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Identifying Bacterial Virulent Proteins by Fusing a Set of Classifiers Based on Variants of Chou’s Pseudo Amino Acid Composition and on Evolutionary Information</article-title>
<source>IEEE/ACM Trans Comput Biol Bioinform</source>
<year>2012</year>
<volume>9</volume>
<issue>2</issue>
<fpage>467</fpage>
<lpage>75</lpage>
<pub-id pub-id-type="doi">10.1109/TCBB.2011.117</pub-id>
</element-citation>
</ref>
<ref id="CR20"><label>20</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Krizhevsky</surname>
<given-names>A</given-names>
</name>
<name><surname>Sutskever</surname>
<given-names>I</given-names>
</name>
<name><surname>Hinton</surname>
<given-names>GE</given-names>
</name>
</person-group>
<person-group person-group-type="editor"><name><surname>Pereira</surname>
<given-names>F</given-names>
</name>
<name><surname>Burges</surname>
<given-names>CJC</given-names>
</name>
<name><surname>Bottou</surname>
<given-names>L</given-names>
</name>
<name><surname>Weinberger</surname>
<given-names>KQ</given-names>
</name>
</person-group>
<article-title>ImageNet Classification with Deep Convolutional Neural Networks</article-title>
<source>Advances in Neural Information Processing Systems 25</source>
<year>2012</year>
<publisher-loc>Red Hook</publisher-loc>
<publisher-name>Curran Associates, Inc.</publisher-name>
</element-citation>
</ref>
<ref id="CR21"><label>21</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Szegedy</surname>
<given-names>C</given-names>
</name>
<name><surname>Liu</surname>
<given-names>W</given-names>
</name>
<name><surname>Jia</surname>
<given-names>Y</given-names>
</name>
<name><surname>Sermanet</surname>
<given-names>P</given-names>
</name>
<name><surname>Reed</surname>
<given-names>S</given-names>
</name>
<name><surname>Anguelov</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Going deeper with convolutions</article-title>
<source>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
<year>2015</year>
<publisher-loc>Boston</publisher-loc>
<publisher-name>IEEE</publisher-name>
</element-citation>
</ref>
<ref id="CR22"><label>22</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>He</surname>
<given-names>K</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>X</given-names>
</name>
<name><surname>Ren</surname>
<given-names>S</given-names>
</name>
<name><surname>Sun</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Deep Residual Learning for Image Recognition</article-title>
<source>The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
<year>2016</year>
<publisher-loc>Las Vegas</publisher-loc>
<publisher-name>IEEE</publisher-name>
</element-citation>
</ref>
<ref id="CR23"><label>23</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Girshick</surname>
<given-names>R</given-names>
</name>
<name><surname>Donahue</surname>
<given-names>J</given-names>
</name>
<name><surname>Darrell</surname>
<given-names>T</given-names>
</name>
<name><surname>Malik</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Region-based convolutional networks for accurate object detection and segmentation</article-title>
<source>IEEE Trans Pattern Anal Mach Intell</source>
<year>2016</year>
<volume>38</volume>
<issue>1</issue>
<fpage>142</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="doi">10.1109/TPAMI.2015.2437384</pub-id>
</element-citation>
</ref>
<ref id="CR24"><label>24</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ren</surname>
<given-names>S</given-names>
</name>
<name><surname>He</surname>
<given-names>K</given-names>
</name>
<name><surname>Girshick</surname>
<given-names>R</given-names>
</name>
<name><surname>Sun</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Faster R-CNN: towards real-time object detection with region proposal networks</article-title>
<source>IEEE Trans Pattern Anal Mach Intell</source>
<year>2017</year>
<volume>39</volume>
<issue>6</issue>
<fpage>1137</fpage>
<lpage>49</lpage>
<pub-id pub-id-type="doi">10.1109/TPAMI.2016.2577031</pub-id>
</element-citation>
</ref>
<ref id="CR25"><label>25</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname>
<given-names>P</given-names>
</name>
<name><surname>Wang</surname>
<given-names>H</given-names>
</name>
<name><surname>Kwong</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition</article-title>
<source>Neurocomputing</source>
<year>2017</year>
<volume>225</volume>
<fpage>188</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="doi">10.1016/j.neucom.2016.11.023</pub-id>
</element-citation>
</ref>
<ref id="CR26"><label>26</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Zhong</surname>
<given-names>Z</given-names>
</name>
<name><surname>Jin</surname>
<given-names>L</given-names>
</name>
<name><surname>Xie</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps</article-title>
<source>2015 13th International Conference on Document Analysis and Recognition (ICDAR)</source>
<year>2015</year>
<publisher-loc>Tunis</publisher-loc>
<publisher-name>IEEE</publisher-name>
</element-citation>
</ref>
<ref id="CR27"><label>27</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name><surname>Roller</surname>
<given-names>S</given-names>
</name>
<name><surname>Wallace</surname>
<given-names>BC</given-names>
</name>
</person-group>
<article-title>MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification</article-title>
<source>Proceedings of NAACL-HLT</source>
<year>2016</year>
<publisher-loc>San Diego</publisher-loc>
<publisher-name>Association for Computational Linguistics</publisher-name>
</element-citation>
</ref>
<ref id="CR28"><label>28</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Min</surname>
<given-names>X</given-names>
</name>
<name><surname>Zeng</surname>
<given-names>W</given-names>
</name>
<name><surname>Chen</surname>
<given-names>N</given-names>
</name>
<name><surname>Chen</surname>
<given-names>T</given-names>
</name>
<name><surname>Jiang</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding</article-title>
<source>Bioinformatics</source>
<year>2017</year>
<volume>33</volume>
<issue>14</issue>
<fpage>i92</fpage>
<lpage>i101</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btx234</pub-id>
<pub-id pub-id-type="pmid">28881969</pub-id>
</element-citation>
</ref>
<ref id="CR29"><label>29</label>
<mixed-citation publication-type="other">Tyagi A, Kapoor P, Kumar R, Chaudhary K, Gautam A, Raghava GPS. In silico models for designing and discovering novel anticancer peptides. Sci Rep. 2013;3:2984.</mixed-citation>
</ref>
<ref id="CR30"><label>30</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Novković</surname>
<given-names>M</given-names>
</name>
<name><surname>Simunić</surname>
<given-names>J</given-names>
</name>
<name><surname>Bojović</surname>
<given-names>V</given-names>
</name>
<name><surname>Tossi</surname>
<given-names>A</given-names>
</name>
<name><surname>Juretić</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>DADP: the database of anuran defense peptides</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<issue>10</issue>
<fpage>1406</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts141</pub-id>
<pub-id pub-id-type="pmid">22467909</pub-id>
</element-citation>
</ref>
<ref id="CR31"><label>31</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hajisharifi</surname>
<given-names>Z</given-names>
</name>
<name><surname>Piryaiee</surname>
<given-names>M</given-names>
</name>
<name><surname>Beigi</surname>
<given-names>MM</given-names>
</name>
<name><surname>Behbahani</surname>
<given-names>M</given-names>
</name>
<name><surname>Mohabatkar</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Predicting anticancer peptides with Chou’s pseudo amino acid composition and investigating their mutagenicity via Ames test</article-title>
<source>J Theor Biol</source>
<year>2014</year>
<volume>341</volume>
<fpage>34</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="doi">10.1016/j.jtbi.2013.08.037</pub-id>
<pub-id pub-id-type="pmid">24035842</pub-id>
</element-citation>
</ref>
<ref id="CR32"><label>32</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname>
<given-names>W</given-names>
</name>
<name><surname>Ding</surname>
<given-names>H</given-names>
</name>
<name><surname>Feng</surname>
<given-names>P</given-names>
</name>
<name><surname>Lin</surname>
<given-names>H</given-names>
</name>
<name><surname>Chou</surname>
<given-names>KC</given-names>
</name>
</person-group>
<article-title>iACP: a sequence-based tool for identifying anticancer peptides</article-title>
<source>Oncotarget</source>
<year>2016</year>
<volume>7</volume>
<issue>13</issue>
<fpage>16895</fpage>
<pub-id pub-id-type="pmid">26942877</pub-id>
</element-citation>
</ref>
<ref id="CR33"><label>33</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Compeau</surname>
<given-names>PEC</given-names>
</name>
<name><surname>Pevzner</surname>
<given-names>PA</given-names>
</name>
<name><surname>Tesler</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>How to apply de Bruijn graphs to genome assembly</article-title>
<source>Nat Biotechnol</source>
<year>2011</year>
<volume>29</volume>
<fpage>987</fpage>
<pub-id pub-id-type="doi">10.1038/nbt.2023</pub-id>
<pub-id pub-id-type="pmid">22068540</pub-id>
</element-citation>
</ref>
<ref id="CR34"><label>34</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aggarwala</surname>
<given-names>V</given-names>
</name>
<name><surname>Voight</surname>
<given-names>BF</given-names>
</name>
</person-group>
<article-title>An expanded sequence context model broadly explains variability in polymorphism levels across the human genome</article-title>
<source>Nat Genet</source>
<year>2016</year>
<volume>48</volume>
<issue>4</issue>
<fpage>349</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="doi">10.1038/ng.3511</pub-id>
<pub-id pub-id-type="pmid">26878723</pub-id>
</element-citation>
</ref>
<ref id="CR35"><label>35</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Hinton</surname>
<given-names>GE</given-names>
</name>
</person-group>
<person-group person-group-type="editor"><name><surname>Morris</surname>
<given-names>RGM</given-names>
</name>
</person-group>
<article-title>Learning distributed representations of concepts</article-title>
<source>Parallel distributed processing: Implications for psychology and neurobiology</source>
<year>1989</year>
<publisher-loc>New York</publisher-loc>
<publisher-name>Oxford University Press</publisher-name>
</element-citation>
</ref>
<ref id="CR36"><label>36</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname>
<given-names>B</given-names>
</name>
<name><surname>Tang</surname>
<given-names>B</given-names>
</name>
<name><surname>Chen</surname>
<given-names>Q</given-names>
</name>
<name><surname>Kang</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>A novel word embedding learning model using the dissociation between nouns and verbs</article-title>
<source>Neurocomputing</source>
<year>2016</year>
<volume>171</volume>
<fpage>1108</fpage>
<lpage>17</lpage>
<pub-id pub-id-type="doi">10.1016/j.neucom.2015.07.046</pub-id>
</element-citation>
</ref>
<ref id="CR37"><label>37</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Mikolov</surname>
<given-names>T</given-names>
</name>
<name><surname>Sutskever</surname>
<given-names>I</given-names>
</name>
<name><surname>Chen</surname>
<given-names>K</given-names>
</name>
<name><surname>Corrado</surname>
<given-names>G</given-names>
</name>
<name><surname>Dean</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Distributed Representations of Words and Phrases and Their Compositionality</article-title>
<source>Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. NIPS’13</source>
<year>2013</year>
<publisher-loc>USA</publisher-loc>
<publisher-name>Curran Associates Inc.</publisher-name>
</element-citation>
</ref>
<ref id="CR38"><label>38</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname>
<given-names>D</given-names>
</name>
<name><surname>Xu</surname>
<given-names>H</given-names>
</name>
<name><surname>Su</surname>
<given-names>Z</given-names>
</name>
<name><surname>Xu</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Chinese comments sentiment classification based on word2vec and SVMperf</article-title>
<source>Expert Syst Appl</source>
<year>2015</year>
<volume>42</volume>
<issue>4</issue>
<fpage>1857</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="doi">10.1016/j.eswa.2014.09.011</pub-id>
</element-citation>
</ref>
<ref id="CR39"><label>39</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Nair</surname>
<given-names>V</given-names>
</name>
<name><surname>Hinton</surname>
<given-names>GE</given-names>
</name>
</person-group>
<article-title>Rectified Linear Units Improve Restricted Boltzmann Machines</article-title>
<source>Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10</source>
<year>2010</year>
<publisher-loc>USA</publisher-loc>
<publisher-name>Omnipress</publisher-name>
</element-citation>
</ref>
<ref id="CR40"><label>40</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boopathi</surname>
<given-names>V</given-names>
</name>
<name><surname>Subramaniyam</surname>
<given-names>S</given-names>
</name>
<name><surname>Malik</surname>
<given-names>A</given-names>
</name>
<name><surname>Lee</surname>
<given-names>G</given-names>
</name>
<name><surname>Manavalan</surname>
<given-names>B</given-names>
</name>
<name><surname>Yang</surname>
<given-names>DC</given-names>
</name>
</person-group>
<article-title>mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides</article-title>
<source>Int J Mol Sci</source>
<year>2019</year>
<volume>20</volume>
<issue>8</issue>
<fpage>1964</fpage>
<pub-id pub-id-type="doi">10.3390/ijms20081964</pub-id>
</element-citation>
</ref>
<ref id="CR41"><label>41</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nanni</surname>
<given-names>L</given-names>
</name>
<name><surname>Lumini</surname>
<given-names>A</given-names>
</name>
<name><surname>Brahnam</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>An Empirical Study of Different Approaches for Protein Classification</article-title>
<source>Sci World J</source>
<year>2014</year>
<volume>2014</volume>
<fpage>17</fpage>
<pub-id pub-id-type="doi">10.1155/2014/236717</pub-id>
</element-citation>
</ref>
<ref id="CR42"><label>42</label>
<mixed-citation publication-type="other">Manavalan B, Basith S, Shin TH, Wei L, Lee G. mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics. 2018;12.</mixed-citation>
</ref>
<ref id="CR43"><label>43</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Win</surname>
<given-names>TS</given-names>
</name>
<name><surname>Schaduangrat</surname>
<given-names>N</given-names>
</name>
<name><surname>Prachayasittikul</surname>
<given-names>V</given-names>
</name>
<name><surname>Nantasenamat</surname>
<given-names>C</given-names>
</name>
<name><surname>Shoombuatong</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>PAAP: a web server for predicting antihypertensive activity of peptides</article-title>
<source>Future Med Chem</source>
<year>2018</year>
<volume>10</volume>
<issue>15</issue>
<fpage>1749</fpage>
<lpage>67</lpage>
<pub-id pub-id-type="doi">10.4155/fmc-2017-0300</pub-id>
<pub-id pub-id-type="pmid">30039980</pub-id>
</element-citation>
</ref>
<ref id="CR44"><label>44</label>
<mixed-citation publication-type="other">Su R, Hu J, Zou Q, Manavalan B, Wei L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform. 2019.</mixed-citation>
</ref>
<ref id="CR45"><label>45</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Manavalan</surname>
<given-names>B</given-names>
</name>
<name><surname>Shin</surname>
<given-names>TH</given-names>
</name>
<name><surname>Kim</surname>
<given-names>MO</given-names>
</name>
<name><surname>Lee</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>PIP-EL: A New Ensemble Learning Method for Improved Proinflammatory Peptide Predictions</article-title>
<source>Front Immunol</source>
<year>2018</year>
<volume>9</volume>
<fpage>1783</fpage>
<pub-id pub-id-type="doi">10.3389/fimmu.2018.01783</pub-id>
<pub-id pub-id-type="pmid">30108593</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000284 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000284 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:6728961
   |texte=   PTPD: predicting therapeutic peptides by deep learning and word2vec
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:31492094" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server
Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
