MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Predicting proteolytic sites in extracellular proteins: only halfway there

Internal identifier: 000848 (Istex/Corpus); previous: 000847; next: 000849

Authors: Yossef Kliger; Eyal Gofer; Assaf Wool; Amir Toporik; Avihay Apatoff; Moshe Olshansky

RBID : ISTEX:B0C4B4C1EC355D0EB86F236BEA4E417647565B77

Abstract

Motivation: Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning. Results: The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we have found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only a mere 22% of the latter group were detected with a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins. Contact: kliger@compugen.co.il; yossef.kliger@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

DOI: 10.1093/bioinformatics/btn084

Links to Exploration step

ISTEX:B0C4B4C1EC355D0EB86F236BEA4E417647565B77

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Predicting proteolytic sites in extracellular proteins: only halfway there</title>
<author>
<name sortKey="Kliger, Yossef" sort="Kliger, Yossef" uniqKey="Kliger Y" first="Yossef" last="Kliger">Yossef Kliger</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>*To whom correspondence should be addressed.</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gofer, Eyal" sort="Gofer, Eyal" uniqKey="Gofer E" first="Eyal" last="Gofer">Eyal Gofer</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Wool, Assaf" sort="Wool, Assaf" uniqKey="Wool A" first="Assaf" last="Wool">Assaf Wool</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Toporik, Amir" sort="Toporik, Amir" uniqKey="Toporik A" first="Amir" last="Toporik">Amir Toporik</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Apatoff, Avihay" sort="Apatoff, Avihay" uniqKey="Apatoff A" first="Avihay" last="Apatoff">Avihay Apatoff</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Olshansky, Moshe" sort="Olshansky, Moshe" uniqKey="Olshansky M" first="Moshe" last="Olshansky">Moshe Olshansky</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B0C4B4C1EC355D0EB86F236BEA4E417647565B77</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1093/bioinformatics/btn084</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000848</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000848</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main">Predicting proteolytic sites in extracellular proteins: only halfway there</title>
<author>
<name sortKey="Kliger, Yossef" sort="Kliger, Yossef" uniqKey="Kliger Y" first="Yossef" last="Kliger">Yossef Kliger</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>*To whom correspondence should be addressed.</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gofer, Eyal" sort="Gofer, Eyal" uniqKey="Gofer E" first="Eyal" last="Gofer">Eyal Gofer</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Wool, Assaf" sort="Wool, Assaf" uniqKey="Wool A" first="Assaf" last="Wool">Assaf Wool</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Toporik, Amir" sort="Toporik, Amir" uniqKey="Toporik A" first="Amir" last="Toporik">Amir Toporik</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Apatoff, Avihay" sort="Apatoff, Avihay" uniqKey="Apatoff A" first="Avihay" last="Apatoff">Avihay Apatoff</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Olshansky, Moshe" sort="Olshansky, Moshe" uniqKey="Olshansky M" first="Moshe" last="Olshansky">Moshe Olshansky</name>
<affiliation>
<mods:affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j" type="main">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1460-2059</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published">2008</date>
<date type="e-published">2008</date>
<biblScope unit="vol">24</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="1049">1049</biblScope>
<biblScope unit="page" to="1055">1055</biblScope>
</imprint>
<idno type="ISSN">1367-4803</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1367-4803</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract">Motivation: Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning. Results: The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we have found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only a mere 22% of the latter group were detected with a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins. Contact: kliger@compugen.co.il; yossef.kliger@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.</div>
</front>
</TEI>
<istex>
<corpusName>oup</corpusName>
<keywords>
<teeft>
<json:string>proteolytic site</json:string>
<json:string>proteolytic</json:string>
<json:string>classifier</json:string>
<json:string>annotation</json:string>
<json:string>furin</json:string>
<json:string>datum</json:string>
<json:string>peptide</json:string>
<json:string>precursor</json:string>
<json:string>signal peptide</json:string>
<json:string>furin correction factor</json:string>
<json:string>secretory</json:string>
<json:string>pathway</json:string>
<json:string>convertase</json:string>
<json:string>extracellular</json:string>
<json:string>secretome</json:string>
<json:string>cleavage</json:string>
<json:string>secretory pathway</json:string>
<json:string>precursor protein</json:string>
<json:string>annotation line</json:string>
<json:string>performance evaluation</json:string>
<json:string>classifier specialized</json:string>
<json:string>proteolysis</json:string>
<json:string>protease</json:string>
<json:string>family member</json:string>
<json:string>convertase family</json:string>
<json:string>proteolytic processing</json:string>
<json:string>strong support</json:string>
<json:string>cleavage site</json:string>
<json:string>support vector machine</json:string>
<json:string>score output</json:string>
<json:string>symmetrical window</json:string>
<json:string>random forest</json:string>
<json:string>potential site</json:string>
<json:string>real precision</json:string>
<json:string>topology annotation</json:string>
<json:string>precision value</json:string>
<json:string>fibroblast growth factor</json:string>
<json:string>adhr patient</json:string>
<json:string>first residue</json:string>
<json:string>signal peptidase</json:string>
<json:string>basic residue</json:string>
<json:string>datum extraction</json:string>
<json:string>word template</json:string>
<json:string>last residue</json:string>
<json:string>envelope glycoprotein</json:string>
<json:string>proteolytic enzyme</json:string>
<json:string>different level</json:string>
<json:string>different approach</json:string>
<json:string>peptide hormone</json:string>
<json:string>classifier construction</json:string>
<json:string>separate performance evaluation</json:string>
<json:string>identical residue</json:string>
<json:string>computational approach</json:string>
<json:string>human plasma proteome</json:string>
<json:string>swissprot entry</json:string>
<json:string>time larger</json:string>
<json:string>protein sequence</json:string>
<json:string>serine protease</json:string>
<json:string>mislabeled instance</json:string>
<json:string>negative datum</json:string>
<json:string>first position</json:string>
<json:string>unknown proteolytic site</json:string>
<json:string>correction procedure</json:string>
<json:string>transmembrane domain</json:string>
<json:string>ordinary precision</json:string>
<json:string>furin site</json:string>
<json:string>negative set</json:string>
<json:string>proteolytic site prediction</json:string>
<json:string>ambg site</json:string>
<json:string>eukaryotic secretome</json:string>
<json:string>annotation record</json:string>
<json:string>prediction method</json:string>
<json:string>hypophosphatemic rickets</json:string>
<json:string>cytoplasmic domain</json:string>
<json:string>unannotated site</json:string>
<json:string>light grey</json:string>
<json:string>amino acid</json:string>
<json:string>biological activity</json:string>
<json:string>remote homologs</json:string>
<json:string>active peptide</json:string>
<json:string>proprotein convertases</json:string>
<json:string>human immunodeficiency virus type</json:string>
<json:string>potential proteolytic site</json:string>
</teeft>
</keywords>
<author>
<json:item>
<name>Yossef Kliger</name>
<affiliations>
<json:string>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</json:string>
<json:string>*To whom correspondence should be addressed.</json:string>
</affiliations>
</json:item>
<json:item>
<name>Eyal Gofer</name>
<affiliations>
<json:string>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</json:string>
</affiliations>
</json:item>
<json:item>
<name>Assaf Wool</name>
<affiliations>
<json:string>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</json:string>
</affiliations>
</json:item>
<json:item>
<name>Amir Toporik</name>
<affiliations>
<json:string>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</json:string>
</affiliations>
</json:item>
<json:item>
<name>Avihay Apatoff</name>
<affiliations>
<json:string>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</json:string>
</affiliations>
</json:item>
<json:item>
<name>Moshe Olshansky</name>
<affiliations>
<json:string>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</json:string>
</affiliations>
</json:item>
</author>
<articleId>
<json:string>btn084</json:string>
</articleId>
<arkIstex>ark:/67375/HXZ-RQ5WHZPQ-4</arkIstex>
<language>
<json:string>unknown</json:string>
</language>
<originalGenre>
<json:string>research-article</json:string>
</originalGenre>
<abstract>Motivation: Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning. Results: The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we have found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only a mere 22% of the latter group were detected with a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins. Contact: kliger@compugen.co.il; yossef.kliger@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.</abstract>
<qualityIndicators>
<score>7.671</score>
<pdfWordCount>5625</pdfWordCount>
<pdfCharCount>35439</pdfCharCount>
<pdfVersion>1.4</pdfVersion>
<pdfPageCount>7</pdfPageCount>
<pdfPageSize>612 x 791 pts</pdfPageSize>
<pdfWordsPerPage>804</pdfWordsPerPage>
<pdfText>true</pdfText>
<refBibsNative>true</refBibsNative>
<abstractWordCount>167</abstractWordCount>
<abstractCharCount>1202</abstractCharCount>
<keywordCount>0</keywordCount>
</qualityIndicators>
<title>Predicting proteolytic sites in extracellular proteins: only halfway there</title>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<title>Bioinformatics</title>
<language>
<json:string>unknown</json:string>
</language>
<issn>
<json:string>1367-4803</json:string>
</issn>
<eissn>
<json:string>1460-2059</json:string>
</eissn>
<publisherId>
<json:string>bioinformatics</json:string>
</publisherId>
<volume>24</volume>
<issue>8</issue>
<pages>
<first>1049</first>
<last>1055</last>
</pages>
<genre>
<json:string>journal</json:string>
</genre>
<subject>
<json:item>
<value>ORIGINAL PAPERS</value>
</json:item>
<json:item>
<value>SEQUENCE ANALYSIS</value>
</json:item>
</subject>
</host>
<ark>
<json:string>ark:/67375/HXZ-RQ5WHZPQ-4</json:string>
</ark>
<categories>
<wos>
<json:string>1 - science</json:string>
<json:string>2 - mathematical &amp; computational biology</json:string>
<json:string>2 - biotechnology &amp; applied microbiology</json:string>
<json:string>2 - biochemical research methods</json:string>
</wos>
<scienceMetrix>
<json:string>1 - applied sciences</json:string>
<json:string>2 - enabling &amp; strategic technologies</json:string>
<json:string>3 - bioinformatics</json:string>
</scienceMetrix>
<scopus>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Mathematics</json:string>
<json:string>3 - Computational Mathematics</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Computer Science</json:string>
<json:string>3 - Computational Theory and Mathematics</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Computer Science</json:string>
<json:string>3 - Computer Science Applications</json:string>
<json:string>1 - Life Sciences</json:string>
<json:string>2 - Biochemistry, Genetics and Molecular Biology</json:string>
<json:string>3 - Molecular Biology</json:string>
<json:string>1 - Life Sciences</json:string>
<json:string>2 - Biochemistry, Genetics and Molecular Biology</json:string>
<json:string>3 - Biochemistry</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Mathematics</json:string>
<json:string>3 - Statistics and Probability</json:string>
</scopus>
<inist>
<json:string>1 - sciences appliquees, technologies et medecines</json:string>
<json:string>2 - sciences biologiques et medicales</json:string>
<json:string>3 - sciences biologiques fondamentales et appliquees. psychologie</json:string>
</inist>
</categories>
<publicationDate>2008</publicationDate>
<copyrightDate>2008</copyrightDate>
<doi>
<json:string>10.1093/bioinformatics/btn084</json:string>
</doi>
<id>B0C4B4C1EC355D0EB86F236BEA4E417647565B77</id>
<score>1</score>
<fulltext>
<json:item>
<extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/fulltext.pdf</uri>
</json:item>
<json:item>
<extension>zip</extension>
<original>false</original>
<mimetype>application/zip</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/bundle.zip</uri>
</json:item>
<json:item>
<extension>txt</extension>
<original>false</original>
<mimetype>text/plain</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/fulltext.txt</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/fulltext.tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main">Predicting proteolytic sites in extracellular proteins: only halfway there</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Oxford University Press</publisher>
<availability>
<licence>© 2008 The Author(s)</licence>
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</availability>
<date type="e-published">2008</date>
<date type="Copyright" when="2008">2008</date>
</publicationStmt>
<notesStmt>
<note type="content-type" source="research-article" scheme="https://content-type.data.istex.fr/ark:/67375/XTP-1JC4F85T-7">research-article</note>
<note type="publication-type" scheme="https://publication-type.data.istex.fr/ark:/67375/JMC-0GLKJH51-B">journal</note>
</notesStmt>
<sourceDesc>
<biblStruct type="article">
<analytic>
<title level="a" type="main">Predicting proteolytic sites in extracellular proteins: only halfway there</title>
<author xml:id="author-0000" role="corresp">
<persName>
<surname>Kliger</surname>
<forename type="first">Yossef</forename>
</persName>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<note place="foot" n="FN1">
<p>†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</p>
</note>
<affiliation role="corresp">*To whom correspondence should be addressed.</affiliation>
</author>
<author xml:id="author-0001">
<persName>
<surname>Gofer</surname>
<forename type="first">Eyal</forename>
</persName>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<note place="foot" n="FN1">
<p>†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</p>
</note>
</author>
<author xml:id="author-0002">
<persName>
<surname>Wool</surname>
<forename type="first">Assaf</forename>
</persName>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
</author>
<author xml:id="author-0003">
<persName>
<surname>Toporik</surname>
<forename type="first">Amir</forename>
</persName>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
</author>
<author xml:id="author-0004">
<persName>
<surname>Apatoff</surname>
<forename type="first">Avihay</forename>
</persName>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
</author>
<author xml:id="author-0005">
<persName>
<surname>Olshansky</surname>
<forename type="first">Moshe</forename>
</persName>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
</author>
<idno type="istex">B0C4B4C1EC355D0EB86F236BEA4E417647565B77</idno>
<idno type="ark">ark:/67375/HXZ-RQ5WHZPQ-4</idno>
<idno type="DOI">10.1093/bioinformatics/btn084</idno>
<idno type="publisher-id">btn084</idno>
</analytic>
<monogr>
<title level="j" type="main">Bioinformatics</title>
<idno type="hwp">bioinfo</idno>
<idno type="publisher-id">bioinformatics</idno>
<idno type="pISSN">1367-4803</idno>
<idno type="eISSN">1460-2059</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published">2008</date>
<date type="e-published">2008</date>
<biblScope unit="vol">24</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="1049">1049</biblScope>
<biblScope unit="page" to="1055">1055</biblScope>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<encodingDesc>
<schemaRef type="ODD" url="https://xml-schema.delivery.istex.fr/tei-istex.odd"></schemaRef>
<appInfo>
<application ident="pub2tei" version="1.0.41" when="2020-04-06">
<label>pub2TEI-ISTEX</label>
<desc>A set of style sheets for converting XML documents encoded in various scientific publisher formats into a common TEI format.
<ref target="http://www.tei-c.org/">We use TEI</ref>
</desc>
</application>
</appInfo>
</encodingDesc>
<profileDesc>
<abstract>
<p>
<hi rend="bold">Motivation:</hi>
Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning.</p>
<p>
<hi rend="bold">Results:</hi>
The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we have found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only a mere 22% of the latter group were detected with a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins.</p>
<p>
<hi rend="bold">Contact:</hi>
kliger@compugen.co.il; yossef.kliger@gmail.com</p>
<p>
<hi rend="bold">Supplementary information:</hi>
Supplementary data are available at
<hi rend="italic">Bioinformatics</hi>
online.</p>
</abstract>
<textClass ana="subject">
<keywords scheme="subject">
<term>ORIGINAL PAPERS</term>
</keywords>
</textClass>
<langUsage>
<language ident="EN"></language>
</langUsage>
</profileDesc>
<revisionDesc>
<change when="2020-04-06" who="#istex" xml:id="pub2tei">formatting</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus oup, element #text not found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="utf-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" URI="journalpublishing.dtd" name="istex:docType"></istex:docType>
<istex:document>
<article article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="hwp">bioinfo</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-title>Bioinformatics</journal-title>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1460-2059</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1093/bioinformatics/btn084</article-id>
<article-id pub-id-type="publisher-id">btn084</article-id>
<article-categories>
<subj-group>
<subject>ORIGINAL PAPERS</subject>
<subj-group>
<subject>SEQUENCE ANALYSIS</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Predicting proteolytic sites in extracellular proteins: only halfway there</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Kliger</surname>
<given-names>Yossef</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="COR1">*</xref>
<xref ref-type="fn" rid="FN1">
<sup>†</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gofer</surname>
<given-names>Eyal</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="fn" rid="FN1">
<sup>†</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wool</surname>
<given-names>Assaf</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Toporik</surname>
<given-names>Amir</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Apatoff</surname>
<given-names>Avihay</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="AFF1">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Olshansky</surname>
<given-names>Moshe</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="AFF1">
<sup>1</sup>
Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and
<sup>2</sup>
The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</aff>
<author-notes>
<corresp id="COR1">*To whom correspondence should be addressed.</corresp>
<fn>
<p>Associate Editor: John Quackenbush</p>
</fn>
<fn id="FN1">
<p>
<sup>†</sup>
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<day>15</day>
<month>4</month>
<year>2008</year>
</pub-date>
<pub-date pub-type="epub">
<day>4</day>
<month>3</month>
<year>2008</year>
</pub-date>
<volume>24</volume>
<issue>8</issue>
<fpage>1049</fpage>
<lpage>1055</lpage>
<history>
<date date-type="received">
<day>13</day>
<month>12</month>
<year>2007</year>
</date>
<date date-type="rev-recd">
<day>10</day>
<month>2</month>
<year>2008</year>
</date>
<date date-type="accepted">
<day>1</day>
<month>3</month>
<year>2008</year>
</date>
</history>
<permissions>
<copyright-statement>© 2008 The Author(s)</copyright-statement>
<copyright-year>2008</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</license>
</permissions>
<abstract>
<p>
<bold>Motivation:</bold>
Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning.</p>
<p>
<bold>Results:</bold>
The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we have found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only a mere 22% of the latter group were detected with a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins.</p>
<p>
<bold>Contact:</bold>
<email>kliger@compugen.co.il</email>
;
<email>yossef.kliger@gmail.com</email>
</p>
<p>
<bold>Supplementary information:</bold>
Supplementary data are available at
<italic>Bioinformatics</italic>
online.</p>
</abstract>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="SEC1">
<title>1 INTRODUCTION</title>
<p>Many secretory proteins and peptides are initially synthesized as larger precursors, usually in the form of pre-pro-proteins. Such precursor proteins undergo post-translational proteolysis: the N-terminal pre-region, known as the signal peptide, is cleaved by a well-characterized signal peptidase [reviewed in (Paetzel
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B32">2002</xref>
)], while various proteases liberate the active proteins from the pro-proteins. The following examples demonstrate the importance of the latter process and its regulation: (i) The envelope (Env) glycoprotein of HIV-1 is synthesized as a precursor polypeptide. In the trans-Golgi network, Env is cleaved by the cellular protease furin into two functional subunits. Cleavage of Env occurs at a conserved sequence. Mutagenesis of this sequence produces non-infectious HIV-1 particles containing unprocessed Env (Earl
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B18">1991</xref>
; Kowalski
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B28">1987</xref>
; McCune
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B29">1988</xref>
). This finding establishes the importance of furin-mediated processing for virus-infectivity. Accordingly, inhibitors of the host protease furin impede HIV-1 replication by interfering with the proteolytic processing of Env, suggesting they are useful for combating HIV-1 (Bahbouhi
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B3">2002</xref>
; Hallenberger
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B21">1992</xref>
; Kibler
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B25">2004</xref>
). Furthermore, inhibiting the production of peptides involved in various diseases by blocking the activity of the proteolytic enzymes is a promising approach (Basak,
<xref ref-type="bibr" rid="B4">2005</xref>
; Bergeron
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B6">2005</xref>
; de Haan
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B15">2004</xref>
). (ii) The release of peptide hormones is subject to a complex and finely tuned regulation system. Post-translational proteolysis plays a key role by specifically converting the pro-hormone precursor into biologically active products. Examples of peptide hormones, whose proteolytic processing regulates their activities, are: insulin, somatostatin, parathyroid hormone, glucagon and GLP-1. Many of these are used as therapeutic peptides for treating various disorders.</p>
<p>The importance of identifying mature proteins fuels both experimental and computational approaches aimed at discovering and predicting proteolytic sites. Experimental attempts to unveil the human plasma proteome using proteomics methods fail to detect most cytokines and protein hormones, presumably due to their low abundance [summarized in (Anderson
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B1">2004</xref>
)]. Currently, most computational approaches are protease-oriented and rely on proteolytic site data of specific enzymes (Blom
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B7">1996</xref>
; Cai
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B12">1998</xref>
; Kiemer
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B27">2004</xref>
; Yang and Berry,
<xref ref-type="bibr" rid="B39">2004</xref>
). However, while proteolytic sites in a protein can be experimentally identified, for example, by N-terminal sequencing of the processed protein fragments, it is much harder to find out the catalyzing protease involved. Hence, only a limited number of experimentally verified proteolytic sites can be associated with a specific proteolytic enzyme, and therefore the data available as training sets for these methods is relatively limited.</p>
<p>Many of the proteolytic sites whose catalyzing enzymes are known are processed by members of one family of serine proteases, called pro-hormone convertases (PCs) (Seidah
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B34">1998</xref>
). All known proteolytic sites of mammalian PCs have an arginine or a lysine at the first position N-terminal to the proteolytic sites. Furthermore, no other enzyme that catalyzes the processing of proteins in the secretory pathway is known to cleave immediately after these basic amino acid residues. It is therefore reasonable to assume that proteolysis after a basic residue is catalyzed by a member of the PC family. This allows data extraction of sequences of proteins, which are processed by a PC member, from databases of precursor proteins and proteolytic sites. Such extracted data, together with the evolutionary relatedness between the members of the PC family, suggests that it might be possible to construct a classifier that will discriminate between PC proteolytic sites, regardless of the specific PC member, and other sites. Such an approach was taken by Blom and colleagues (Duckert
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B17">2004</xref>
), who extracted PC proteolytic sites based on Swiss-Prot (version 40) annotation. Herein, we describe an improved data extraction process, which considered more proteolytic sites. The extracted data was used for training classifiers, which are based on two different classification algorithms—Random Forest and Support Vector Machines. The best classifier was used to provide a comprehensive list of predicted proteolytic sites in the mammalian secretome. Several interesting predictions of proteolytic sites are discussed.</p>
</sec>
<sec sec-type="methods" id="SEC2">
<title>2 METHODS</title>
<sec id="SEC2.1">
<title>2.1 Data preparation</title>
<p>All eukaryotic proteins were downloaded from the Swiss-Prot knowledgebase version 47.4 (Boeckmann
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B8">2003</xref>
). Proteins whose first residue is not methionine were discarded, as they might not contain the full-length sequence of the precursor protein. The same holds for Swiss-Prot entries that include the phrase ‘PROTEIN SEQUENCE’, but do not include ‘NUCLEOTIDE SEQUENCE’ in their RP annotation lines, as these entries might contain sequences of processed proteins, rather than the full-length precursor proteins. Data of proteolytic sites were extracted from the post-translational modifications annotation lines (FT) of the Swiss-Prot knowledgebase (Farriol-Mathis
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B19">2004</xref>
).</p>
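<p>The two entry filters described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the representation of an entry as a sequence string plus a list of RP line strings are assumptions made for the example.</p>

```python
def is_full_length_precursor(sequence: str, rp_lines: list[str]) -> bool:
    """Return True if an entry passes both filters described in the text.

    `sequence` and `rp_lines` are a hypothetical, simplified view of a
    Swiss-Prot entry; parsing the flat file itself is not shown.
    """
    # Filter 1: a full-length precursor is expected to start with methionine.
    if not sequence.startswith("M"):
        return False
    # Filter 2: discard entries whose RP lines mention PROTEIN SEQUENCE
    # but never NUCLEOTIDE SEQUENCE (possibly a processed protein).
    rp_text = " ".join(rp_lines).upper()
    protein_only = (
        "PROTEIN SEQUENCE" in rp_text and "NUCLEOTIDE SEQUENCE" not in rp_text
    )
    return not protein_only
```

<p>An entry that carries both phrases in its RP lines is kept, because the nucleotide-derived sequence is expected to cover the full precursor.</p>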
</sec>
<sec id="SEC2.2">
<title>2.2 Classifiers</title>
<p>Two types of classifiers were tested: Random Forest (RF) (Breiman,
<xref ref-type="bibr" rid="B11">2001</xref>
) and Support Vector Machines (SVM) (Vapnik and Cortes,
<xref ref-type="bibr" rid="B36">1995</xref>
). For the SVM classifier, we used Joachims’ SVMlight package (Joachims,
<xref ref-type="bibr" rid="B23">1999</xref>
).</p>
</sec>
<sec id="SEC2.3">
<title>2.3 Signal peptide prediction</title>
<p>Prediction of whether a protein has an N-terminal signal sequence was performed using the SignalP 3.0 prediction tool (Bendtsen
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B5">2004</xref>
).</p>
</sec>
<sec id="SEC2.4">
<title>2.4 Multiple sequence alignment</title>
<p>Multiple sequence alignments were computed with ProbCons (Do
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B16">2005</xref>
) and were edited using Jalview (Clamp
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B13">2004</xref>
).</p>
</sec>
</sec>
<sec sec-type="results" id="SEC3">
<title>3 RESULTS</title>
<sec id="SEC3.1">
<title>3.1 Proteolytic site data extraction</title>
<p>Since the aim of the classifier was to model proteolytic processes taking place in the secretory pathway, only secreted proteins and extracellular parts of membrane proteins (the secretome) were considered. Thus, only proteins annotated as containing a signal peptide or a transmembrane domain in the feature table (FT) lines of the Swiss-Prot annotation record, or annotated as being secreted or extracellular in the comment (CC) lines of the Swiss-Prot annotation record, were selected.</p>
<p>In the case of integral membrane proteins, cytoplasmic domains were not considered. The membrane topology information, i.e. the location of the membrane-spanning regions and their orientation, was extracted from the topology annotation lines of the Swiss-Prot entry (FT TOPO_DOM and FT TRANSMEM). When these lines did not span the full length of the protein, we completed the full topology of the protein according to the annotated signal peptide, transmembrane domains, extracellular domains and cytoplasmic domains. This process was performed twice: once by starting from the most N-terminal topology annotation, and once by starting from the most C-terminal topology annotation. Whenever discrepancies between the two completion processes were found, the Swiss-Prot entry was discarded. Such discrepancies point to mistakes in the topology annotation of multi-span proteins. Ideally, the extracted proteolytic sites should be divided into sites that are cleaved by enzymes working in the secretory pathway, the extracellular matrix, the cytoplasm, the digestive system or in extracellular fluids. When available, annotation of the identity of the proteolytic enzyme was extracted from the FT annotation lines (following the phrase ‘Removed by’ in the description of PROPEP lines, or following ‘by’ in the description of ‘SITE…CLEAVAGE’ lines). As the aim of this study is to model the processes that take place in the secretory pathway, proteolytic sites processed by enzymes that are known to act outside the secretory pathway were discarded. 
The list of enzymes known to act outside the secretory pathway that appear in the annotation of Swiss-Prot entries of the proteins they cleave includes: adam17, aggrecanase, alpha-secretase, beta-secretase, caspase-6, cathepsin G, arginine-specific endoprotease, C3 convertase, chymosin, collagenase, dipeptidase, dipeptidylpeptidase, DPP4, easter, elastase, kallikrein and kallikrein-like serine protease, MMPs (2, 3, and 9), coagulation factors (I, VIIa, IXa, Xa and XIa), plasmin, procollagen C-endopeptidase, procollagen N-endopeptidase, rennin, thrombin, trypsin and u-PA.</p>
<p>Blom and colleagues (Duckert
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B17">2004</xref>
) extracted PC proteolytic sites based on Swiss-Prot annotation. They screened for precursor proteins that are annotated to have a signal peptide, followed by a PROPEP that ends with an arginine or a lysine, and then followed by a PEPTIDE or a CHAIN. They were then able to construct an artificial neural network-based classifier for predicting proteolytic sites catalyzed by members of the pro-hormone convertase family of proteases (Duckert
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B17">2004</xref>
). However, this procedure is too strict for part of the proteolytic sites. For example, human insulin (Swiss-Prot ID: INS_HUMAN) is composed of a signal peptide, followed by a PEPTIDE, a PROPEP and then another PEPTIDE. These two well-characterized proteolytic sites were ignored by the conservative extraction, because insulin has no PROPEP immediately after the signal peptide. Therefore, due to the scarcity of data, we used a less strict data extraction procedure as described below.</p>
<p>This study focuses on proteolytic sites of enzymes that cut immediately after lysines or arginines. Such enzymes are often classified as members of the pro-hormone convertase family. Therefore, only sites with a lysine or arginine at the first position N-terminal to the proteolytic site were considered. We extracted all 30-mers of the secretome, arranged symmetrically around a potential proteolytic site after a basic residue, and designated them as follows: (i) Experimentally-validated proteolytic sites, which are annotated by a Swiss-Prot FT annotation line according to the word template ‘SITE…CLEAVAGE’, were marked VALIDATED. (ii) Experimentally-validated proteolytic sites, whose existence is indicated by the annotation of the two protein segments right before and immediately after the proteolytic site, were also marked VALIDATED. The annotation for protein segments is in the form of Swiss-Prot FT annotation lines having the word template ‘PEPTIDE (or PROPEPTIDE or CHAIN) [first residue] [last residue]’, and the two segments of the protein should be consecutive, i.e. the first residue of the second segment immediately follows the last residue of the first segment. We do allow for a short linker section in between the two segments, provided that it is likely to be removed by exopeptidase E after the processing of the protein precursor by a pro-hormone convertase (Day
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B14">1998</xref>
; Friis-Hansen
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B20">2001</xref>
). We consider linker sections consisting of K, R, KK, KR, RK, RR, or successive Ks and/or Rs followed by a classical furin proteolytic site (RXKR or RXRR, where X is any natural amino acid) as likely to be cut by exopeptidase E. We also allow for a glycine immediately upstream of the basic residue(s) at the C-terminus of the first PEPTIDE, PROPEPTIDE or CHAIN, as it is likely that these peptides are substrates for C-terminal alpha-amidating enzymes that convert the peptides to the corresponding desglycine peptide amide, where glycine is the amide donor (Bradbury
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B10">1982</xref>
). The ambiguous sites (after each of the residues located in between the two annotation lines) are marked AMBG. (iii) When only one PEPTIDE, PROPEPTIDE or CHAIN annotation line suggests the existence of a proteolytic site, our confidence in the proteolytic site is reduced and the site is marked POTENTIAL. (iv) When comments like ‘PROBABLE’, ‘BY SIMILARITY’ or ‘POTENTIAL’ (Farriol-Mathis
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B19">2004</xref>
; Junker
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B24">1999</xref>
) appear in the description of the FT lines in the cases described in (i) and (ii), the proteolytic site is designated as POTENTIAL. (v) When the distance between two proteolytic sites does not exceed four residues, the reliability of both sites is reduced. Such proteolytic sites are marked POTENTIAL unless there is strong support for their reliability. Strong support for one or both of the two proteolytic sites is considered if a proteolytic site is marked VALIDATED according to the criterion in (i). Strong support for one or both of the two proteolytic sites is also considered if a proteolytic site is marked POTENTIAL according to the SITE…CLEAVAGE annotation line, and also marked VALIDATED according to the criterion in (ii). (vi) All other positions were marked NON (Table SI).</p>
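<p>The enumeration of candidate sites described above (30-mers arranged symmetrically around every position that follows a lysine or arginine) can be sketched as follows. The function name, the 0-based site index and the padding character used near the protein termini are assumptions of this sketch; the text does not state how windows that run past either end of the protein were handled.</p>

```python
def candidate_windows(sequence, half_width=15, pad="-"):
    """Yield (site, window) for every potential site after K or R.

    `site` is the 0-based index of the first residue after the cleaved
    bond. The window holds `half_width` residues on each side of the bond,
    so the default gives the 30-mers mentioned in the text.
    Padding with `pad` at the termini is an assumption of this sketch.
    """
    padded = pad * half_width + sequence + pad * half_width
    for i, residue in enumerate(sequence):
        # A site needs a basic residue before it and a residue after it.
        if residue in "KR" and i + 1 != len(sequence):
            site = i + 1
            # sequence[j] sits at padded[j + half_width], so the window
            # sequence[site - half_width : site + half_width] starts at
            # padded[site].
            yield site, padded[site:site + 2 * half_width]
```

<p>In every window produced this way, the basic residue occupies the last position of the left half, which keeps all candidate sites in register for the downstream classifier.</p>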
</sec>
<sec id="SEC3.2">
<title>3.2 Training, validation, and test sets</title>
<p>Ideally, data would be separated into distinct training, test and validation sets. However, the relative scarcity of cleavage sites, and their different levels of reliability, present a challenge when preparing datasets for classification, and necessitate a different approach. A validation set consisting of a random quarter of the data was held out and used for parameter optimization. The rest of the data were used, once optimal parameters were chosen, in cross-validation to evaluate performance. When training, only the most reliable proteolytic sites, namely, sites that were marked VALIDATED, were used as positive examples, while a subset of the sites marked NON was used as negative examples. For the purpose of performance evaluation, on the other hand, it is important to use a set representative of all data. Thus, in the parts of the data used for testing, proteolytic sites that were marked VALIDATED or POTENTIAL were labeled positive, while those marked NON or AMBG were labeled negative.</p>
</sec>
<sec id="SEC3.3">
<title>3.3 Classifier construction</title>
<p>Homologous sequences raise special difficulties due to the relationship between redundancy and information. It is therefore essential to handle them with care. One approach is to discard some of the protein sequences, in a way that maximizes coverage and minimizes redundancy (Hobohm
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B22">1992</xref>
). The weakness of this approach is that it prevents learning from the subtle changes that exist between very similar sequences. For this reason, and due to the scarcity of annotated data, we, like others, decided to use all available data. This approach requires special precautions in order to minimize the risk of overestimating the predictive performance owing to training set and test set similarities. One way to avoid training and testing on homologous data is to divide the data into several partitions based on a phylogenetic tree, and then calculate the performance by cross-validation (Duckert
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B17">2004</xref>
). We used a different approach, which is described in what follows.</p>
<p>We argue that the task of classifying a site is naturally divided into two cases, depending on whether or not this site is similar (to a degree, homologous) to a known proteolytic site, i.e. a proteolytic site present in the training set. Classifying ‘seen before’ sites and ‘new’ sites are tasks that are different in nature, and have different levels of difficulty. This implies the need for two methods of classification, and, more importantly, for separate performance evaluation for the two tasks. In order to discriminate between the classification tasks, we analyzed 18-mers, arranged symmetrically around a potential proteolytic site, which were marked as VALIDATED or POTENTIAL. Each 18-mer was compared to its most similar known proteolytic site, and the number of identical residues was counted. Our analysis confirmed that 18-mer sites that share more than nine residues with a known proteolytic site are most likely to be proteolytic sites themselves (Figure S1).</p>
<p>We chose this threshold for dividing the data into ‘new’ and ‘seen before’ sites. The number of identical residues to the closest known proteolytic site was also used as an additional input feature for the classifier. This feature improves the classification results (Figure S2).</p>
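<p>The identity count and the resulting split into ‘seen before’ and ‘new’ sites can be sketched as follows. The function names are illustrative, and the sketch assumes that windows are compared position by position without gaps and that the set of known sites excludes the query site itself.</p>

```python
def identity(a, b):
    """Number of positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b))


def max_identity(window, known_sites):
    """Identity to the most similar known proteolytic site (0 if none)."""
    return max((identity(window, known) for known in known_sites), default=0)


def is_seen_before(window, known_sites, threshold=9):
    """A site is 'seen before' if it shares more than `threshold` residues
    with some known site; the default follows the 18-mer threshold above."""
    return max_identity(window, known_sites) > threshold
```

<p>The value returned by <italic>max_identity</italic> corresponds to the additional input feature described above, so the same computation serves both the split and the classifier input.</p>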
<p>
<xref ref-type="fig" rid="F1">Figure 1</xref>
reveals that, as expected, the tasks of classifying ‘seen before’ sites and classifying ‘new’ sites, are different in nature, and confirms the need for two separate performance evaluations. In addition, a classifier trained to identify ‘new’ sites was more successful at identifying ‘new’ sites than a classifier trained to identify ‘seen before’ sites (
<xref ref-type="fig" rid="F1">Figure 1</xref>
B).
<fig id="F1">
<label>
<bold>Fig. 1.</bold>
</label>
<caption>
<p>The effect of creating two specialized classifiers. It is clear that the performance of classifiers for ‘seen before’ and ‘new’ sites should be evaluated separately. Furthermore, the figure shows that it is worth training specialized classifiers: (
<bold>A</bold>
) Identification of ‘seen before’ sites. The classifier trained to identify ‘seen before’ sites is somewhat better at identifying such sites than the classifier trained to identify ‘new’ sites. (
<bold>B</bold>
) Identification of ‘new’ sites. The classifier trained to identify ‘new’ sites performs better than the classifier trained to identify ‘seen before’ sites at identifying ‘new’ sites.</p>
</caption>
<graphic xlink:href="btn084f1"></graphic>
</fig>
</p>
<sec id="SEC3.3.1">
<title>3.3.1 Parameter tuning</title>
<p>A quarter of the data was picked out at random to serve only for tuning parameters, while the rest was used at the tuning stage for training. The held out set was divided into ‘seen before’ and ‘new’ sites, based on the maximal similarity to known sites in the training set. The two classifiers, for ‘seen before’ sites and for ‘new’ sites, were then, separately, optimized by evaluating precision vs. recall graphs based on the raw score output of Random Forest (RF). The inputs to the classifier were (i) a symmetrical window around the site, and (ii) the maximal identity to a known cleavage site, divided by the window size. For the classifier specialized in ‘seen before’ sites, we used a symmetrical window of 20 residues surrounding each site, a negative set 50 times larger than the positive set, and the internal weighting mechanism of RF was set to give a weight of 50 to the positive set, and 1 to the negative set. Mtry was set to 5, and 200 trees were found to be sufficient. For the classifier aimed at identifying ‘new’ proteolytic sites, we used a symmetrical window of 12 residues around each site, a negative set 50 times larger than the positive set, and the internal weighting was set to 2 for the positive set and 1 for the negative set. Mtry was set to 2 and 200 trees were again found to be sufficient. For the SVM classifier, we tried different polynomial kernels. The best degrees were found to be 4 and 6 for the ‘seen before’ and ‘new’ classifiers, respectively. The vectors fed to the SVMs were in sparse representation (Qian and Sejnowski,
<xref ref-type="bibr" rid="B33">1988</xref>
). The maximal identity value was used with the SVM the same way as with the RF classifier.</p>
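<p>The two classifier inputs listed above (the residue window, and the maximal identity divided by the window size) can be combined into a single numeric vector. The sketch below uses a one-hot encoding with 20 indicator values per position; the exact layout of the authors' sparse representation is not given in the text, so the alphabet order and the all-zero encoding of unknown or padding residues are assumptions.</p>

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_site(window, max_identity):
    """Return a feature vector for one candidate site.

    The vector has 20 indicator values per window position, followed by
    the maximal identity to a known site normalized by the window size.
    """
    features = []
    for residue in window:
        # Residues outside the 20-letter alphabet (for example padding)
        # are encoded as 20 zeros; this is an assumption of the sketch.
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    features.append(max_identity / len(window))
    return features
```

<p>With the window sizes reported above, this gives 401 values for the ‘seen before’ classifier (20 residues) and 241 values for the ‘new’ classifier (12 residues).</p>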
</sec>
<sec id="SEC3.3.2">
<title>3.3.2 Classifier construction and performance evaluation</title>
<p>The data that was not used as testing data in the parameter optimization step (three quarters of the data) was used for 10-fold stratified cross-validation. Specifically, at each step of the cross-validation, nine-tenths of the data were used for training. The remaining tenth was used for testing after being divided into ‘seen before’ and ‘new’ sets with respect to the current training set. By ‘stratified’ we mean that each tenth part of the data contained the same proportion of VALIDATED, POTENTIAL, etc. sites. The parameters used were those found to be optimal in the parameter tuning step.</p>
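<p>The stratification described above can be sketched with a simple round-robin assignment within each mark. This is an illustration under stated assumptions, not the authors' procedure: the function name, the fixed random seed and the round-robin scheme are choices made for the example.</p>

```python
import random
from collections import defaultdict


def stratified_folds(marks, n_folds=10, seed=0):
    """Split item indices into folds with similar proportions of each mark.

    `marks` is a list such as ["VALIDATED", "NON", ...], one per site.
    Returns a list of `n_folds` lists of indices into `marks`.
    """
    by_mark = defaultdict(list)
    for index, mark in enumerate(marks):
        by_mark[mark].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_mark.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this mark out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds
```

<p>At each cross-validation step one fold is held out for testing and the other nine are used for training; the held-out sites are then divided into ‘seen before’ and ‘new’ with respect to that training set.</p>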
</sec>
<sec id="SEC3.3.3">
<title>3.3.3 Performance evaluation correction</title>
<p>As explained above, all the data that was not used for parameter tuning was used for testing, in order to reflect the heterogeneity of the data as much as possible. However, there is uncertainty as to the label of any data that is not VALIDATED. To a large degree, we trust sites designated POTENTIAL to be real proteolytic sites. Manual reviewing of many of the POTENTIAL sites suggests that this assumption is reasonable. We assume that most AMBG and NON sites are not proteolytic sites. Still, it is expected that yet undiscovered proteolytic sites are hidden among the sites marked NON or AMBG. The sheer volume of NON sites raises the suspicion that there are even more unknown proteolytic sites labeled NON than known proteolytic sites. This may distort performance evaluation statistics. We present below a calculation that attempts to tackle this problem.
<disp-formula id="M1">
<label>(1)</label>
<graphic xlink:href="btn084m1"></graphic>
</disp-formula>
<disp-formula id="M2">
<label>(2)</label>
<graphic xlink:href="btn084m2"></graphic>
</disp-formula>
<disp-formula id="M3">
<label>(3)</label>
<graphic xlink:href="btn084m3"></graphic>
</disp-formula>
<disp-formula id="M4">
<label>(4)</label>
<graphic xlink:href="btn084m4"></graphic>
</disp-formula>
where
<italic>TP
<sub>i</sub>
</italic>
denotes instances in the positive set, correctly classified as positive,
<italic>TP
<sub>o</sub>
</italic>
represents mislabeled instances in the negative set, correctly classified as positive,
<italic>T
<sub>i</sub>
</italic>
denotes instances in the positive set,
<italic>T
<sub>o</sub>
</italic>
represents mislabeled instances in the negative set, and
<italic>P
<sub>o</sub>
</italic>
denotes instances in the negative set, classified as positive. It follows that the calculated precision is always an underestimate. The reason is that while the denominator in Equation (2) is the same as in Equation (
<xref ref-type="disp-formula" rid="M4">4</xref>
), the numerator does not include
<italic>TP
<sub>o</sub>
</italic>
, which may be even larger than
<italic>TP
<sub>i</sub>
</italic>
.</p>
<p>We now proceed under the assumption that negative data is a mixture of two statistical types of data—mislabeled positives (a fraction α of the negative data) and real negatives. Mislabeled positives are assumed to have the same statistical nature as positive data. Let
<italic>F
<sub>i</sub>
</italic>
(
<italic>F
<sub>o</sub>
</italic>
) be the cumulative distribution function of the score for positive (negative) data. Let
<italic>N
<sub>i</sub>
</italic>
(
<italic>N
<sub>o</sub>
</italic>
) be the number of positive (negative) instances. Let
<italic>t</italic>
be a threshold for the score.
<disp-formula id="M5">
<label>(5)</label>
<graphic xlink:href="btn084m5"></graphic>
</disp-formula>
Note that the real recall is independent of α, and is therefore equal to the ordinary recall calculated without assuming any mislabeling.
<disp-formula id="M6">
<label>(6)</label>
<graphic xlink:href="btn084m6"></graphic>
</disp-formula>
The real precision is the ordinary precision multiplied by a correction factor: (1 + α
<italic>N
<sub>o</sub>
</italic>
/
<italic>N
<sub>i</sub>
</italic>
). Therefore, for α = 0 we recover the ordinary precision.</p>
<p>To summarize, mislabeling leaves the recall unchanged, while the precision is enhanced by a factor (1 + α
<italic>N
<sub>o</sub>
</italic>
/
<italic>N
<sub>i</sub>
</italic>
) = 1 +
<italic>T
<sub>o</sub>
</italic>
/
<italic>T
<sub>i</sub>
</italic>
.</p>
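<p>The correction summarized above amounts to a single multiplication. The sketch below applies the factor to an ordinary precision value and caps the result at 1, as is done for the corrected curves in this study; the function and parameter names are illustrative.</p>

```python
def corrected_precision(precision, alpha, n_neg, n_pos):
    """Multiply the ordinary precision by (1 + alpha * n_neg / n_pos).

    `alpha` is the assumed fraction of mislabeled positives among the
    `n_neg` negative instances; `n_pos` is the number of positive instances.
    Values above 1 are set to 1.
    """
    factor = 1.0 + alpha * n_neg / n_pos
    return min(1.0, precision * factor)
```

<p>With alpha equal to zero the function returns the ordinary precision unchanged, and recall needs no counterpart because it is unaffected by the mislabeling.</p>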
<p>For furin proteolysis, we can obtain a reasonable estimate of this factor, because furin sites have an easily detectable consensus (Nakayama,
<xref ref-type="bibr" rid="B30">1997</xref>
). We extrapolate from furin to proteolytic sites of other members of the pro-hormone convertase family, in an attempt to reflect the curation level of proteolysis annotation in the Swiss-Prot knowledgebase. We look for the furin proteolysis consensus site, after RXKR or after RXRR, in the positive and negative sets. The instances in the positive set are real positives, whereas the ones in the negative set are a mixture of proteolytic and non-proteolytic sites. There is evidence that a lysine located two positions after the putative proteolytic site prevents cleavage, so such instances were excluded.</p>
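<p>The search for furin consensus sites described above can be sketched with a regular expression. The sketch looks for RXKR or RXRR, places the putative site immediately after the motif, and skips instances with a lysine two positions after the site. The function name and the 0-based indexing are assumptions of the example.</p>

```python
import re

# A lookahead is used so that overlapping motifs are all reported.
# The motif is R, any residue, K or R, then R.
FURIN_MOTIF = re.compile(r"(?=R.[KR]R)")


def furin_consensus_sites(sequence):
    """Return 0-based indices of the first residue after each furin motif,
    excluding sites with a lysine at the second position after the site."""
    sites = []
    for match in FURIN_MOTIF.finditer(sequence):
        site = match.start() + 4
        # Second residue after the cleaved bond; empty near the C-terminus.
        second_after = sequence[site + 1:site + 2]
        if second_after != "K":
            sites.append(site)
    return sites
```

<p>Running the same function over the positive and the negative sets gives the two counts whose ratio underlies the correction factor.</p>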
<p>In addition, we observed which residues are most frequent immediately after the proteolytic site in the positive set. Our method for finding the ratio
<italic>T
<sub>o</sub>
</italic>
/
<italic>T
<sub>i</sub>
</italic>
was to look for the same subfamily of sites in both positive and negative sets: instances of a furin consensus followed by one of the 3 most frequent residues (as found in the positive set), excluding lysine in the second post-cleavage position. The calculated furin correction factor was found to be 1.11 for the ‘seen before’ classifier, and 3.04 for the ‘new’ classifier. Note that because of the inaccuracy of this correction procedure, corrected precision values may exceed 1. It must be emphasized that the furin correction factor is based on the assumptions that the ratio of annotated proteolytic sites to unannotated sites is equal for furin and other PC sites, and that classifier score distributions are mixtures as described above. Both these assumptions are very rough approximations. Still, we believe this correction gives a better evaluation of classifier performance. A comparison between the performance of RF and SVM classifiers specialized in ‘new’ sites is shown in
<xref ref-type="fig" rid="F2">Figure 2</xref>
. The RF classifier performs better in the high precision/low recall area, while SVM performs better in the high recall/low precision area.
<xref ref-type="fig" rid="F2">Figure 2</xref>
also shows the effect of the furin correction factor on the raw score output of the RF and SVM classifiers. The performance of both the RF and SVM ‘seen before’ classifiers is almost perfect (Figure S3), as expected, and becomes perfect when applying correction (data not shown).
<fig id="F2">
<label>
<bold>Fig. 2.</bold>
</label>
<caption>
<p>Comparison between RF and SVM classifiers specialized in ‘new’ sites, and the effect of the furin correction factor. VALIDATED and POTENTIAL data are treated as positive for testing, the rest as negative. The furin correction is a way to compensate for the fact that some of the data we treated as negative for cleavage is actually mislabeled (unknown proteolytic sites). (
<bold>A</bold>
) Raw score output of the RF and SVM classifiers; (
<bold>B</bold>
) Precision is multiplied by 3.04, which is the calculated furin correction factor. It should be remarked that because of the imperfection of the correction procedure, corrected precision values may exceed 1. Precision values that exceed 1 are set to 1.</p>
</caption>
<graphic xlink:href="btn084f2"></graphic>
</fig>
</p>
</sec>
</sec>
<sec id="SEC3.4">
<title>3.4 Proteolytic site prediction</title>
<p>The classification procedure described above was repeated, but this time, no holdout set was removed, and 10-fold stratified cross-validation was applied to the whole eukaryotic secretome. For each classifier, scores were replaced by their corresponding precision values. Each site was given a single score: a ‘seen before’ site was given its score according to the ‘seen before’ classifier, and a ‘new’ site was given its score according to the ‘new’ classifier.</p>
<p>For ‘new’ sites, there are 1663 VALIDATED and POTENTIAL sites, and 569 820 NON and AMBG sites, and the furin correction factor is 3.04. For ‘seen before’ sites, there are 2099 VALIDATED and POTENTIAL sites, and 1035 NON and AMBG sites, and the furin correction factor is 1.11. Based on our data extraction, performance evaluation, and the furin correction factor, we estimate that the eukaryotic secretome comprises about 7385 proteolytic sites, of which 2330 (2099 * 1.11) are ‘seen before’, i.e. quite similar to known proteolytic sites, and 5055 (1663 * 3.04) are ‘new’, i.e. do not share significant sequence similarity with any annotated proteolytic site.</p>
<p>The furin correction factor also allows us to estimate the fraction of unannotated proteolytic sites among ‘seen before’ and ‘new’ sites. Our results indicate that only 9.9% (0.11/1.11) of ‘seen before’ sites are still unannotated, while 67% (2.04/3.04) of ‘new’ sites are yet to be discovered. Furthermore, the RF classifier specialized in ‘seen before’ sites predicts apparently all 231 unannotated ‘seen before’ sites with a precision greater than 90%, while the RF classifier specialized in ‘new’ sites predicts about 33% of the 3393 unannotated ‘new’ sites with a precision of 50%, and 22% with a precision of 80% (
<xref ref-type="fig" rid="F2">Fig. 2</xref>
).</p>
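<p>The estimates in the two preceding paragraphs follow directly from the annotated counts and the furin correction factors, as the short calculation below shows. The function name is illustrative. Note that the unrounded total for ‘new’ sites is about 5055.5, which the text reports as 5055.</p>

```python
def estimate(annotated, factor):
    """Return (estimated total, estimated unannotated, unannotated fraction)."""
    total = annotated * factor
    unannotated = total - annotated
    fraction = (factor - 1.0) / factor
    return total, unannotated, fraction


# Annotated (VALIDATED and POTENTIAL) counts and correction factors from the text.
seen_total, seen_missing, seen_fraction = estimate(2099, 1.11)
new_total, new_missing, new_fraction = estimate(1663, 3.04)

print("seen before:", round(seen_total), round(seen_missing), round(100 * seen_fraction, 1))
print("new:", int(new_total), round(new_missing), round(100 * new_fraction))
```

<p>The unannotated ‘seen before’ sites make up about 6% of all unannotated sites (231 of 231 + 3393), which is the proportion quoted in the abstract.</p>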
</sec>
<sec id="SEC3.5">
<title>3.5 Predicted proteolytic sites in members of the fibroblast growth factor family</title>
<p>Swiss-Prot 47.4 does not include annotation for proteolytic sites in any of the members of the Fibroblast Growth Factor (FGF) family. Yet, our prediction method suggests several proteolytic sites in some of the proteins in this family, resulting in a classification of the FGF proteins into three groups of orthologs: FGFs that have conserved N-terminal proteolytic sites, FGFs that have conserved C-terminal proteolytic sites and all others (Table SII). A literature search confirmed some of our predictions.</p>
<p>Functional proteolytic sites are expected to be conserved among closely related species. Our classifier revealed that the proteolytic site in FGF23 is indeed conserved in all available FGF23 orthologs (
<xref ref-type="fig" rid="F3">Fig. 3</xref>
). The C-terminal proteolytic site of FGF23 is important for normal activity of the protein. Several groups reported proteolysis in FGF23 between Arg179 and Ser180, and mutations in proximity to this site (R179W, R179Q and R176Q) were identified in patients with autosomal-dominant hypophosphatemic rickets (ADHR) (Bowe
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B9">2001</xref>
; Shimada
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B35">2002</xref>
; White
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B38">2000</xref>
,
<xref ref-type="bibr" rid="B37">2001</xref>
). The authors suggested that the proteolysis causes protein inactivation, and that these mutations create a polypeptide less sensitive to proteolysis, thus leading to elevated concentrations of FGF23 and to phosphate wasting in ADHR patients. Our classifier predicted that these mutated forms of FGF23 do not undergo C-terminal proteolysis (
<xref ref-type="fig" rid="F3">Fig. 3</xref>
). Furthermore, our predictions of proteolytic sites in the C-terminus of the other FGF family members might also imply their deactivation by proteolytic processing.
<fig id="F3">
<label>
<bold>Fig. 3.</bold>
</label>
<caption>
<p>Proteolytic site predictions for FGF23 of human, three mutant forms from ADHR patients, and three vertebrate orthologs. Sequences of FGF23 of human, mouse, rat and pufferfish were aligned together with R179W, R179Q and R176Q human FGF23 mutants (mutations are highlighted in dark grey). High score cleavage predictions were assigned to the true cleavage sites (highlighted in light grey). In normal FGF23, cleavage is known to take place between the two amino acids in light grey.</p>
</caption>
<graphic xlink:href="btn084f3"></graphic>
</fig>
</p>
<p>Another known case is the N-terminal proteolytic site of FGF3. The amino-terminal region downstream of the signal peptide of the protein is involved in its retention in the Golgi apparatus and the regulation of its secretion (Kiefer
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B26">1993</xref>
). We predicted proteolytic sites in the N-terminus of human, mouse, zebrafish, chicken and Xenopus FGF3. Indeed, in Xenopus, proteolysis between Arg45 and Asp46 is essential for the biological activity of FGF3 (Antoine
<italic>et al.</italic>
,
<xref ref-type="bibr" rid="B2">2000</xref>
). We suggest that proteolysis of 10–27 N-terminal amino acids occurs during the maturation of other FGFs, and may be important for their biological activity. The multiple sequence alignment in
<xref ref-type="fig" rid="F4">Figure 4</xref>
confirms that the N-terminal proteolytic site is conserved between some FGF family members and in proximity to an upstream variable region. It is worth noting that the proteolytic site is conserved even among remote homologs. Some of these homologs possess an N-terminal signal peptide and are secreted via the classical secretory pathway, while others do not possess a signal peptide and are secreted via an alternative pathway (Nickel,
<xref ref-type="bibr" rid="B31">2003</xref>
).
<fig id="F4">
<label>
<bold>Fig. 4.</bold>
</label>
<caption>
<p>FGF3 and other FGF family members that undergo proteolysis in their N-terminal region. Proteolysis of the N-terminal region of FGF3 is important for regulating its activity. FGF11 to FGF14 were also assigned high-scoring N-terminal cleavage site predictions, although they do not have a leading signal peptide. Removing the signal peptides of FGF3 members allows alignment of the N-terminal proteolytic sites. The high conservation of the proteolytic site signatures, in contrast to the variability of the flanking sequences, confirms the importance of the proteolytic processing, which, as in FGF3, may be involved in the regulation of protein activity.</p>
</caption>
<graphic xlink:href="btn084f4"></graphic>
</fig>
</p>
</sec>
</sec>
<sec sec-type="discussion" id="SEC4">
<title>4 DISCUSSION</title>
<p>This study revealed considerable potential for proteolytic site predictors, because about half of all proteolytic sites are still unannotated. Furthermore, the furin correction factor gives an estimate of the total number of proteolytic sites. We estimate the eukaryotic secretome to comprise about 7385 (1663 × 3.04 + 2099 × 1.11) proteolytic sites, which means that about 1.3% of R/K in the secretome are proteolytic sites (7385/(1663 + 569820 + 2099 + 1035) = 0.0129). An important conclusion is that currently only about half of the proteolytic sites are annotated [(1663 + 2099)/7385 = 0.509], meaning that predictors of proteolytic sites are of great value.</p>
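<p>The three figures quoted above (total sites, fraction of R/K that are sites, and share already annotated) can be checked with a few lines of arithmetic. The sketch below copies the numbers from the text; the constant and function names are illustrative assumptions, and the denominator terms are used exactly as printed.</p>

```python
# Counts copied from the Discussion; names are illustrative assumptions.
ANNOTATED_NEW = 1663
ANNOTATED_SEEN_BEFORE = 2099
FACTOR_NEW = 3.04
FACTOR_SEEN_BEFORE = 1.11
# Denominator of the R/K fraction, term by term as printed in the text.
RK_DENOMINATOR_TERMS = (1663, 569820, 2099, 1035)


def estimate_totals():
    """Return (total sites, fraction of R/K that are sites, annotated share)."""
    total_sites = (
        ANNOTATED_NEW * FACTOR_NEW + ANNOTATED_SEEN_BEFORE * FACTOR_SEEN_BEFORE
    )
    rk_fraction = total_sites / sum(RK_DENOMINATOR_TERMS)
    annotated_share = (ANNOTATED_NEW + ANNOTATED_SEEN_BEFORE) / total_sites
    return total_sites, rk_fraction, annotated_share


total, rk_fraction, annotated_share = estimate_totals()
print(f"estimated sites: {total:.0f}")
print(f"fraction of R/K that are sites: {rk_fraction:.4f}")
print(f"share already annotated: {annotated_share:.3f}")
```

<p>Run as written, the sketch gives about 7385 sites, an R/K fraction of about 0.0129 and an annotated share of about 0.509, in agreement with the values in the text.</p>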
<p>Another important issue raised in this article is performance evaluation when some of the data are mislabeled. In our case, the mislabeling results from missing annotation: sites labeled as non-proteolytic are often proteolytic sites that are not yet known. We showed that such mislabeling leaves the recall unchanged, while the precision is reduced by a factor that can be estimated. Furthermore, by relying on a well-characterized subgroup, namely furin sites, we were able to estimate the degree of mislabeling. As mislabeling is common in much of the currently available biological data, we believe that our calculation is relevant for performance evaluation in other biological classification problems.</p>
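<p>The effect of missing annotation on recall and precision can be illustrated with a small numerical sketch. It is a simplified model and not necessarily the derivation used in this study: it assumes that unannotated sites are recovered by the classifier at the same rate as annotated ones, and the function name, parameters and example numbers are all hypothetical.</p>

```python
def metrics_with_missing_annotation(annotated, correction, recall, false_positives):
    """Compare true and apparent precision/recall when annotation is incomplete.

    annotated       -- number of annotated (known) sites
    correction      -- true sites divided by annotated sites (hypothetical input)
    recall          -- fraction of true sites the classifier recovers
    false_positives -- predictions that are not real sites
    Assumes unannotated sites are recovered at the same rate as annotated ones.
    """
    true_sites = annotated * correction
    true_hits = recall * true_sites          # real sites among the predictions
    annotated_hits = recall * annotated      # the subset that carries a label
    predicted = true_hits + false_positives  # everything called a site
    return {
        "true_recall": true_hits / true_sites,
        "apparent_recall": annotated_hits / annotated,
        "true_precision": true_hits / predicted,
        "apparent_precision": annotated_hits / predicted,
    }


# Made-up numbers, chosen only to make the effect easy to read.
m = metrics_with_missing_annotation(
    annotated=1000, correction=2.0, recall=0.5, false_positives=250
)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

<p>Under these assumptions the apparent recall equals the true recall, whereas the apparent precision is the true precision divided by the correction factor, so an estimate of that factor is enough to recover the true precision from the measured one.</p>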
<p>Many sites are currently not annotated as proteolytic sites, but are predicted by our classifier with high precision. These include sites in currently developed therapeutic proteins, and in a few cases, the exact boundaries of peptides identified experimentally as minimal sequences required for functionality.</p>
<p>We demonstrate the prediction capability of the novel classifier in an analysis of members of the Fibroblast Growth Factor (FGF) family. We were able to discriminate real proteolytic sites from the non-cleaved sites of mutant FGF23 proteins of ADHR patients. Additionally, the predictor was able to identify cleavage sites in remote homologs, suggesting a regulatory role for the predicted cleavages by annotation transfer.</p>
<p>In summary, proteolysis has a great influence on the biological function of proteins, and therefore the accurate prediction of proteolytic sites is important for basic research and biotechnological applications. It allows identification of biologically active peptides from non-active precursors. In addition, it allows identification of mutations and polymorphisms that influence the generation of active peptides and proteins.</p>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>The authors are grateful to P. Duckert, W.L. McKeehan, M. Havilio, I. Borukhov, H. Ashkenazy, I. Myslyuk, E. Schreiber, and Y. Mansour for useful comments and helpful discussions.</p>
<p>
<italic>Conflict of Interest</italic>
: none declared.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Anderson</surname>
<given-names>NL</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The human plasma proteome: a nonredundant list developed by combination of four separate sources</article-title>
<source>Mol. Cell Proteomics</source>
<year>2004</year>
<volume>3</volume>
<fpage>311</fpage>
<lpage>326</lpage>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Antoine</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>NH2-terminal cleavage of xenopus fibroblast growth factor 3 is necessary for optimal biological activity and receptor binding</article-title>
<source>Cell Growth Differ.</source>
<year>2000</year>
<volume>11</volume>
<fpage>593</fpage>
<lpage>605</lpage>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bahbouhi</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Effects of L- and D-REKR amino acid-containing peptides on HIV and SIV envelope glycoprotein precursor maturation and HIV and SIV replication</article-title>
<source>Biochem. J.</source>
<year>2002</year>
<volume>366</volume>
<fpage>863</fpage>
<lpage>872</lpage>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Basak</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Inhibitors of proprotein convertases</article-title>
<source>J. Mol. Med.</source>
<year>2005</year>
<volume>83</volume>
<fpage>844</fpage>
<lpage>855</lpage>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bendtsen</surname>
<given-names>JD</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Improved prediction of signal peptides: SignalP 3.0</article-title>
<source>J. Mol. Biol.</source>
<year>2004</year>
<volume>340</volume>
<fpage>783</fpage>
<lpage>795</lpage>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bergeron</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Implication of proprotein convertases in the processing and spread of severe acute respiratory syndrome coronavirus</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<year>2005</year>
<volume>326</volume>
<fpage>554</fpage>
<lpage>563</lpage>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blom</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks</article-title>
<source>Protein Sci.</source>
<year>1996</year>
<volume>5</volume>
<fpage>2203</fpage>
<lpage>2216</lpage>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boeckmann</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>365</fpage>
<lpage>370</lpage>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowe</surname>
<given-names>AE</given-names>
</name>
<etal></etal>
</person-group>
<article-title>FGF-23 Inhibits Renal Tubular Phosphate Transport and Is a PHEX Substrate</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<year>2001</year>
<volume>284</volume>
<fpage>977</fpage>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bradbury</surname>
<given-names>AF</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Mechanism of C-terminal amide formation by pituitary enzymes</article-title>
<source>Nature</source>
<year>1982</year>
<volume>298</volume>
<fpage>686</fpage>
<lpage>688</lpage>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Random forests</article-title>
<source>Machine Learning</source>
<year>2001</year>
<volume>45</volume>
<fpage>5</fpage>
<lpage>32</lpage>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cai</surname>
<given-names>YD</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Artificial neural network method for predicting HIV protease cleavage sites in protein</article-title>
<source>J. Protein Chem.</source>
<year>1998</year>
<volume>17</volume>
<fpage>607</fpage>
<lpage>615</lpage>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Clamp</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Jalview Java alignment editor</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>426</fpage>
<lpage>427</lpage>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Day</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Prodynorphin processing by proprotein convertase 2. Cleavage at single basic residues and enhanced processing in the presence of carboxypeptidase activity</article-title>
<source>J. Biol. Chem.</source>
<year>1998</year>
<volume>273</volume>
<fpage>829</fpage>
<lpage>836</lpage>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>de Haan</surname>
<given-names>CA</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell–cell but not virus–cell fusion</article-title>
<source>J. Virol.</source>
<year>2004</year>
<volume>78</volume>
<fpage>6048</fpage>
<lpage>6054</lpage>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Do</surname>
<given-names>CB</given-names>
</name>
<etal></etal>
</person-group>
<article-title>ProbCons: Probabilistic consistency-based multiple sequence alignment</article-title>
<source>Genome Res.</source>
<year>2005</year>
<volume>15</volume>
<fpage>330</fpage>
<lpage>340</lpage>
</nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duckert</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Prediction of proprotein convertase cleavage sites</article-title>
<source>Protein Eng. Des. Sel.</source>
<year>2004</year>
<volume>17</volume>
<fpage>107</fpage>
<lpage>112</lpage>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Earl</surname>
<given-names>PL</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Biological and immunological properties of human immunodeficiency virus type 1 envelope glycoprotein: analysis of proteins with truncations and deletions expressed by recombinant vaccinia viruses</article-title>
<source>J. Virol.</source>
<year>1991</year>
<volume>65</volume>
<fpage>31</fpage>
<lpage>41</lpage>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Farriol-Mathis</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Annotation of post-translational modifications in the Swiss-Prot knowledge base</article-title>
<source>Proteomics</source>
<year>2004</year>
<volume>4</volume>
<fpage>1537</fpage>
<lpage>1550</lpage>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friis-Hansen</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Attenuated processing of proglucagon and glucagon-like peptide-1 in carboxypeptidase E-deficient mice</article-title>
<source>J. Endocrinol.</source>
<year>2001</year>
<volume>169</volume>
<fpage>595</fpage>
<lpage>602</lpage>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hallenberger</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Inhibition of furin-mediated cleavage activation of HIV-1 glycoprotein gp160</article-title>
<source>Nature</source>
<year>1992</year>
<volume>360</volume>
<fpage>358</fpage>
<lpage>361</lpage>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hobohm</surname>
<given-names>U</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Selection of representative protein data sets</article-title>
<source>Protein Sci.</source>
<year>1992</year>
<volume>1</volume>
<fpage>409</fpage>
<lpage>417</lpage>
</nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Joachims</surname>
<given-names>T</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Schölkopf</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Making large-scale support vector machine learning practical</article-title>
<source>Advances in Kernel Methods – Support Vector Learning
<italic> ch. 11,</italic>
</source>
<year>1999</year>
<publisher-loc>Cambridge, USA</publisher-loc>
<publisher-name>MIT Press</publisher-name>
<fpage>169</fpage>
<lpage>184</lpage>
</nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Junker</surname>
<given-names>VL</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Representation of functional information in the SWISS-PROT data bank</article-title>
<source>Bioinformatics</source>
<year>1999</year>
<volume>15</volume>
<fpage>1066</fpage>
<lpage>1067</lpage>
</nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kibler</surname>
<given-names>KV</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Polyarginine inhibits gp160 processing by furin and suppresses productive human immunodeficiency virus type 1 infection</article-title>
<source>J. Biol. Chem.</source>
<year>2004</year>
<volume>279</volume>
<fpage>49055</fpage>
<lpage>49063</lpage>
</nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kiefer</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Retention of fibroblast growth factor 3 in the Golgi complex may regulate its export from cells</article-title>
<source>Mol. Cell Biol.</source>
<year>1993</year>
<volume>13</volume>
<fpage>5781</fpage>
<lpage>5793</lpage>
</nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kiemer</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Coronavirus 3CLpro proteinase cleavage sites: possible relevance to SARS virus pathology</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>72</fpage>
</nlm-citation>
</ref>
<ref id="B28">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kowalski</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Functional regions of the envelope glycoprotein of human immunodeficiency virus type 1</article-title>
<source>Science</source>
<year>1987</year>
<volume>237</volume>
<fpage>1351</fpage>
<lpage>1355</lpage>
</nlm-citation>
</ref>
<ref id="B29">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCune</surname>
<given-names>JM</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Endoproteolytic cleavage of gp160 is required for the activation of human immunodeficiency virus</article-title>
<source>Cell</source>
<year>1988</year>
<volume>53</volume>
<fpage>55</fpage>
<lpage>67</lpage>
</nlm-citation>
</ref>
<ref id="B30">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nakayama</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Furin: a mammalian subtilisin/Kex2p-like endoprotease involved in processing of a wide variety of precursor proteins</article-title>
<source>Biochem. J.</source>
<year>1997</year>
<volume>327</volume>
<fpage>625</fpage>
<lpage>635</lpage>
</nlm-citation>
</ref>
<ref id="B31">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nickel</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>The mystery of nonclassical protein secretion. A current view on cargo proteins and potential export routes</article-title>
<source>Eur. J. Biochem.</source>
<year>2003</year>
<volume>270</volume>
<fpage>2109</fpage>
<lpage>2119</lpage>
</nlm-citation>
</ref>
<ref id="B32">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paetzel</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Signal peptidases</article-title>
<source>Chem. Rev.</source>
<year>2002</year>
<volume>102</volume>
<fpage>4549</fpage>
<lpage>4580</lpage>
</nlm-citation>
</ref>
<ref id="B33">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qian</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sejnowski</surname>
<given-names>TJ</given-names>
</name>
</person-group>
<article-title>Predicting the secondary structure of globular proteins using neural network models</article-title>
<source>J. Mol. Biol.</source>
<year>1988</year>
<volume>202</volume>
<fpage>865</fpage>
<lpage>884</lpage>
</nlm-citation>
</ref>
<ref id="B34">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seidah</surname>
<given-names>NG</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Precursor convertases: an evolutionary ancient, cell-specific, combinatorial mechanism yielding diverse bioactive peptides and proteins</article-title>
<source>Ann. NY Acad. Sci.</source>
<year>1998</year>
<volume>839</volume>
<fpage>9</fpage>
<lpage>24</lpage>
</nlm-citation>
</ref>
<ref id="B35">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shimada</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Mutant FGF-23 responsible for autosomal dominant hypophosphatemic rickets is resistant to proteolytic cleavage and causes hypophosphatemia
<italic>in vivo</italic>
</article-title>
<source>Endocrinology</source>
<year>2002</year>
<volume>143</volume>
<fpage>3179</fpage>
<lpage>3182</lpage>
</nlm-citation>
</ref>
<ref id="B36">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vapnik</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Cortes</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Support vector networks</article-title>
<source>Machine Learning</source>
<year>1995</year>
<volume>20</volume>
<fpage>1</fpage>
<lpage>25</lpage>
</nlm-citation>
</ref>
<ref id="B37">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>White</surname>
<given-names>KE</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Autosomal-dominant hypophosphatemic rickets (ADHR) mutations stabilize FGF-23</article-title>
<source>Kidney Int.</source>
<year>2001</year>
<volume>60</volume>
<fpage>2079</fpage>
<lpage>2086</lpage>
</nlm-citation>
</ref>
<ref id="B38">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>White</surname>
<given-names>KE</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23</article-title>
<source>Nat. Genet.</source>
<year>2000</year>
<volume>26</volume>
<fpage>345</fpage>
</nlm-citation>
</ref>
<ref id="B39">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>ZR</given-names>
</name>
<name>
<surname>Berry</surname>
<given-names>EA</given-names>
</name>
</person-group>
<article-title>Reduced bio-basis function neural networks for protease cleavage site prediction</article-title>
<source>J. Bioinform. Comput. Biol.</source>
<year>2004</year>
<volume>2</volume>
<fpage>511</fpage>
<lpage>531</lpage>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo>
<title>Predicting proteolytic sites in extracellular proteins: only halfway there</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA">
<title>Predicting proteolytic sites in extracellular proteins: only halfway there</title>
</titleInfo>
<name type="personal" displayLabel="corresp">
<namePart type="given">Yossef</namePart>
<namePart type="family">Kliger</namePart>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<affiliation>*To whom correspondence should be addressed.</affiliation>
<description>The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</description>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eyal</namePart>
<namePart type="family">Gofer</namePart>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<description>The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.</description>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Assaf</namePart>
<namePart type="family">Wool</namePart>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amir</namePart>
<namePart type="family">Toporik</namePart>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Avihay</namePart>
<namePart type="family">Apatoff</namePart>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Moshe</namePart>
<namePart type="family">Olshansky</namePart>
<affiliation>Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512 and The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="research-article" authority="ISTEX" authorityURI="https://content-type.data.istex.fr" valueURI="https://content-type.data.istex.fr/ark:/67375/XTP-1JC4F85T-7">research-article</genre>
<originInfo>
<publisher>Oxford University Press</publisher>
<dateIssued encoding="w3cdtf">2008-04-15</dateIssued>
<dateCreated encoding="w3cdtf">2008-03-01</dateCreated>
<copyrightDate encoding="w3cdtf">2008</copyrightDate>
</originInfo>
<abstract>Motivation: Many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. In the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning. Results: The results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over 3600 unannotated ones. Furthermore, we have found that only 6% of the unannotated sites are similar to known proteolytic sites, whereas the remaining 94% do not share significant similarity with any annotated proteolytic site. The computational challenges in these two cases are very different. While the precision in detecting the former group is close to perfect, only a mere 22% of the latter group were detected with a precision of 80%. The applicability of the classifier is demonstrated through members of the FGF family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins. Contact: kliger@compugen.co.il; yossef.kliger@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.</abstract>
<note type="footnotes">Associate Editor: John Quackenbush</note>
<note type="author-notes">*To whom correspondence should be addressed.</note>
<relatedItem type="host">
<titleInfo>
<title>Bioinformatics</title>
</titleInfo>
<genre type="journal" authority="ISTEX" authorityURI="https://publication-type.data.istex.fr" valueURI="https://publication-type.data.istex.fr/ark:/67375/JMC-0GLKJH51-B">journal</genre>
<subject>
<topic>ORIGINAL PAPERS</topic>
</subject>
<subject>
<topic>SEQUENCE ANALYSIS</topic>
</subject>
<identifier type="ISSN">1367-4803</identifier>
<identifier type="eISSN">1460-2059</identifier>
<identifier type="PublisherID">bioinformatics</identifier>
<identifier type="PublisherID-hwp">bioinfo</identifier>
<part>
<date>2008</date>
<detail type="volume">
<caption>vol.</caption>
<number>24</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>8</number>
</detail>
<extent unit="pages">
<start>1049</start>
<end>1055</end>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B1">
<titleInfo>
<title>The human plasma proteome: a nonredundant list developed by combination of four separate sources</title>
</titleInfo>
<name type="personal">
<namePart type="given">NL</namePart>
<namePart type="family">Anderson</namePart>
</name>
<genre>journal</genre>
<note>Anderson NL. The human plasma proteome: a nonredundant list developed by combination of four separate sources. Mol. Cell Proteomics 2004; 3: 311-326</note>
<relatedItem type="host">
<titleInfo>
<title>Mol. Cell Proteomics</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>3</number>
</detail>
<extent unit="pages">
<start>311</start>
<end>326</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B2">
<titleInfo>
<title>NH2-terminal cleavage of xenopus fibroblast growth factor 3 is necessary for optimal biological activity and receptor binding</title>
</titleInfo>
<name type="personal">
<namePart type="given">M</namePart>
<namePart type="family">Antoine</namePart>
</name>
<genre>journal</genre>
<note>Antoine M. NH2-terminal cleavage of xenopus fibroblast growth factor 3 is necessary for optimal biological activity and receptor binding. Cell Growth Differ. 2000; 11: 593-605</note>
<relatedItem type="host">
<titleInfo>
<title>Cell Growth Differ.</title>
</titleInfo>
<part>
<date>2000</date>
<detail type="volume">
<caption>vol.</caption>
<number>11</number>
</detail>
<extent unit="pages">
<start>593</start>
<end>605</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B3">
<titleInfo>
<title>Effects of L- and D-REKR amino acid-containing peptides on HIV and SIV envelope glycoprotein precursor maturation and HIV and SIV replication</title>
</titleInfo>
<name type="personal">
<namePart type="given">B</namePart>
<namePart type="family">Bahbouhi</namePart>
</name>
<genre>journal</genre>
<note>Bahbouhi B. Effects of L- and D-REKR amino acid-containing peptides on HIV and SIV envelope glycoprotein precursor maturation and HIV and SIV replication. Biochem. J. 2002; 366: 863-872</note>
<relatedItem type="host">
<titleInfo>
<title>Biochem. J.</title>
</titleInfo>
<part>
<date>2002</date>
<detail type="volume">
<caption>vol.</caption>
<number>366</number>
</detail>
<extent unit="pages">
<start>863</start>
<end>872</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B4">
<titleInfo>
<title>Inhibitors of proprotein convertases</title>
</titleInfo>
<name type="personal">
<namePart type="given">A</namePart>
<namePart type="family">Basak</namePart>
</name>
<genre>journal</genre>
<note>BasakAInhibitors of proprotein convertasesJ. Mol. Med.200583844855</note>
<relatedItem type="host">
<titleInfo>
<title>J. Mol. Med.</title>
</titleInfo>
<part>
<date>2005</date>
<detail type="volume">
<caption>vol.</caption>
<number>83</number>
</detail>
<extent unit="pages">
<start>844</start>
<end>855</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B5">
<titleInfo>
<title>Improved prediction of signal peptides: SignalP 3.0</title>
</titleInfo>
<name type="personal">
<namePart type="given">JD</namePart>
<namePart type="family">Bendtsen</namePart>
</name>
<genre>journal</genre>
<note>BendtsenJDImproved prediction of signal peptides: SignalP 3.0J. Mol. Biol.2004340783795</note>
<relatedItem type="host">
<titleInfo>
<title>J. Mol. Biol.</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>340</number>
</detail>
<extent unit="pages">
<start>783</start>
<end>795</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B6">
<titleInfo>
<title>Implication of proprotein convertases in the processing and spread of severe acute respiratory syndrome coronavirus</title>
</titleInfo>
<name type="personal">
<namePart type="given">E</namePart>
<namePart type="family">Bergeron</namePart>
</name>
<genre>journal</genre>
<note>BergeronEImplication of proprotein convertases in the processing and spread of severe acute respiratory syndrome coronavirusBiochem. Biophys. Res. Commun.2005326554563</note>
<relatedItem type="host">
<titleInfo>
<title>Biochem. Biophys. Res. Commun.</title>
</titleInfo>
<part>
<date>2005</date>
<detail type="volume">
<caption>vol.</caption>
<number>326</number>
</detail>
<extent unit="pages">
<start>554</start>
<end>563</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B7">
<titleInfo>
<title>Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks</title>
</titleInfo>
<name type="personal">
<namePart type="given">N</namePart>
<namePart type="family">Blom</namePart>
</name>
<genre>journal</genre>
<note>BlomNCleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networksProtein Sci.1996522032216</note>
<relatedItem type="host">
<titleInfo>
<title>Protein Sci.</title>
</titleInfo>
<part>
<date>1996</date>
<detail type="volume">
<caption>vol.</caption>
<number>5</number>
</detail>
<extent unit="pages">
<start>2203</start>
<end>2216</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B8">
<titleInfo>
<title>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</title>
</titleInfo>
<name type="personal">
<namePart type="given">B</namePart>
<namePart type="family">Boeckmann</namePart>
</name>
<genre>journal</genre>
<note>BoeckmannBThe SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003Nucleic Acids Res.200331365370</note>
<relatedItem type="host">
<titleInfo>
<title>Nucleic Acids Res.</title>
</titleInfo>
<part>
<date>2003</date>
<detail type="volume">
<caption>vol.</caption>
<number>31</number>
</detail>
<extent unit="pages">
<start>365</start>
<end>370</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B9">
<titleInfo>
<title>FGF-23 Inhibits Renal Tubular Phosphate Transport and Is a PHEX Substrate</title>
</titleInfo>
<name type="personal">
<namePart type="given">AE</namePart>
<namePart type="family">Bowe</namePart>
</name>
<genre>journal</genre>
<note>BoweAEFGF-23 Inhibits Renal Tubular Phosphate Transport and Is a PHEX SubstrateBiochem. Biophys. Res. Commun.2001284977</note>
<relatedItem type="host">
<titleInfo>
<title>Biochem. Biophys. Res. Commun.</title>
</titleInfo>
<part>
<date>2001</date>
<detail type="volume">
<caption>vol.</caption>
<number>284</number>
</detail>
<extent unit="pages">
<start>977</start>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B10">
<titleInfo>
<title>Mechanism of C-terminal amide formation by pituitary enzymes</title>
</titleInfo>
<name type="personal">
<namePart type="given">AF</namePart>
<namePart type="family">Bradbury</namePart>
</name>
<genre>journal</genre>
<note>BradburyAFMechanism of C-terminal amide formation by pituitary enzymesNature1982298686688</note>
<relatedItem type="host">
<titleInfo>
<title>Nature</title>
</titleInfo>
<part>
<date>1982</date>
<detail type="volume">
<caption>vol.</caption>
<number>298</number>
</detail>
<extent unit="pages">
<start>686</start>
<end>688</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B11">
<titleInfo>
<title>Random forests</title>
</titleInfo>
<name type="personal">
<namePart type="given">L</namePart>
<namePart type="family">Breiman</namePart>
</name>
<genre>journal</genre>
<note>BreimanLRandom forestsMachine Learning200145532</note>
<relatedItem type="host">
<titleInfo>
<title>Machine Learning</title>
</titleInfo>
<part>
<date>2001</date>
<detail type="volume">
<caption>vol.</caption>
<number>45</number>
</detail>
<extent unit="pages">
<start>5</start>
<end>32</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B12">
<titleInfo>
<title>Artificial neural network method for predicting HIV protease cleavage sites in protein</title>
</titleInfo>
<name type="personal">
<namePart type="given">YD</namePart>
<namePart type="family">Cai</namePart>
</name>
<genre>journal</genre>
<note>CaiYDArtificial neural network method for predicting HIV protease cleavage sites in proteinJ. Protein Chem.199817607615</note>
<relatedItem type="host">
<titleInfo>
<title>J. Protein Chem.</title>
</titleInfo>
<part>
<date>1998</date>
<detail type="volume">
<caption>vol.</caption>
<number>17</number>
</detail>
<extent unit="pages">
<start>607</start>
<end>615</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B13">
<titleInfo>
<title>The Jalview Java alignment editor</title>
</titleInfo>
<name type="personal">
<namePart type="given">M</namePart>
<namePart type="family">Clamp</namePart>
</name>
<genre>journal</genre>
<note>ClampMThe Jalview Java alignment editorBioinformatics200420426427</note>
<relatedItem type="host">
<titleInfo>
<title>Bioinformatics</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>20</number>
</detail>
<extent unit="pages">
<start>426</start>
<end>427</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B14">
<titleInfo>
<title>Prodynorphin processing by proprotein convertase 2. Cleavage at single basic residues and enhanced processing in the presence of carboxypeptidase activity</title>
</titleInfo>
<name type="personal">
<namePart type="given">R</namePart>
<namePart type="family">Day</namePart>
</name>
<genre>journal</genre>
<note>DayRProdynorphin processing by proprotein convertase 2. Cleavage at single basic residues and enhanced processing in the presence of carboxypeptidase activityJ. Biol. Chem.1998273829836</note>
<relatedItem type="host">
<titleInfo>
<title>J. Biol. Chem.</title>
</titleInfo>
<part>
<date>1998</date>
<detail type="volume">
<caption>vol.</caption>
<number>273</number>
</detail>
<extent unit="pages">
<start>829</start>
<end>836</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B15">
<titleInfo>
<title>Cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell–cell but not virus–cell fusion</title>
</titleInfo>
<name type="personal">
<namePart type="given">CA</namePart>
<namePart type="family">de Haan</namePart>
</name>
<genre>journal</genre>
<note>de HaanCACleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell–cell but not virus–cell fusionJ. Virol.20047860486054</note>
<relatedItem type="host">
<titleInfo>
<title>J. Virol.</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>78</number>
</detail>
<extent unit="pages">
<start>6048</start>
<end>6054</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B16">
<titleInfo>
<title>ProbCons: Probabilistic consistency-based multiple sequence alignment</title>
</titleInfo>
<name type="personal">
<namePart type="given">CB</namePart>
<namePart type="family">Do</namePart>
</name>
<genre>journal</genre>
<note>DoCBProbCons: Probabilistic consistency-based multiple sequence alignmentGenome Res.200515330340</note>
<relatedItem type="host">
<titleInfo>
<title>Genome Res.</title>
</titleInfo>
<part>
<date>2005</date>
<detail type="volume">
<caption>vol.</caption>
<number>15</number>
</detail>
<extent unit="pages">
<start>330</start>
<end>340</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B17">
<titleInfo>
<title>Prediction of proprotein convertase cleavage sites</title>
</titleInfo>
<name type="personal">
<namePart type="given">P</namePart>
<namePart type="family">Duckert</namePart>
</name>
<genre>journal</genre>
<note>DuckertPPrediction of proprotein convertase cleavage sitesProtein Eng. Des. Sel.200417107112</note>
<relatedItem type="host">
<titleInfo>
<title>Protein Eng. Des. Sel.</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>17</number>
</detail>
<extent unit="pages">
<start>107</start>
<end>112</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B18">
<titleInfo>
<title>Biological and immunological properties of human immunodeficiency virus type 1 envelope glycoprotein: analysis of proteins with truncations and deletions expressed by recombinant vaccinia viruses</title>
</titleInfo>
<name type="personal">
<namePart type="given">PL</namePart>
<namePart type="family">Earl</namePart>
</name>
<genre>journal</genre>
<note>EarlPLBiological and immunological properties of human immunodeficiency virus type 1 envelope glycoprotein: analysis of proteins with truncations and deletions expressed by recombinant vaccinia virusesJ. Virol.1991653141</note>
<relatedItem type="host">
<titleInfo>
<title>J. Virol.</title>
</titleInfo>
<part>
<date>1991</date>
<detail type="volume">
<caption>vol.</caption>
<number>65</number>
</detail>
<extent unit="pages">
<start>31</start>
<end>41</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B19">
<titleInfo>
<title>Annotation of post-translational modifications in the Swiss-Prot knowledge base</title>
</titleInfo>
<name type="personal">
<namePart type="given">N</namePart>
<namePart type="family">Farriol-Mathis</namePart>
</name>
<genre>journal</genre>
<note>Farriol-MathisNAnnotation of post-translational modifications in the Swiss-Prot knowledge baseProteomics2004415371550</note>
<relatedItem type="host">
<titleInfo>
<title>Proteomics</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>4</number>
</detail>
<extent unit="pages">
<start>1537</start>
<end>1550</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B20">
<titleInfo>
<title>Attenuated processing of proglucagon and glucagon-like peptide-1 in carboxypeptidase E-deficient mice</title>
</titleInfo>
<name type="personal">
<namePart type="given">L</namePart>
<namePart type="family">Friis-Hansen</namePart>
</name>
<genre>journal</genre>
<note>Friis-HansenLAttenuated processing of proglucagon and glucagon-like peptide-1 in carboxypeptidase E-deficient miceJ. Endocrinol.2001169595602</note>
<relatedItem type="host">
<titleInfo>
<title>J. Endocrinol.</title>
</titleInfo>
<part>
<date>2001</date>
<detail type="volume">
<caption>vol.</caption>
<number>169</number>
</detail>
<extent unit="pages">
<start>595</start>
<end>602</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B21">
<titleInfo>
<title>Inhibition of furin-mediated cleavage activation of HIV-1 glycoprotein gp160</title>
</titleInfo>
<name type="personal">
<namePart type="given">S</namePart>
<namePart type="family">Hallenberger</namePart>
</name>
<genre>journal</genre>
<note>HallenbergerSInhibition of furin-mediated cleavage activation of HIV-1 glycoprotein gp160Nature1992360358361</note>
<relatedItem type="host">
<titleInfo>
<title>Nature</title>
</titleInfo>
<part>
<date>1992</date>
<detail type="volume">
<caption>vol.</caption>
<number>360</number>
</detail>
<extent unit="pages">
<start>358</start>
<end>361</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B22">
<titleInfo>
<title>Selection of representative protein data sets</title>
</titleInfo>
<name type="personal">
<namePart type="given">U</namePart>
<namePart type="family">Hobohm</namePart>
</name>
<genre>journal</genre>
<note>HobohmUSelection of representative protein data setsProtein Sci.19921409417</note>
<relatedItem type="host">
<titleInfo>
<title>Protein Sci.</title>
</titleInfo>
<part>
<date>1992</date>
<detail type="volume">
<caption>vol.</caption>
<number>1</number>
</detail>
<extent unit="pages">
<start>409</start>
<end>417</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B23">
<titleInfo>
<title>Making large-scale support vector machine learning practical</title>
</titleInfo>
<name type="personal">
<namePart type="given">T</namePart>
<namePart type="family">Joachims</namePart>
</name>
<name type="personal">
<namePart type="given">B</namePart>
<namePart type="family">Schölkopf</namePart>
</name>
<genre>book</genre>
<note>JoachimsTSchölkopfBMaking large-scale support vector machine learning practicalAdvances in Kernel Methods – Support Vector Learning ch. 11,1999Cambridge, USAMIT Press169184</note>
<relatedItem type="host">
<titleInfo>
<title>Advances in Kernel Methods – Support Vector Learning, ch. 11</title>
</titleInfo>
<originInfo>
<publisher>MIT Press</publisher>
<place>
<placeTerm type="text">Cambridge, USA</placeTerm>
</place>
</originInfo>
<part>
<date>1999</date>
<extent unit="pages">
<start>169</start>
<end>184</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B24">
<titleInfo>
<title>Representation of functional information in the SWISS-PROT data bank</title>
</titleInfo>
<name type="personal">
<namePart type="given">VL</namePart>
<namePart type="family">Junker</namePart>
</name>
<genre>journal</genre>
<note>JunkerVLRepresentation of functional information in the SWISS-PROT data bankBioinformatics19991510661067</note>
<relatedItem type="host">
<titleInfo>
<title>Bioinformatics</title>
</titleInfo>
<part>
<date>1999</date>
<detail type="volume">
<caption>vol.</caption>
<number>15</number>
</detail>
<extent unit="pages">
<start>1066</start>
<end>1067</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B25">
<titleInfo>
<title>Polyarginine inhibits gp160 processing by furin and suppresses productive human immunodeficiency virus type 1 infection</title>
</titleInfo>
<name type="personal">
<namePart type="given">KV</namePart>
<namePart type="family">Kibler</namePart>
</name>
<genre>journal</genre>
<note>KiblerKVPolyarginine inhibits gp160 processing by furin and suppresses productive human immunodeficiency virus type 1 infectionJ. Biol. Chem.20042794905549063</note>
<relatedItem type="host">
<titleInfo>
<title>J. Biol. Chem.</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>279</number>
</detail>
<extent unit="pages">
<start>49055</start>
<end>49063</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B26">
<titleInfo>
<title>Retention of fibroblast growth factor 3 in the Golgi complex may regulate its export from cells</title>
</titleInfo>
<name type="personal">
<namePart type="given">P</namePart>
<namePart type="family">Kiefer</namePart>
</name>
<genre>journal</genre>
<note>KieferPRetention of fibroblast growth factor 3 in the Golgi complex may regulate its export from cellsMol. Cell Biol.19931357815793</note>
<relatedItem type="host">
<titleInfo>
<title>Mol. Cell Biol.</title>
</titleInfo>
<part>
<date>1993</date>
<detail type="volume">
<caption>vol.</caption>
<number>13</number>
</detail>
<extent unit="pages">
<start>5781</start>
<end>5793</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B27">
<titleInfo>
<title>Coronavirus 3CLpro proteinase cleavage sites: possible relevance to SARS virus pathology</title>
</titleInfo>
<name type="personal">
<namePart type="given">L</namePart>
<namePart type="family">Kiemer</namePart>
</name>
<genre>journal</genre>
<note>KiemerLCoronavirus 3CLpro proteinase cleavage sites: possible relevance to SARS virus pathologyBMC Bioinformatics2004572</note>
<relatedItem type="host">
<titleInfo>
<title>BMC Bioinformatics</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>5</number>
</detail>
<extent unit="pages">
<start>72</start>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B28">
<titleInfo>
<title>Functional regions of the envelope glycoprotein of human immunodeficiency virus type 1</title>
</titleInfo>
<name type="personal">
<namePart type="given">M</namePart>
<namePart type="family">Kowalski</namePart>
</name>
<genre>journal</genre>
<note>KowalskiMFunctional regions of the envelope glycoprotein of human immunodeficiency virus type 1Science198723713511355</note>
<relatedItem type="host">
<titleInfo>
<title>Science</title>
</titleInfo>
<part>
<date>1987</date>
<detail type="volume">
<caption>vol.</caption>
<number>237</number>
</detail>
<extent unit="pages">
<start>1351</start>
<end>1355</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B29">
<titleInfo>
<title>Endoproteolytic cleavage of gp160 is required for the activation of human immunodeficiency virus</title>
</titleInfo>
<name type="personal">
<namePart type="given">JM</namePart>
<namePart type="family">McCune</namePart>
</name>
<genre>journal</genre>
<note>McCuneJMEndoproteolytic cleavage of gp160 is required for the activation of human immunodeficiency virusCell1988535567</note>
<relatedItem type="host">
<titleInfo>
<title>Cell</title>
</titleInfo>
<part>
<date>1988</date>
<detail type="volume">
<caption>vol.</caption>
<number>53</number>
</detail>
<extent unit="pages">
<start>55</start>
<end>67</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B30">
<titleInfo>
<title>Furin: a mammalian subtilisin/Kex2p-like endoprotease involved in processing of a wide variety of precursor proteins</title>
</titleInfo>
<name type="personal">
<namePart type="given">K</namePart>
<namePart type="family">Nakayama</namePart>
</name>
<genre>journal</genre>
<note>NakayamaKFurin: a mammalian subtilisin/Kex2p-like endoprotease involved in processing of a wide variety of precursor proteinsBiochem. J.1997327625635</note>
<relatedItem type="host">
<titleInfo>
<title>Biochem. J.</title>
</titleInfo>
<part>
<date>1997</date>
<detail type="volume">
<caption>vol.</caption>
<number>327</number>
</detail>
<extent unit="pages">
<start>625</start>
<end>635</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B31">
<titleInfo>
<title>The mystery of nonclassical protein secretion. A current view on cargo proteins and potential export routes</title>
</titleInfo>
<name type="personal">
<namePart type="given">W</namePart>
<namePart type="family">Nickel</namePart>
</name>
<genre>journal</genre>
<note>NickelWThe mystery of nonclassical protein secretion. A current view on cargo proteins and potential export routesEur. J. Biochem.200327021092119</note>
<relatedItem type="host">
<titleInfo>
<title>Eur. J. Biochem.</title>
</titleInfo>
<part>
<date>2003</date>
<detail type="volume">
<caption>vol.</caption>
<number>270</number>
</detail>
<extent unit="pages">
<start>2109</start>
<end>2119</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B32">
<titleInfo>
<title>Signal peptidases</title>
</titleInfo>
<name type="personal">
<namePart type="given">M</namePart>
<namePart type="family">Paetzel</namePart>
</name>
<genre>journal</genre>
<note>PaetzelMSignal peptidasesChem. Rev.200210245494580</note>
<relatedItem type="host">
<titleInfo>
<title>Chem. Rev.</title>
</titleInfo>
<part>
<date>2002</date>
<detail type="volume">
<caption>vol.</caption>
<number>102</number>
</detail>
<extent unit="pages">
<start>4549</start>
<end>4580</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B33">
<titleInfo>
<title>Predicting the secondary structure of globular proteins using neural network models</title>
</titleInfo>
<name type="personal">
<namePart type="given">N</namePart>
<namePart type="family">Qian</namePart>
</name>
<name type="personal">
<namePart type="given">TJ</namePart>
<namePart type="family">Sejnowski</namePart>
</name>
<genre>journal</genre>
<note>QianNSejnowskiTJPredicting the secondary structure of globular proteins using neural network modelsJ. Mol. Biol.1988202865884</note>
<relatedItem type="host">
<titleInfo>
<title>J. Mol. Biol.</title>
</titleInfo>
<part>
<date>1988</date>
<detail type="volume">
<caption>vol.</caption>
<number>202</number>
</detail>
<extent unit="pages">
<start>865</start>
<end>884</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B34">
<titleInfo>
<title>Precursor convertases: an evolutionary ancient, cell-specific, combinatorial mechanism yielding diverse bioactive peptides and proteins</title>
</titleInfo>
<name type="personal">
<namePart type="given">NG</namePart>
<namePart type="family">Seidah</namePart>
</name>
<genre>journal</genre>
<note>SeidahNGPrecursor convertases: an evolutionary ancient, cell-specific, combinatorial mechanism yielding diverse bioactive peptides and proteinsAnn. NY Acad. Sci.1998839924</note>
<relatedItem type="host">
<titleInfo>
<title>Ann. NY Acad. Sci.</title>
</titleInfo>
<part>
<date>1998</date>
<detail type="volume">
<caption>vol.</caption>
<number>839</number>
</detail>
<extent unit="pages">
<start>9</start>
<end>24</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B35">
<titleInfo>
<title>Mutant FGF-23 responsible for autosomal dominant hypophosphatemic rickets is resistant to proteolytic cleavage and causes hypophosphatemia in vivo</title>
</titleInfo>
<name type="personal">
<namePart type="given">T</namePart>
<namePart type="family">Shimada</namePart>
</name>
<genre>journal</genre>
<note>ShimadaTMutant FGF-23 responsible for autosomal dominant hypophosphatemic rickets is resistant to proteolytic cleavage and causes hypophosphatemia in vivoEndocrinology200214331793182</note>
<relatedItem type="host">
<titleInfo>
<title>Endocrinology</title>
</titleInfo>
<part>
<date>2002</date>
<detail type="volume">
<caption>vol.</caption>
<number>143</number>
</detail>
<extent unit="pages">
<start>3179</start>
<end>3182</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B36">
<titleInfo>
<title>Support vector networks</title>
</titleInfo>
<name type="personal">
<namePart type="given">V</namePart>
<namePart type="family">Vapnik</namePart>
</name>
<name type="personal">
<namePart type="given">C</namePart>
<namePart type="family">Cortes</namePart>
</name>
<genre>journal</genre>
<note>VapnikVCortesCSupport vector networksMachine Learning199520125</note>
<relatedItem type="host">
<titleInfo>
<title>Machine Learning</title>
</titleInfo>
<part>
<date>1995</date>
<detail type="volume">
<caption>vol.</caption>
<number>20</number>
</detail>
<extent unit="pages">
<start>1</start>
<end>25</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B37">
<titleInfo>
<title>Autosomal-dominant hypophosphatemic rickets (ADHR) mutations stabilize FGF-23</title>
</titleInfo>
<name type="personal">
<namePart type="given">KE</namePart>
<namePart type="family">White</namePart>
</name>
<genre>journal</genre>
<note>WhiteKEAutosomal-dominant hypophosphatemic rickets (ADHR) mutations stabilize FGF-23Kidney Int.20016020792086</note>
<relatedItem type="host">
<titleInfo>
<title>Kidney Int.</title>
</titleInfo>
<part>
<date>2001</date>
<detail type="volume">
<caption>vol.</caption>
<number>60</number>
</detail>
<extent unit="pages">
<start>2079</start>
<end>2086</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B38">
<titleInfo>
<title>Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23</title>
</titleInfo>
<name type="personal">
<namePart type="given">KE</namePart>
<namePart type="family">White</namePart>
</name>
<genre>journal</genre>
<note>WhiteKEAutosomal dominant hypophosphataemic rickets is associated with mutations in FGF23Nat. Genet.200026345</note>
<relatedItem type="host">
<titleInfo>
<title>Nat. Genet.</title>
</titleInfo>
<part>
<date>2000</date>
<detail type="volume">
<caption>vol.</caption>
<number>26</number>
</detail>
<extent unit="pages">
<start>345</start>
</extent>
</part>
</relatedItem>
</relatedItem>
<relatedItem type="references" displayLabel="B39">
<titleInfo>
<title>Reduced bio-basis function neural networks for protease cleavage site prediction</title>
</titleInfo>
<name type="personal">
<namePart type="given">ZR</namePart>
<namePart type="family">Yang</namePart>
</name>
<name type="personal">
<namePart type="given">EA</namePart>
<namePart type="family">Berry</namePart>
</name>
<genre>journal</genre>
<note>YangZRBerryEAReduced bio-basis function neural networks for protease cleavage site predictionJ. Bioinform. Comput. Biol.20042511531</note>
<relatedItem type="host">
<titleInfo>
<title>J. Bioinform. Comput. Biol.</title>
</titleInfo>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>2</number>
</detail>
<extent unit="pages">
<start>511</start>
<end>531</end>
</extent>
</part>
</relatedItem>
</relatedItem>
<identifier type="istex">B0C4B4C1EC355D0EB86F236BEA4E417647565B77</identifier>
<identifier type="ark">ark:/67375/HXZ-RQ5WHZPQ-4</identifier>
<identifier type="DOI">10.1093/bioinformatics/btn084</identifier>
<identifier type="ArticleID">btn084</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© 2008 The Author(s)</accessCondition>
<recordInfo>
<recordContentSource authority="ISTEX" authorityURI="https://loaded-corpus.data.istex.fr" valueURI="https://loaded-corpus.data.istex.fr/ark:/67375/XBH-GTWS0RDP-M">oup</recordContentSource>
<recordOrigin>Converted from (version 1.2.10) to MODS version 3.6.</recordOrigin>
<recordCreationDate encoding="w3cdtf">2020-04-16</recordCreationDate>
</recordInfo>
</mods>
<json:item>
<extension>json</extension>
<original>false</original>
<mimetype>application/json</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/record.json</uri>
</json:item>
</metadata>
<covers>
<json:item>
<extension>tiff</extension>
<original>true</original>
<mimetype>image/tiff</mimetype>
<uri>https://api.istex.fr/document/B0C4B4C1EC355D0EB86F236BEA4E417647565B77/covers/tiff</uri>
</json:item>
</covers>
<annexes>
<json:item>
<extension>gif</extension>
<original>true</original>
<mimetype>image/gif</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/annexes.gif</uri>
</json:item>
<json:item>
<extension>jpeg</extension>
<original>true</original>
<mimetype>image/jpeg</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/annexes.jpeg</uri>
</json:item>
<json:item>
<extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-RQ5WHZPQ-4/annexes.pdf</uri>
</json:item>
</annexes>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000848 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000848 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:B0C4B4C1EC355D0EB86F236BEA4E417647565B77
   |texte=   Predicting proteolytic sites in extracellular proteins: only halfway there
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021