Telematics exploration server

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

A detailed error analysis of 13 kernel methods for protein–protein interaction extraction

Internal identifier: 000328 (Pmc/Corpus); previous: 000327; next: 000329

Authors: Domonkos Tikk; Illés Solt; Philippe Thomas; Ulf Leser

Source:

RBID : PMC:3680070

Abstract

Background

Kernel-based classification is the current state-of-the-art for extracting pairs of interacting proteins (PPIs) from free text. Various proposals have been put forward, which diverge especially in the specific kernel function, the type of input representation, and the feature sets. These proposals are regularly compared to each other regarding their overall performance on different gold standard corpora, but little is known about their respective performance on the instance level.

Results

We report on a detailed analysis of the shared characteristics and the differences between 13 current methods using five PPI corpora. We identified a large number of rather difficult (misclassified by most methods) and easy (correctly classified by most methods) PPIs. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles using dissimilar kernels leads to significant performance gain. However, our analysis also reveals that characteristics shared between difficult pairs are few, which lowers the hope that new methods, if built along the same line as current ones, will deliver breakthroughs in extraction performance.

Conclusions

Our experiments show that current methods do not seem to do very well in capturing the shared characteristics of positive PPI pairs, which must also be attributed to the heterogeneity of the (still very few) available corpora. Our analysis suggests that performance improvements should rather be sought in novel feature sets than in novel kernel functions.


Url:
DOI: 10.1186/1471-2105-14-12
PubMed: 23323857
PubMed Central: 3680070

Links to Exploration step

PMC:3680070

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A detailed error analysis of 13 kernel methods for protein–protein interaction extraction</title>
<author>
<name sortKey="Tikk, Domonkos" sort="Tikk, Domonkos" uniqKey="Tikk D" first="Domonkos" last="Tikk">Domonkos Tikk</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Software Engineering Institute, Óbuda University, 1034 Budapest, Hungary</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Solt, Illes" sort="Solt, Illes" uniqKey="Solt I" first="Illés" last="Solt">Illés Solt</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Department of Telecommunications and Telematics, Budapest University of Technology and Economics, 1117 Budapest, Hungary</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Thomas, Philippe" sort="Thomas, Philippe" uniqKey="Thomas P" first="Philippe" last="Thomas">Philippe Thomas</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Leser, Ulf" sort="Leser, Ulf" uniqKey="Leser U" first="Ulf" last="Leser">Ulf Leser</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23323857</idno>
<idno type="pmc">3680070</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3680070</idno>
<idno type="RBID">PMC:3680070</idno>
<idno type="doi">10.1186/1471-2105-14-12</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000328</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000328</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A detailed error analysis of 13 kernel methods for protein–protein interaction extraction</title>
<author>
<name sortKey="Tikk, Domonkos" sort="Tikk, Domonkos" uniqKey="Tikk D" first="Domonkos" last="Tikk">Domonkos Tikk</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Software Engineering Institute, Óbuda University, 1034 Budapest, Hungary</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Solt, Illes" sort="Solt, Illes" uniqKey="Solt I" first="Illés" last="Solt">Illés Solt</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Department of Telecommunications and Telematics, Budapest University of Technology and Economics, 1117 Budapest, Hungary</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Thomas, Philippe" sort="Thomas, Philippe" uniqKey="Thomas P" first="Philippe" last="Thomas">Philippe Thomas</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Leser, Ulf" sort="Leser, Ulf" uniqKey="Leser U" first="Ulf" last="Leser">Ulf Leser</name>
<affiliation>
<nlm:aff id="I1">Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Kernel-based classification is the current state-of-the-art for extracting pairs of interacting proteins (PPIs) from free text. Various proposals have been put forward, which diverge especially in the specific kernel function, the type of input representation, and the feature sets. These proposals are regularly compared to each other regarding their overall performance on different gold standard corpora, but little is known about their respective performance on the instance level.</p>
</sec>
<sec>
<title>Results</title>
<p>We report on a detailed analysis of the shared characteristics and the differences between 13 current methods using five PPI corpora. We identified a large number of rather difficult (misclassified by most methods) and easy (correctly classified by most methods) PPIs. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles using dissimilar kernels leads to significant performance gain. However, our analysis also reveals that characteristics shared between difficult pairs are few, which lowers the hope that new methods, if built along the same line as current ones, will deliver breakthroughs in extraction performance.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our experiments show that current methods do not seem to do very well in capturing the shared characteristics of positive PPI pairs, which must also be attributed to the heterogeneity of the (still very few) available corpora. Our analysis suggests that performance improvements should rather be sought in novel feature sets than in novel kernel functions.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Blaschke, C" uniqKey="Blaschke C">C Blaschke</name>
</author>
<author>
<name sortKey="Andrade, Ma" uniqKey="Andrade M">MA Andrade</name>
</author>
<author>
<name sortKey="Ouzounis, C" uniqKey="Ouzounis C">C Ouzounis</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ono, T" uniqKey="Ono T">T Ono</name>
</author>
<author>
<name sortKey="Hishigaki, H" uniqKey="Hishigaki H">H Hishigaki</name>
</author>
<author>
<name sortKey="Tanigami, A" uniqKey="Tanigami A">A Tanigami</name>
</author>
<author>
<name sortKey="Takagi, T" uniqKey="Takagi T">T Takagi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcotte, Em" uniqKey="Marcotte E">EM Marcotte</name>
</author>
<author>
<name sortKey="Xenarios, I" uniqKey="Xenarios I">I Xenarios</name>
</author>
<author>
<name sortKey="Eisenberg, D" uniqKey="Eisenberg D">D Eisenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, M" uniqKey="Huang M">M Huang</name>
</author>
<author>
<name sortKey="Zhu, X" uniqKey="Zhu X">X Zhu</name>
</author>
<author>
<name sortKey="Hao, Y" uniqKey="Hao Y">Y Hao</name>
</author>
<author>
<name sortKey="Payan, Dg" uniqKey="Payan D">DG Payan</name>
</author>
<author>
<name sortKey="Qu, K" uniqKey="Qu K">K Qu</name>
</author>
<author>
<name sortKey="Li, M" uniqKey="Li M">M Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cohen, Am" uniqKey="Cohen A">AM Cohen</name>
</author>
<author>
<name sortKey="Hersh, Wr" uniqKey="Hersh W">WR Hersh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L Hirschman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, D" uniqKey="Zhou D">D Zhou</name>
</author>
<author>
<name sortKey="He, Y" uniqKey="He Y">Y He</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Airola, A" uniqKey="Airola A">A Airola</name>
</author>
<author>
<name sortKey="Heimonen, J" uniqKey="Heimonen J">J Heimonen</name>
</author>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sarawagi, S" uniqKey="Sarawagi S">S Sarawagi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haussler, D" uniqKey="Haussler D">D Haussler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Scholkopf, B" uniqKey="Scholkopf B">B Schölkopf</name>
</author>
<author>
<name sortKey="Smola, A" uniqKey="Smola A">A Smola</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arighi, C" uniqKey="Arighi C">C Arighi</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
<author>
<name sortKey="Krallinger, M" uniqKey="Krallinger M">M Krallinger</name>
</author>
<author>
<name sortKey="Cohen, K" uniqKey="Cohen K">K Cohen</name>
</author>
<author>
<name sortKey="Wilbur, W" uniqKey="Wilbur W">W Wilbur</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L Hirschman</name>
</author>
<author>
<name sortKey="Wu, C" uniqKey="Wu C">C Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Jd" uniqKey="Kim J">JD Kim</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
<author>
<name sortKey="Bossy, R" uniqKey="Bossy R">R Bossy</name>
</author>
<author>
<name sortKey="Nguyen, N" uniqKey="Nguyen N">N Nguyen</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tikk, D" uniqKey="Tikk D">D Tikk</name>
</author>
<author>
<name sortKey="Thomas, P" uniqKey="Thomas P">P Thomas</name>
</author>
<author>
<name sortKey="Palaga, P" uniqKey="Palaga P">P Palaga</name>
</author>
<author>
<name sortKey="Hakenberg, J" uniqKey="Hakenberg J">J Hakenberg</name>
</author>
<author>
<name sortKey="Leser, U" uniqKey="Leser U">U Leser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, S" uniqKey="Kim S">S Kim</name>
</author>
<author>
<name sortKey="Yoon, J" uniqKey="Yoon J">J Yoon</name>
</author>
<author>
<name sortKey="Yang, J" uniqKey="Yang J">J Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fayruzov, T" uniqKey="Fayruzov T">T Fayruzov</name>
</author>
<author>
<name sortKey="De Cock, M" uniqKey="De Cock M">M De Cock</name>
</author>
<author>
<name sortKey="Cornelis, C" uniqKey="Cornelis C">C Cornelis</name>
</author>
<author>
<name sortKey="Hoste, V" uniqKey="Hoste V">V Hoste</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giuliano, C" uniqKey="Giuliano C">C Giuliano</name>
</author>
<author>
<name sortKey="Lavelli, A" uniqKey="Lavelli A">A Lavelli</name>
</author>
<author>
<name sortKey="Romano, L" uniqKey="Romano L">L Romano</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vishwanathan, Svn" uniqKey="Vishwanathan S">SVN Vishwanathan</name>
</author>
<author>
<name sortKey="Smola, Aj" uniqKey="Smola A">AJ Smola</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collins, M" uniqKey="Collins M">M Collins</name>
</author>
<author>
<name sortKey="Duffy, N" uniqKey="Duffy N">N Duffy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moschitti, A" uniqKey="Moschitti A">A Moschitti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuboyama, T" uniqKey="Kuboyama T">T Kuboyama</name>
</author>
<author>
<name sortKey="Hirata, K" uniqKey="Hirata K">K Hirata</name>
</author>
<author>
<name sortKey="Kashima, H" uniqKey="Kashima H">H Kashima</name>
</author>
<author>
<name sortKey="Aoki Kinoshita, Kf" uniqKey="Aoki Kinoshita K">KF Aoki-Kinoshita</name>
</author>
<author>
<name sortKey="Yasuda, H" uniqKey="Yasuda H">H Yasuda</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Erkan, G" uniqKey="Erkan G">G Erkan</name>
</author>
<author>
<name sortKey="Ozgur, A" uniqKey="Ozgur A">A Özgür</name>
</author>
<author>
<name sortKey="Radev, Dr" uniqKey="Radev D">DR Radev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Airola, A" uniqKey="Airola A">A Airola</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Björne</name>
</author>
<author>
<name sortKey="Pahikkala, T" uniqKey="Pahikkala T">T Pahikkala</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Joachims, T" uniqKey="Joachims T">T Joachims</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chang, Cc" uniqKey="Chang C">CC Chang</name>
</author>
<author>
<name sortKey="Lin, Cj" uniqKey="Lin C">CJ Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bunescu, R" uniqKey="Bunescu R">R Bunescu</name>
</author>
<author>
<name sortKey="Ge, R" uniqKey="Ge R">R Ge</name>
</author>
<author>
<name sortKey="Kate, Rj" uniqKey="Kate R">RJ Kate</name>
</author>
<author>
<name sortKey="Marcotte, Em" uniqKey="Marcotte E">EM Marcotte</name>
</author>
<author>
<name sortKey="Mooney, Rj" uniqKey="Mooney R">RJ Mooney</name>
</author>
<author>
<name sortKey="Ramani, Ak" uniqKey="Ramani A">AK Ramani</name>
</author>
<author>
<name sortKey="Wong, Yw" uniqKey="Wong Y">YW Wong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Ginter, F" uniqKey="Ginter F">F Ginter</name>
</author>
<author>
<name sortKey="Heimonen, J" uniqKey="Heimonen J">J Heimonen</name>
</author>
<author>
<name sortKey="Bjorne, J" uniqKey="Bjorne J">J Bjorne</name>
</author>
<author>
<name sortKey="Boberg, J" uniqKey="Boberg J">J Boberg</name>
</author>
<author>
<name sortKey="Jarvinen, J" uniqKey="Jarvinen J">J Jarvinen</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fundel, K" uniqKey="Fundel K">K Fundel</name>
</author>
<author>
<name sortKey="Kuffner, R" uniqKey="Kuffner R">R Küffner</name>
</author>
<author>
<name sortKey="Zimmer, R" uniqKey="Zimmer R">R Zimmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, J" uniqKey="Ding J">J Ding</name>
</author>
<author>
<name sortKey="Berleant, D" uniqKey="Berleant D">D Berleant</name>
</author>
<author>
<name sortKey="Nettleton, D" uniqKey="Nettleton D">D Nettleton</name>
</author>
<author>
<name sortKey="Wurtele, E" uniqKey="Wurtele E">E Wurtele</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nedellec, C" uniqKey="Nedellec C">C Nedellec</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miwa, M" uniqKey="Miwa M">M Miwa</name>
</author>
<author>
<name sortKey="S Tre, R" uniqKey="S Tre R">R Sætre</name>
</author>
<author>
<name sortKey="Miyao, Y" uniqKey="Miyao Y">Y Miyao</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, S" uniqKey="Kim S">S Kim</name>
</author>
<author>
<name sortKey="Yoon, J" uniqKey="Yoon J">J Yoon</name>
</author>
<author>
<name sortKey="Yang, J" uniqKey="Yang J">J Yang</name>
</author>
<author>
<name sortKey="Park, S" uniqKey="Park S">S Park</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Landeghem, S" uniqKey="Van Landeghem S">S Van Landeghem</name>
</author>
<author>
<name sortKey="De Baets, B" uniqKey="De Baets B">B De Baets</name>
</author>
<author>
<name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
<author>
<name sortKey="Saeys, Y" uniqKey="Saeys Y">Y Saeys</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buyko, E" uniqKey="Buyko E">E Buyko</name>
</author>
<author>
<name sortKey="Faessler, E" uniqKey="Faessler E">E Faessler</name>
</author>
<author>
<name sortKey="Wermter, J" uniqKey="Wermter J">J Wermter</name>
</author>
<author>
<name sortKey="Hahn, U" uniqKey="Hahn U">U Hahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cusick, M" uniqKey="Cusick M">M Cusick</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Smolyar, A" uniqKey="Smolyar A">A Smolyar</name>
</author>
<author>
<name sortKey="Venkatesan, K" uniqKey="Venkatesan K">K Venkatesan</name>
</author>
<author>
<name sortKey="Carvunis, A" uniqKey="Carvunis A">A Carvunis</name>
</author>
<author>
<name sortKey="Simonis, N" uniqKey="Simonis N">N Simonis</name>
</author>
<author>
<name sortKey="Rual, J" uniqKey="Rual J">J Rual</name>
</author>
<author>
<name sortKey="Borick, H" uniqKey="Borick H">H Borick</name>
</author>
<author>
<name sortKey="Braun, P" uniqKey="Braun P">P Braun</name>
</author>
<author>
<name sortKey="Dreze, M" uniqKey="Dreze M">M Dreze</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Witten, Ih" uniqKey="Witten I">IH Witten</name>
</author>
<author>
<name sortKey="Frank, E" uniqKey="Frank E">E Frank</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miwa, M" uniqKey="Miwa M">M Miwa</name>
</author>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Hara, T" uniqKey="Hara T">T Hara</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thomas, P" uniqKey="Thomas P">P Thomas</name>
</author>
<author>
<name sortKey="Pietschmann, S" uniqKey="Pietschmann S">S Pietschmann</name>
</author>
<author>
<name sortKey="Solt, I" uniqKey="Solt I">I Solt</name>
</author>
<author>
<name sortKey="Tikk, D" uniqKey="Tikk D">D Tikk</name>
</author>
<author>
<name sortKey="Leser, U" uniqKey="Leser U">U Leser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Jd" uniqKey="Kim J">JD Kim</name>
</author>
<author>
<name sortKey="Ohta, Jtandtsujii" uniqKey="Ohta J">JTandTsujii Ohta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breiman, L" uniqKey="Breiman L">L Breiman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wolpert, D" uniqKey="Wolpert D">D Wolpert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bui, Qc" uniqKey="Bui Q">QC Bui</name>
</author>
<author>
<name sortKey="Katrenko, S" uniqKey="Katrenko S">S Katrenko</name>
</author>
<author>
<name sortKey="Sloot, Pma" uniqKey="Sloot P">PMA Sloot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koike, A" uniqKey="Koike A">A Koike</name>
</author>
<author>
<name sortKey="Kobayashi, Y" uniqKey="Kobayashi Y">Y Kobayashi</name>
</author>
<author>
<name sortKey="Takagi, T" uniqKey="Takagi T">T Takagi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miwa, M" uniqKey="Miwa M">M Miwa</name>
</author>
<author>
<name sortKey="Saetre, R" uniqKey="Saetre R">R Saetre</name>
</author>
<author>
<name sortKey="Kim, Jd" uniqKey="Kim J">JD Kim</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Plake, C" uniqKey="Plake C">C Plake</name>
</author>
<author>
<name sortKey="Schiemann, T" uniqKey="Schiemann T">T Schiemann</name>
</author>
<author>
<name sortKey="Pankalla, M" uniqKey="Pankalla M">M Pankalla</name>
</author>
<author>
<name sortKey="Hakenberg, J" uniqKey="Hakenberg J">J Hakenberg</name>
</author>
<author>
<name sortKey="Leser, U" uniqKey="Leser U">U Leser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Banko, M" uniqKey="Banko M">M Banko</name>
</author>
<author>
<name sortKey="Cafarella, Mj" uniqKey="Cafarella M">MJ Cafarella</name>
</author>
<author>
<name sortKey="Soderl, S" uniqKey="Soderl S">S Soderl</name>
</author>
<author>
<name sortKey="Broadhead, M" uniqKey="Broadhead M">M Broadhead</name>
</author>
<author>
<name sortKey="Etzioni, O" uniqKey="Etzioni O">O Etzioni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, F" uniqKey="Xu F">F Xu</name>
</author>
<author>
<name sortKey="Uszkoreit, H" uniqKey="Uszkoreit H">H Uszkoreit</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, H" uniqKey="Liu H">H Liu</name>
</author>
<author>
<name sortKey="Komandur, R" uniqKey="Komandur R">R Komandur</name>
</author>
<author>
<name sortKey="Verspoor, K" uniqKey="Verspoor K">K Verspoor</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23323857</article-id>
<article-id pub-id-type="pmc">3680070</article-id>
<article-id pub-id-type="publisher-id">1471-2105-14-12</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-14-12</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A detailed error analysis of 13 kernel methods for protein–protein interaction extraction</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" id="A1">
<name>
<surname>Tikk</surname>
<given-names>Domonkos</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>tikk@informatik.hu-berlin.de</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Solt</surname>
<given-names>Illés</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>solt@tmit.bme.hu</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Thomas</surname>
<given-names>Philippe</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>thomas@informatik.hu-berlin.de</email>
</contrib>
<contrib contrib-type="author" id="A4">
<name>
<surname>Leser</surname>
<given-names>Ulf</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>leser@informatik.hu-berlin.de</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany</aff>
<aff id="I2">
<label>2</label>
Software Engineering Institute, Óbuda University, 1034 Budapest, Hungary</aff>
<aff id="I3">
<label>3</label>
Department of Telecommunications and Telematics, Budapest University of Technology and Economics, 1117 Budapest, Hungary</aff>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>16</day>
<month>1</month>
<year>2013</year>
</pub-date>
<volume>14</volume>
<fpage>12</fpage>
<lpage>12</lpage>
<history>
<date date-type="received">
<day>4</day>
<month>11</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>12</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2013 Tikk et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Tikk et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/14/12"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>Kernel-based classification is the current state-of-the-art for extracting pairs of interacting proteins (PPIs) from free text. Various proposals have been put forward, which diverge especially in the specific kernel function, the type of input representation, and the feature sets. These proposals are regularly compared to each other regarding their overall performance on different gold standard corpora, but little is known about their respective performance on the instance level.</p>
</sec>
<sec>
<title>Results</title>
<p>We report on a detailed analysis of the shared characteristics and the differences between 13 current methods using five PPI corpora. We identified a large number of rather difficult (misclassified by most methods) and easy (correctly classified by most methods) PPIs. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles using dissimilar kernels leads to significant performance gain. However, our analysis also reveals that characteristics shared between difficult pairs are few, which lowers the hope that new methods, if built along the same line as current ones, will deliver breakthroughs in extraction performance.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our experiments show that current methods do not seem to do very well in capturing the shared characteristics of positive PPI pairs, which must also be attributed to the heterogeneity of the (still very few) available corpora. Our analysis suggests that performance improvements should rather be sought in novel feature sets than in novel kernel functions.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Protein–protein interaction</kwd>
<kwd>Relation extraction</kwd>
<kwd>Kernel methods</kwd>
<kwd>Error analysis</kwd>
<kwd>Kernel similarity</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>Automatically extracting protein–protein interactions (PPIs) from free text is one of the major challenges in biomedical text mining [
<xref ref-type="bibr" rid="B1">1</xref>
-
<xref ref-type="bibr" rid="B6">6</xref>
]. Several methods, which usually are co-occurrence-based, pattern-based, or machine-learning based [
<xref ref-type="bibr" rid="B7">7</xref>
], have been developed and compared using a slowly growing body of gold standard corpora [
<xref ref-type="bibr" rid="B8">8</xref>
]. However, progress has always been slow (if measured in terms of the precision/recall values achieved on the different corpora) and seems to have slowed down further over the last years; furthermore, current results still fall short of the performance that has been achieved in other areas of relationship extraction [
<xref ref-type="bibr" rid="B9">9</xref>
].</p>
<p>In this paper, we want to elucidate the reasons for the slow progress by performing a detailed, cross-method study of characteristics shared by PPI instances that many methods fail to classify correctly. We concentrate on a fairly recent class of PPI extraction algorithms, namely
<italic>kernel methods</italic>
[
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
]. The reason for this choice is that these methods were the top performers in recent competitions [
<xref ref-type="bibr" rid="B12">12</xref>
,
<xref ref-type="bibr" rid="B13">13</xref>
]. In a nutshell, they work as follows. First, they require a training corpus consisting of labeled sentences, some of which contain PPIs and/or non-interacting proteins, while others contain only one or no protein mentions. All sentences in the training corpus are transformed into structured representations that aim to best capture properties of how the interaction is expressed (or not, for negative examples). The representations of protein pairs together with their gold standard PPI-labels are analyzed by a kernel-based learner (mostly an SVM), which builds a predictive model. When analyzing a new sentence for PPIs, its candidate protein pairs are turned into the same representation, then classified by the kernel method. For the sake of brevity, we often use the term
<italic>kernel</italic>
to refer to a combination of SVM learner and a kernel method.</p>
<p>Central to the learning and the classification phases is a so-called kernel function. Simply speaking, a kernel function is a function that takes the representation of two instances (here, protein pairs) and computes their similarity. Kernel functions differ in (1) the underlying sentence representation (bag-of-words, token sequence with shallow linguistic features, syntax tree parse, dependency graphs); (2) the substructures retrieved from the sentence representation to define interactions; and (3) the calculation of the similarity function.</p>
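To make the notion concrete, the following is a minimal sketch of a kernel function over a bag-of-words representation (a toy illustration only, not one of the 13 kernels discussed here): the similarity of two protein-pair instances is simply the number of tokens their sentence representations share.

    from collections import Counter

    def bag_of_words(sentence):
        # Toy instance representation: multiset of lower-cased tokens.
        return Counter(sentence.lower().split())

    def bow_kernel(x, y):
        # Toy kernel function: similarity = size of the multiset
        # intersection of the two bag-of-words representations.
        return sum((bag_of_words(x) & bag_of_words(y)).values())

    # Two candidate sentences with blinded protein mentions (placeholder names)
    print(bow_kernel("PROT1 interacts with PROT2 in vitro",
                     "PROT1 binds PROT2 in yeast"))  # shared: prot1, prot2, in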
<p>In our recent study [
<xref ref-type="bibr" rid="B14">14</xref>
], we analyzed nine kernel-based methods in a comprehensive benchmark and concluded that dependency graph and shallow linguistic feature representations are superior to syntax tree ones. Although we identified three kernels that outperformed the others (APG, SL, kBSPS; see details below), the study also revealed that none of them seems to be a single best approach due to the sensitivity of the methods to various factors—such as parameter settings, evaluation scenario and corpora. This leads to highly heterogeneous evaluation results indicating that methods are strongly prone to over-fit the training corpus.</p>
<p>The focus of this paper is to perform a cross-kernel error analysis at the instance level with the goal of exploring possible ways to improve kernel-based PPI extraction. To this end, we determine difficulty classes of protein pairs and investigate the similarity of kernels in terms of their predictions. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles using dissimilar kernels leads to significant performance gain. Additionally, we identify kernels that perform better on certain difficulty classes, paving the way for more complex ensembles. We also show that, with a generic feature set and linear classifiers, a performance can be achieved that is on par with most kernels. However, our main conclusion is pessimistic: our results indicate that significant progress in the field of PPI extraction can probably only be achieved if future methods leave the beaten track.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<p>We recently performed a comprehensive benchmark of nine kernel-based approaches (hereinafter we refer to them briefly as kernels) [
<xref ref-type="bibr" rid="B14">14</xref>
]. In the meantime, we obtained another four kernels: three of them were originally proposed by Kim
<italic>et al.</italic>
([
<xref ref-type="bibr" rid="B15">15</xref>
]) and one is its modification described in [
<xref ref-type="bibr" rid="B16">16</xref>
]; we refer to them collectively as Kim’s kernels. In this work, we investigate similarities and differences between these 13 kernels.</p>
<sec>
<title>Kernels</title>
<p>The shallow linguistic (SL) [
<xref ref-type="bibr" rid="B17">17</xref>
] kernel does not use deep parsing information. It is solely based on bag-of-words features (words occurring in the fore-between, between, and between-after contexts relative to the pair of investigated proteins), surface features (capitalization, punctuation, numerals), and shallow linguistic (POS-tag, lemma) features generated from the tokens to the left and right of the two proteins (more generally: entities) of the pair.</p>
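As a rough illustration of how such windowed token features can be derived (a sketch assuming already-tokenized sentences and known entity positions, not the actual SL implementation), the three contexts can be built as follows:

    def context_windows(tokens, i, j):
        # i < j are the token positions of the two entity mentions.
        # SL-style patterns combine the raw contexts into fore-between,
        # between, and between-after windows.
        fore, between, after = tokens[:i], tokens[i + 1:j], tokens[j + 1:]
        return {
            "fore-between": fore + between,
            "between": between,
            "between-after": between + after,
        }

    tokens = "Results show that PROT1 strongly inhibits PROT2 in vivo".split()
    windows = context_windows(tokens, tokens.index("PROT1"), tokens.index("PROT2"))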
<p>Subtree (ST; [
<xref ref-type="bibr" rid="B18">18</xref>
]), subset tree (SST; [
<xref ref-type="bibr" rid="B19">19</xref>
]), partial tree (PT; [
<xref ref-type="bibr" rid="B20">20</xref>
]) and spectrum tree (SpT; [
<xref ref-type="bibr" rid="B21">21</xref>
]) kernels exploit the syntax tree representation of sentences. They differ in the definition of the extracted substructures. The ST, SST and PT kernels extract subtrees of the syntax parse tree that contain the analyzed protein pair. SpT uses vertex-walks, that is, sequences of edge-connected syntax tree nodes, as the unit of representation. When comparing two protein pairs, the number of identical substructures is calculated as the similarity score.</p>
<p>The next group of kernels uses a dependency parse representation of the sentence. The edit distance and cosine similarity kernels (edit, cosine; [
<xref ref-type="bibr" rid="B22">22</xref>
]), as well as the
<italic>k</italic>
-band shortest path spectrum (kBSPS; [
<xref ref-type="bibr" rid="B14">14</xref>
]) primarily use the shortest path between the entities, but the latter optionally allows for the
<italic>k</italic>
-band extension of this path in the representation. The most sophisticated kernel, all-path graph (APG; [
<xref ref-type="bibr" rid="B23">23</xref>
]), builds on both the dependency graph and the token sequence representations of the entire sentence, and weighs connections within and outside the shortest path differently.</p>
<p>Kim’s kernels [
<xref ref-type="bibr" rid="B15">15</xref>
] also use the shortest path of the dependency parses. The four kernels differ in the information they use from the parses. The
<italic>lexical kernel</italic>
uses only the lexical information encoded in the dependency tree, that is, nodes are the lemmas of the sentence, connected by edges labeled with dependency relations. The
<italic>shallow kernel</italic>
retains only the POS-tag information in the nodes. The similarity score is calculated by both kernels as the number of identical subgraphs of two shortest paths with the specific node labeling. The
<italic>combined kernel</italic>
is the sum of the former two variants. The
<italic>syntactic kernel</italic>
, defined in [
<xref ref-type="bibr" rid="B16">16</xref>
], uses exclusively the structural information of the dependency tree, that is, only the edge labels are considered in the similarity score calculation.</p>
<p>Since Fayruzov’s implementation of Kim’s kernels does not automatically determine the threshold that separates the positive and negative classes, it has to be specified for each model separately. Therefore, in addition to the parameter search described in [
<xref ref-type="bibr" rid="B14">14</xref>
] and re-used here, we also performed a coarse-grid
<italic>threshold searching strategy</italic>
in [0,1] with step 0.05. Assuming that the test corpus has similar characteristics to the training one—the usual guess in the absence of further knowledge—we selected the threshold between the positive and negative classes such that their ratio best approximated the ratio measured on the training set. Note that APG [
<xref ref-type="bibr" rid="B23">23</xref>
] applies a similar threshold searching strategy but optimizes the threshold against F-score on the training set.</p>
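A minimal sketch of such a ratio-matching threshold search (one possible reading of the description above; the scores, the proxy criterion and the variable names are illustrative):

    def select_threshold(train_pos_ratio, test_scores, step=0.05):
        # Coarse-grid search over [0, 1]: choose the threshold whose induced
        # fraction of positive predictions best approximates the positive
        # fraction observed on the training corpus (a proxy for matching the
        # positive/negative ratio described in the text).
        best_t, best_gap = 0.0, float("inf")
        steps = int(round(1.0 / step))
        for k in range(steps + 1):
            t = k * step
            predicted_pos = sum(1 for s in test_scores if s >= t)
            gap = abs(predicted_pos / len(test_scores) - train_pos_ratio)
            if gap < best_gap:
                best_t, best_gap = t, gap
        return best_t

    # e.g. 30% of the training pairs are positive; scores from some model
    threshold = select_threshold(0.30, [0.10, 0.80, 0.45, 0.20, 0.95])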
</sec>
<sec>
<title>Classifiers and parameters</title>
<p>Typically, kernel functions are integrated into SVM implementations. Several freely available and extensible implementations of SVMs exist, among which SVM
<sup>
<italic>l</italic>
<italic>i</italic>
<italic>g</italic>
<italic>h</italic>
<italic>t</italic>
</sup>
[
<xref ref-type="bibr" rid="B24">24</xref>
] and LibSVM [
<xref ref-type="bibr" rid="B25">25</xref>
] probably are the most renowned ones. Both can be adapted by supplying a user-defined kernel function. In SVM
<sup>
<italic>l</italic>
<italic>i</italic>
<italic>g</italic>
<italic>h</italic>
<italic>t</italic>
</sup>
, a kernel function can be defined as a real-valued function of a pair of instances in the corresponding representation. LibSVM, on the other hand, requires the user to pre-compute kernel values, i.e., to pass to the SVM learner a matrix containing the pairwise similarities of all instances. Accordingly, most of the kernels we experimented with use the SVM
<sup>
<italic>l</italic>
<italic>i</italic>
<italic>g</italic>
<italic>h</italic>
<italic>t</italic>
</sup>
implementation, except for the SL and Kim’s kernels, which use LibSVM, and APG, which internally uses a sparse regularized least squares (RLS) SVM.</p>
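For illustration, with a precomputed kernel the learner only ever sees a matrix of pairwise similarities; a minimal sketch using scikit-learn's LibSVM-based SVC with kernel="precomputed" (an illustration of that interface, not the exact tooling used in the paper; my_kernel and the instance/label variables are placeholders):

    import numpy as np
    from sklearn.svm import SVC

    def gram_matrix(rows, cols, kernel_fn):
        # K[i, j] = kernel_fn(rows[i], cols[j]); for training, rows == cols.
        return np.array([[kernel_fn(a, b) for b in cols] for a in rows])

    # K_train = gram_matrix(train_instances, train_instances, my_kernel)
    # clf = SVC(kernel="precomputed").fit(K_train, train_labels)
    # Test instances are compared against the *training* instances:
    # K_test = gram_matrix(test_instances, train_instances, my_kernel)
    # predictions = clf.predict(K_test)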
</sec>
<sec>
<title>Corpora</title>
<p>We use the five freely available and widely used PPI-annotated resources also described in [
<xref ref-type="bibr" rid="B8">8</xref>
], i.e., AIMed [
<xref ref-type="bibr" rid="B26">26</xref>
], BioInfer [
<xref ref-type="bibr" rid="B27">27</xref>
], HPRD50 [
<xref ref-type="bibr" rid="B28">28</xref>
], IEPA [
<xref ref-type="bibr" rid="B29">29</xref>
], and LLL [
<xref ref-type="bibr" rid="B30">30</xref>
].</p>
</sec>
<sec>
<title>Evaluation method</title>
<p>We report on the standard evaluation measures (precision (P), recall (R), F
<sub>1</sub>
-score (F)). As we have shown in our previous study [
<xref ref-type="bibr" rid="B14">14</xref>
], the AUC measure (area under the receiver operating characteristic curve), which is often used in recent literature to characterize classifiers and is independent of the distribution of positive and negative classes, depends very much on the learning algorithm of the classifier and only partially on the kernel. Therefore, in this study we stick to the above three measures, which give a better picture of the expected classification performance on new texts. Results are reported in two different evaluation settings: primarily, we use the document-level cross-validation scheme (CV), which still seems to be the
<italic>de facto</italic>
standard in PPI extraction. We also use the cross-learning (CL) evaluation strategy for identifying pairs that behave similarly across various evaluation methods.</p>
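For reference, the three measures reduce to the counts of true positives (TP), false positives (FP) and false negatives (FN):

    def precision_recall_f1(tp, fp, fn):
        # P = TP / (TP + FP), R = TP / (TP + FN), F1 = harmonic mean of P and R.
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f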
<p>In the CV setting, we train and test each kernel on the same corpus using document-level 10-fold cross-validation. We employ the document-level splits used by Airola and many others (e.g., [
<xref ref-type="bibr" rid="B23">23</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
]) to allow for direct comparison of results. The ultimate goal of PPI extraction is the identification of PPIs in biomedical texts with unknown characteristics. This task is better reflected in the CL setting, when training and test sets are drawn from different distributions: in such cases, we train on an ensemble of four corpora and test on the fifth one. CL methodology is generally less biased than CV, where the training and the test data sets have very similar corpus characteristics. Note that the difference in the distribution of positive/negative pairs in the five benchmark corpora (ranging from ∼20 to ∼100%) accounts for a substantial part of the diversity of the performance of approaches [
<xref ref-type="bibr" rid="B8">8</xref>
]. Differences between the corpora are not limited to the class distribution: they also deviate in their annotation guidelines and in the definition of what constitutes a PPI; those differences are largely retained in the standardized format [
<xref ref-type="bibr" rid="B8">8</xref>
] obtained by applying a transformation approach to yield the greatest common factor in annotations.</p>
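A sketch of how the cross-learning splits can be generated (corpus names as in the benchmark; load_corpus is a hypothetical loader returning labeled pairs):

    CORPORA = ["AIMed", "BioInfer", "HPRD50", "IEPA", "LLL"]

    def cross_learning_splits(load_corpus):
        # Train on the union of four corpora, test on the held-out fifth one.
        for held_out in CORPORA:
            train = [pair for name in CORPORA if name != held_out
                     for pair in load_corpus(name)]
            test = list(load_corpus(held_out))
            yield held_out, train, test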
</sec>
<sec>
<title>Experimental setup</title>
<p>For the experimental setup we follow the procedure described in [
<xref ref-type="bibr" rid="B14">14</xref>
]. In a nutshell, we applied entity blinding, resolved entity–token mismatch problems, and extended the learning format of the sentences with the missing parses. We applied a coarse-grained grid parameter search and selected the best-performing setting in terms of the F-score averaged across the five evaluation corpora as the
<italic>default setting</italic>
for each kernel.</p>
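Entity blinding, mentioned above, replaces the candidate protein mentions with placeholder tokens before parsing and feature generation; a rough sketch assuming character offsets for the mentions (the placeholder names PROTA/PROTB/PROTX are illustrative, not necessarily those used by the authors):

    def blind_entities(sentence, pair_offsets, other_offsets=()):
        # Replace the two candidate proteins with PROTA/PROTB and any other
        # protein mention with PROTX; work right-to-left so offsets stay valid.
        replacements = (
            [(pair_offsets[0][0], pair_offsets[0][1], "PROTA"),
             (pair_offsets[1][0], pair_offsets[1][1], "PROTB")] +
            [(s, e, "PROTX") for s, e in other_offsets])
        for start, end, token in sorted(replacements, reverse=True):
            sentence = sentence[:start] + token + sentence[end:]
        return sentence

    print(blind_entities("MDM2 binds p53 and inhibits TP73.",
                         [(0, 4), (11, 14)], [(28, 32)]))
    # -> "PROTA binds PROTB and inhibits PROTX."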
</sec>
</sec>
<sec>
<title>Results and discussion</title>
<p>The main goal of our analysis was to better characterize kernel methods and understand their shortcomings in terms of PPI extraction. We started by characterizing protein pairs: we divided them into three classes based on their difficulty. Difficulty is defined by the observed classification success level of the kernels. We also manually scrutinized some of the pairs that were found to be the most difficult ones, suspecting that the reason for the failure of the kernels is in fact an incorrect annotation. We re-labeled a set of such suspicious annotations and re-evaluated whether kernels were able to benefit from these modifications. We also compare kernels based on their predictions by defining kernel similarity as prediction agreement on the instance level. We investigate how kernels’ input representations correlate with their similarity. Finally, to quantify the claimed advantage of kernels for PPI extraction, we compare kernels to simpler methods. We used linear, non-kernel-based classifiers and a surface feature set also found in the kernel methods.</p>
<sec>
<title>Difficulty of individual protein pairs</title>
<p>In this experiment we determine the difficulty of protein pairs. The fewer kernel-based approaches are able to classify a pair correctly, the more difficult the pair is. Different kernels’ predictions vary heavily, as we have reported in [
<xref ref-type="bibr" rid="B14">14</xref>
]. Here, we show that there exist protein pairs that are inherently difficult to classify (across all 13 kernels), and we investigate whether kernels with generally higher performance classify difficult pairs with greater success.</p>
<p>We define the concept of
<italic>success level</italic>
as the number of kernels being able to classify a given pair correctly. For CV evaluation we performed experiments with all 13 kernels, and therefore have success levels: 0,…,13. For CL evaluation, we omitted the very slow PT kernel (0,…,12). Figures
<xref ref-type="fig" rid="F1">1</xref>
and
<xref ref-type="fig" rid="F2">2</xref>
show the distribution of PPI pairs in terms of success level for CV and CL evaluation aggregated across the 5 corpora, respectively. We also show the same statistics for each corpus separately (Tables
<xref ref-type="table" rid="T1">1</xref>
and
<xref ref-type="table" rid="T2">2</xref>
). Figure
<xref ref-type="fig" rid="F3">3</xref>
shows the correlation between success levels of CV and CL.</p>
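Given a 0/1 matrix recording, for every pair and every kernel, whether the prediction was correct, the success level is simply a row sum; a minimal sketch (the matrix here is random and used only to show the shape):

    import numpy as np

    def success_levels(correct):
        # correct: (n_pairs, n_kernels) array with 1 where kernel k classifies
        # pair i correctly; the success level of a pair is its row sum.
        return np.asarray(correct).sum(axis=1)

    # e.g. 4 pairs evaluated by 13 kernels -> success levels in 0..13
    levels = success_levels(np.random.randint(0, 2, size=(4, 13)))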
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>The distribution of pairs according to classification success level using cross-validation setting.</bold>
The distribution of pairs (total, positive and negative) in terms of the number of kernels that classify them correctly (success level), aggregated across the 5 corpora in the cross-validation setting. Detailed data for each corpus can be found in Table
<xref ref-type="table" rid="T1">1</xref>
. All 13 kernels are taken into consideration.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-1"></graphic>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>The distribution of pairs according to classification success level using cross-learning setting.</bold>
The distribution of pairs (total, positive and negative) in terms of the number of kernels that classify them correctly (success level), aggregated across the 5 corpora in the cross-learning setting. Detailed data for each corpus can be found in Table
<xref ref-type="table" rid="T2">2</xref>
. All kernels except for the very slow PT kernel are taken into consideration.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-2"></graphic>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>The distribution of pairs for each corpus according to classification success level using cross-validation setting</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left" valign="bottom"> 
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>AIMed</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>BioInfer</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>HPRD50</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>IEPA</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>LLL</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="left"> </th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">77
<hr></hr>
</td>
<td align="right" valign="bottom">73
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">7.3%
<hr></hr>
</td>
<td align="right" valign="bottom">0.1%
<hr></hr>
</td>
<td align="right" valign="bottom">58
<hr></hr>
</td>
<td align="right" valign="bottom">44
<hr></hr>
</td>
<td align="right" valign="bottom">14
<hr></hr>
</td>
<td align="right" valign="bottom">1.7%
<hr></hr>
</td>
<td align="right" valign="bottom">0.2%
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">0.6%
<hr></hr>
</td>
<td align="right" valign="bottom">1.1%
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">0.3%
<hr></hr>
</td>
<td align="right" valign="bottom">0.2%
<hr></hr>
</td>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">3.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">95
<hr></hr>
</td>
<td align="right" valign="bottom">89
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">8.9%
<hr></hr>
</td>
<td align="right" valign="bottom">0.1%
<hr></hr>
</td>
<td align="right" valign="bottom">158
<hr></hr>
</td>
<td align="right" valign="bottom">107
<hr></hr>
</td>
<td align="right" valign="bottom">51
<hr></hr>
</td>
<td align="right" valign="bottom">4.2%
<hr></hr>
</td>
<td align="right" valign="bottom">0.7%
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">2.5%
<hr></hr>
</td>
<td align="right" valign="bottom">1.1%
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">1.5%
<hr></hr>
</td>
<td align="right" valign="bottom">1.7%
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">4.2%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">105
<hr></hr>
</td>
<td align="right" valign="bottom">101
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">10.1%
<hr></hr>
</td>
<td align="right" valign="bottom">0.1%
<hr></hr>
</td>
<td align="right" valign="bottom">206
<hr></hr>
</td>
<td align="right" valign="bottom">130
<hr></hr>
</td>
<td align="right" valign="bottom">76
<hr></hr>
</td>
<td align="right" valign="bottom">5.1%
<hr></hr>
</td>
<td align="right" valign="bottom">1.1%
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">4.9%
<hr></hr>
</td>
<td align="right" valign="bottom">1.5%
<hr></hr>
</td>
<td align="right" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">0.9%
<hr></hr>
</td>
<td align="right" valign="bottom">1.7%
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">16.3%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">121
<hr></hr>
</td>
<td align="right" valign="bottom">104
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">10.4%
<hr></hr>
</td>
<td align="right" valign="bottom">0.4%
<hr></hr>
</td>
<td align="right" valign="bottom">306
<hr></hr>
</td>
<td align="right" valign="bottom">198
<hr></hr>
</td>
<td align="right" valign="bottom">108
<hr></hr>
</td>
<td align="right" valign="bottom">7.8%
<hr></hr>
</td>
<td align="right" valign="bottom">1.5%
<hr></hr>
</td>
<td align="right" valign="bottom">18
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">4.3%
<hr></hr>
</td>
<td align="right" valign="bottom">4.1%
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">3.9%
<hr></hr>
</td>
<td align="right" valign="bottom">2.7%
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">6.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">139
<hr></hr>
</td>
<td align="right" valign="bottom">115
<hr></hr>
</td>
<td align="right" valign="bottom">24
<hr></hr>
</td>
<td align="right" valign="bottom">11.5%
<hr></hr>
</td>
<td align="right" valign="bottom">0.5%
<hr></hr>
</td>
<td align="right" valign="bottom">349
<hr></hr>
</td>
<td align="right" valign="bottom">203
<hr></hr>
</td>
<td align="right" valign="bottom">146
<hr></hr>
</td>
<td align="right" valign="bottom">8.0%
<hr></hr>
</td>
<td align="right" valign="bottom">2.0%
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">6.1%
<hr></hr>
</td>
<td align="right" valign="bottom">5.9%
<hr></hr>
</td>
<td align="right" valign="bottom">30
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">3.0%
<hr></hr>
</td>
<td align="right" valign="bottom">4.1%
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">9.6%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">140
<hr></hr>
</td>
<td align="right" valign="bottom">91
<hr></hr>
</td>
<td align="right" valign="bottom">49
<hr></hr>
</td>
<td align="right" valign="bottom">9.1%
<hr></hr>
</td>
<td align="right" valign="bottom">1.0%
<hr></hr>
</td>
<td align="right" valign="bottom">440
<hr></hr>
</td>
<td align="right" valign="bottom">225
<hr></hr>
</td>
<td align="right" valign="bottom">215
<hr></hr>
</td>
<td align="right" valign="bottom">8.9%
<hr></hr>
</td>
<td align="right" valign="bottom">3.0%
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">7.4%
<hr></hr>
</td>
<td align="right" valign="bottom">3.0%
<hr></hr>
</td>
<td align="right" valign="bottom">43
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">24
<hr></hr>
</td>
<td align="right" valign="bottom">5.7%
<hr></hr>
</td>
<td align="right" valign="bottom">5.0%
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">1.2%
<hr></hr>
</td>
<td align="right" valign="bottom">11.4%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">142
<hr></hr>
</td>
<td align="right" valign="bottom">70
<hr></hr>
</td>
<td align="right" valign="bottom">72
<hr></hr>
</td>
<td align="right" valign="bottom">7.0%
<hr></hr>
</td>
<td align="right" valign="bottom">1.5%
<hr></hr>
</td>
<td align="right" valign="bottom">481
<hr></hr>
</td>
<td align="right" valign="bottom">209
<hr></hr>
</td>
<td align="right" valign="bottom">272
<hr></hr>
</td>
<td align="right" valign="bottom">8.2%
<hr></hr>
</td>
<td align="right" valign="bottom">3.8%
<hr></hr>
</td>
<td align="right" valign="bottom">33
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">24
<hr></hr>
</td>
<td align="right" valign="bottom">5.5%
<hr></hr>
</td>
<td align="right" valign="bottom">8.9%
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
<td align="right" valign="bottom">22
<hr></hr>
</td>
<td align="right" valign="bottom">39
<hr></hr>
</td>
<td align="right" valign="bottom">6.6%
<hr></hr>
</td>
<td align="right" valign="bottom">8.1%
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">25
<hr></hr>
</td>
<td align="right" valign="bottom">0.6%
<hr></hr>
</td>
<td align="right" valign="bottom">15.1%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">176
<hr></hr>
</td>
<td align="right" valign="bottom">65
<hr></hr>
</td>
<td align="right" valign="bottom">111
<hr></hr>
</td>
<td align="right" valign="bottom">6.5%
<hr></hr>
</td>
<td align="right" valign="bottom">2.3%
<hr></hr>
</td>
<td align="right" valign="bottom">619
<hr></hr>
</td>
<td align="right" valign="bottom">248
<hr></hr>
</td>
<td align="right" valign="bottom">371
<hr></hr>
</td>
<td align="right" valign="bottom">9.8%
<hr></hr>
</td>
<td align="right" valign="bottom">5.2%
<hr></hr>
</td>
<td align="right" valign="bottom">35
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">9.2%
<hr></hr>
</td>
<td align="right" valign="bottom">7.4%
<hr></hr>
</td>
<td align="right" valign="bottom">51
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">31
<hr></hr>
</td>
<td align="right" valign="bottom">6.0%
<hr></hr>
</td>
<td align="right" valign="bottom">6.4%
<hr></hr>
</td>
<td align="right" valign="bottom">29
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">4.9%
<hr></hr>
</td>
<td align="right" valign="bottom">12.7%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">248
<hr></hr>
</td>
<td align="right" valign="bottom">72
<hr></hr>
</td>
<td align="right" valign="bottom">176
<hr></hr>
</td>
<td align="right" valign="bottom">7.2%
<hr></hr>
</td>
<td align="right" valign="bottom">3.6%
<hr></hr>
</td>
<td align="right" valign="bottom">785
<hr></hr>
</td>
<td align="right" valign="bottom">256
<hr></hr>
</td>
<td align="right" valign="bottom">529
<hr></hr>
</td>
<td align="right" valign="bottom">10.1%
<hr></hr>
</td>
<td align="right" valign="bottom">7.4%
<hr></hr>
</td>
<td align="right" valign="bottom">37
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">28
<hr></hr>
</td>
<td align="right" valign="bottom">5.5%
<hr></hr>
</td>
<td align="right" valign="bottom">10.4%
<hr></hr>
</td>
<td align="right" valign="bottom">79
<hr></hr>
</td>
<td align="right" valign="bottom">31
<hr></hr>
</td>
<td align="right" valign="bottom">48
<hr></hr>
</td>
<td align="right" valign="bottom">9.3%
<hr></hr>
</td>
<td align="right" valign="bottom">10.0%
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">3.7%
<hr></hr>
</td>
<td align="right" valign="bottom">7.8%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">372
<hr></hr>
</td>
<td align="right" valign="bottom">69
<hr></hr>
</td>
<td align="right" valign="bottom">303
<hr></hr>
</td>
<td align="right" valign="bottom">6.9%
<hr></hr>
</td>
<td align="right" valign="bottom">6.3%
<hr></hr>
</td>
<td align="right" valign="bottom">876
<hr></hr>
</td>
<td align="right" valign="bottom">245
<hr></hr>
</td>
<td align="right" valign="bottom">631
<hr></hr>
</td>
<td align="right" valign="bottom">9.7%
<hr></hr>
</td>
<td align="right" valign="bottom">8.8%
<hr></hr>
</td>
<td align="right" valign="bottom">46
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">6.1%
<hr></hr>
</td>
<td align="right" valign="bottom">13.3%
<hr></hr>
</td>
<td align="right" valign="bottom">99
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">67
<hr></hr>
</td>
<td align="right" valign="bottom">9.6%
<hr></hr>
</td>
<td align="right" valign="bottom">13.9%
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">9.1%
<hr></hr>
</td>
<td align="right" valign="bottom">6.6%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">461
<hr></hr>
</td>
<td align="right" valign="bottom">47
<hr></hr>
</td>
<td align="right" valign="bottom">414
<hr></hr>
</td>
<td align="right" valign="bottom">4.7%
<hr></hr>
</td>
<td align="right" valign="bottom">8.6%
<hr></hr>
</td>
<td align="right" valign="bottom">1067
<hr></hr>
</td>
<td align="right" valign="bottom">204
<hr></hr>
</td>
<td align="right" valign="bottom">863
<hr></hr>
</td>
<td align="right" valign="bottom">8.1%
<hr></hr>
</td>
<td align="right" valign="bottom">12.1%
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
<td align="right" valign="bottom">33
<hr></hr>
</td>
<td align="right" valign="bottom">28
<hr></hr>
</td>
<td align="right" valign="bottom">20.2%
<hr></hr>
</td>
<td align="right" valign="bottom">10.4%
<hr></hr>
</td>
<td align="right" valign="bottom">101
<hr></hr>
</td>
<td align="right" valign="bottom">38
<hr></hr>
</td>
<td align="right" valign="bottom">63
<hr></hr>
</td>
<td align="right" valign="bottom">11.3%
<hr></hr>
</td>
<td align="right" valign="bottom">13.1%
<hr></hr>
</td>
<td align="right" valign="bottom">31
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">11.6%
<hr></hr>
</td>
<td align="right" valign="bottom">7.2%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">619
<hr></hr>
</td>
<td align="right" valign="bottom">29
<hr></hr>
</td>
<td align="right" valign="bottom">590
<hr></hr>
</td>
<td align="right" valign="bottom">2.9%
<hr></hr>
</td>
<td align="right" valign="bottom">12.2%
<hr></hr>
</td>
<td align="right" valign="bottom">1061
<hr></hr>
</td>
<td align="right" valign="bottom">164
<hr></hr>
</td>
<td align="right" valign="bottom">897
<hr></hr>
</td>
<td align="right" valign="bottom">6.5%
<hr></hr>
</td>
<td align="right" valign="bottom">12.6%
<hr></hr>
</td>
<td align="right" valign="bottom">49
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">30
<hr></hr>
</td>
<td align="right" valign="bottom">11.7%
<hr></hr>
</td>
<td align="right" valign="bottom">11.1%
<hr></hr>
</td>
<td align="right" valign="bottom">112
<hr></hr>
</td>
<td align="right" valign="bottom">46
<hr></hr>
</td>
<td align="right" valign="bottom">66
<hr></hr>
</td>
<td align="right" valign="bottom">13.7%
<hr></hr>
</td>
<td align="right" valign="bottom">13.7%
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">19.5%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">1002
<hr></hr>
</td>
<td align="right" valign="bottom">43
<hr></hr>
</td>
<td align="right" valign="bottom">959
<hr></hr>
</td>
<td align="right" valign="bottom">4.3%
<hr></hr>
</td>
<td align="right" valign="bottom">19.8%
<hr></hr>
</td>
<td align="right" valign="bottom">1390
<hr></hr>
</td>
<td align="right" valign="bottom">183
<hr></hr>
</td>
<td align="right" valign="bottom">1207
<hr></hr>
</td>
<td align="right" valign="bottom">7.2%
<hr></hr>
</td>
<td align="right" valign="bottom">16.9%
<hr></hr>
</td>
<td align="right" valign="bottom">57
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">44
<hr></hr>
</td>
<td align="right" valign="bottom">8.0%
<hr></hr>
</td>
<td align="right" valign="bottom">16.3%
<hr></hr>
</td>
<td align="right" valign="bottom">106
<hr></hr>
</td>
<td align="right" valign="bottom">47
<hr></hr>
</td>
<td align="right" valign="bottom">59
<hr></hr>
</td>
<td align="right" valign="bottom">14.0%
<hr></hr>
</td>
<td align="right" valign="bottom">12.2%
<hr></hr>
</td>
<td align="right" valign="bottom">45
<hr></hr>
</td>
<td align="right" valign="bottom">45
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">27.4%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="right">13</td>
<td align="right">2137</td>
<td align="right">32</td>
<td align="right">2105</td>
<td align="right">3.2%</td>
<td align="right">43.5%</td>
<td align="right">1870</td>
<td align="right">118</td>
<td align="right">1752</td>
<td align="right">4.7%</td>
<td align="right">24.6%</td>
<td align="right">28</td>
<td align="right">13</td>
<td align="right">15</td>
<td align="right">8.0%</td>
<td align="right">5.6%</td>
<td align="right">83</td>
<td align="right">48</td>
<td align="right">35</td>
<td align="right">14.3%</td>
<td align="right">7.3%</td>
<td align="right">36</td>
<td align="right">36</td>
<td align="right">0</td>
<td align="right">22.0%</td>
<td align="right">0.0%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The distribution of pairs (total, positive and negative) in terms of the number of kernels that classify them correctly. Results shown for each corpus separately. Aggregated results are shown in Figure
<xref ref-type="fig" rid="F1">1</xref>
. All 13 kernels are taken into consideration.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>The distribution of pairs for each corpus according to classification success level using the cross-learning setting</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left" valign="bottom"> 
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>AIMed</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>BioInfer</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>HPRD50</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>IEPA</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>LLL</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="left"> </th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
<th align="right">
<bold>T</bold>
</th>
<th align="right">
<bold>F</bold>
</th>
<th align="right">
<bold>T, %</bold>
</th>
<th align="right">
<bold>F, %</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">41
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">41
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">0.8%
<hr></hr>
</td>
<td align="right" valign="bottom">319
<hr></hr>
</td>
<td align="right" valign="bottom">319
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">12.6%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">0.4%
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">2.7%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1.8%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">73
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">67
<hr></hr>
</td>
<td align="right" valign="bottom">0.6%
<hr></hr>
</td>
<td align="right" valign="bottom">1.4%
<hr></hr>
</td>
<td align="right" valign="bottom">362
<hr></hr>
</td>
<td align="right" valign="bottom">362
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">14.3%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">1.2%
<hr></hr>
</td>
<td align="right" valign="bottom">0.7%
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">5.1%
<hr></hr>
</td>
<td align="right" valign="bottom">0.4%
<hr></hr>
</td>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">2.4%
<hr></hr>
</td>
<td align="right" valign="bottom">0.6%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">199
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">173
<hr></hr>
</td>
<td align="right" valign="bottom">2.6%
<hr></hr>
</td>
<td align="right" valign="bottom">3.6%
<hr></hr>
</td>
<td align="right" valign="bottom">322
<hr></hr>
</td>
<td align="right" valign="bottom">312
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">12.3%
<hr></hr>
</td>
<td align="right" valign="bottom">0.1%
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">1.8%
<hr></hr>
</td>
<td align="right" valign="bottom">1.5%
<hr></hr>
</td>
<td align="right" valign="bottom">33
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">9.6%
<hr></hr>
</td>
<td align="right" valign="bottom">0.2%
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">5.5%
<hr></hr>
</td>
<td align="right" valign="bottom">0.6%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">315
<hr></hr>
</td>
<td align="right" valign="bottom">39
<hr></hr>
</td>
<td align="right" valign="bottom">276
<hr></hr>
</td>
<td align="right" valign="bottom">3.9%
<hr></hr>
</td>
<td align="right" valign="bottom">5.7%
<hr></hr>
</td>
<td align="right" valign="bottom">303
<hr></hr>
</td>
<td align="right" valign="bottom">280
<hr></hr>
</td>
<td align="right" valign="bottom">23
<hr></hr>
</td>
<td align="right" valign="bottom">11.0%
<hr></hr>
</td>
<td align="right" valign="bottom">0.3%
<hr></hr>
</td>
<td align="right" valign="bottom">23
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">6.1%
<hr></hr>
</td>
<td align="right" valign="bottom">4.8%
<hr></hr>
</td>
<td align="right" valign="bottom">38
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">10.7%
<hr></hr>
</td>
<td align="right" valign="bottom">0.4%
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">11.6%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">489
<hr></hr>
</td>
<td align="right" valign="bottom">71
<hr></hr>
</td>
<td align="right" valign="bottom">418
<hr></hr>
</td>
<td align="right" valign="bottom">7.1%
<hr></hr>
</td>
<td align="right" valign="bottom">8.6%
<hr></hr>
</td>
<td align="right" valign="bottom">321
<hr></hr>
</td>
<td align="right" valign="bottom">260
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
<td align="right" valign="bottom">10.3%
<hr></hr>
</td>
<td align="right" valign="bottom">0.9%
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">9.2%
<hr></hr>
</td>
<td align="right" valign="bottom">4.4%
<hr></hr>
</td>
<td align="right" valign="bottom">48
<hr></hr>
</td>
<td align="right" valign="bottom">45
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">13.4%
<hr></hr>
</td>
<td align="right" valign="bottom">0.6%
<hr></hr>
</td>
<td align="right" valign="bottom">25
<hr></hr>
</td>
<td align="right" valign="bottom">25
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">15.2%
<hr></hr>
</td>
<td align="right" valign="bottom">0.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">606
<hr></hr>
</td>
<td align="right" valign="bottom">84
<hr></hr>
</td>
<td align="right" valign="bottom">522
<hr></hr>
</td>
<td align="right" valign="bottom">8.4%
<hr></hr>
</td>
<td align="right" valign="bottom">10.8%
<hr></hr>
</td>
<td align="right" valign="bottom">355
<hr></hr>
</td>
<td align="right" valign="bottom">239
<hr></hr>
</td>
<td align="right" valign="bottom">116
<hr></hr>
</td>
<td align="right" valign="bottom">9.4%
<hr></hr>
</td>
<td align="right" valign="bottom">1.6%
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">9.2%
<hr></hr>
</td>
<td align="right" valign="bottom">4.4%
<hr></hr>
</td>
<td align="right" valign="bottom">44
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">9.6%
<hr></hr>
</td>
<td align="right" valign="bottom">2.5%
<hr></hr>
</td>
<td align="right" valign="bottom">25
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">12.2%
<hr></hr>
</td>
<td align="right" valign="bottom">3.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">547
<hr></hr>
</td>
<td align="right" valign="bottom">94
<hr></hr>
</td>
<td align="right" valign="bottom">453
<hr></hr>
</td>
<td align="right" valign="bottom">9.4%
<hr></hr>
</td>
<td align="right" valign="bottom">9.4%
<hr></hr>
</td>
<td align="right" valign="bottom">400
<hr></hr>
</td>
<td align="right" valign="bottom">208
<hr></hr>
</td>
<td align="right" valign="bottom">192
<hr></hr>
</td>
<td align="right" valign="bottom">8.2%
<hr></hr>
</td>
<td align="right" valign="bottom">2.7%
<hr></hr>
</td>
<td align="right" valign="bottom">41
<hr></hr>
</td>
<td align="right" valign="bottom">22
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">13.5%
<hr></hr>
</td>
<td align="right" valign="bottom">7.0%
<hr></hr>
</td>
<td align="right" valign="bottom">51
<hr></hr>
</td>
<td align="right" valign="bottom">34
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">10.1%
<hr></hr>
</td>
<td align="right" valign="bottom">3.5%
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">18
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">11.0%
<hr></hr>
</td>
<td align="right" valign="bottom">4.8%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">725
<hr></hr>
</td>
<td align="right" valign="bottom">136
<hr></hr>
</td>
<td align="right" valign="bottom">589
<hr></hr>
</td>
<td align="right" valign="bottom">13.6%
<hr></hr>
</td>
<td align="right" valign="bottom">12.2%
<hr></hr>
</td>
<td align="right" valign="bottom">432
<hr></hr>
</td>
<td align="right" valign="bottom">190
<hr></hr>
</td>
<td align="right" valign="bottom">242
<hr></hr>
</td>
<td align="right" valign="bottom">7.5%
<hr></hr>
</td>
<td align="right" valign="bottom">3.4%
<hr></hr>
</td>
<td align="right" valign="bottom">43
<hr></hr>
</td>
<td align="right" valign="bottom">18
<hr></hr>
</td>
<td align="right" valign="bottom">25
<hr></hr>
</td>
<td align="right" valign="bottom">11.0%
<hr></hr>
</td>
<td align="right" valign="bottom">9.3%
<hr></hr>
</td>
<td align="right" valign="bottom">63
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">31
<hr></hr>
</td>
<td align="right" valign="bottom">9.6%
<hr></hr>
</td>
<td align="right" valign="bottom">6.4%
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">4.3%
<hr></hr>
</td>
<td align="right" valign="bottom">7.8%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">721
<hr></hr>
</td>
<td align="right" valign="bottom">132
<hr></hr>
</td>
<td align="right" valign="bottom">589
<hr></hr>
</td>
<td align="right" valign="bottom">13.2%
<hr></hr>
</td>
<td align="right" valign="bottom">12.2%
<hr></hr>
</td>
<td align="right" valign="bottom">586
<hr></hr>
</td>
<td align="right" valign="bottom">146
<hr></hr>
</td>
<td align="right" valign="bottom">440
<hr></hr>
</td>
<td align="right" valign="bottom">5.8%
<hr></hr>
</td>
<td align="right" valign="bottom">6.2%
<hr></hr>
</td>
<td align="right" valign="bottom">52
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">35
<hr></hr>
</td>
<td align="right" valign="bottom">10.4%
<hr></hr>
</td>
<td align="right" valign="bottom">13.0%
<hr></hr>
</td>
<td align="right" valign="bottom">69
<hr></hr>
</td>
<td align="right" valign="bottom">35
<hr></hr>
</td>
<td align="right" valign="bottom">34
<hr></hr>
</td>
<td align="right" valign="bottom">10.4%
<hr></hr>
</td>
<td align="right" valign="bottom">7.1%
<hr></hr>
</td>
<td align="right" valign="bottom">34
<hr></hr>
</td>
<td align="right" valign="bottom">18
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">11.0%
<hr></hr>
</td>
<td align="right" valign="bottom">9.6%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">767
<hr></hr>
</td>
<td align="right" valign="bottom">110
<hr></hr>
</td>
<td align="right" valign="bottom">657
<hr></hr>
</td>
<td align="right" valign="bottom">11.0%
<hr></hr>
</td>
<td align="right" valign="bottom">13.6%
<hr></hr>
</td>
<td align="right" valign="bottom">737
<hr></hr>
</td>
<td align="right" valign="bottom">95
<hr></hr>
</td>
<td align="right" valign="bottom">642
<hr></hr>
</td>
<td align="right" valign="bottom">3.7%
<hr></hr>
</td>
<td align="right" valign="bottom">9.0%
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
<td align="right" valign="bottom">18
<hr></hr>
</td>
<td align="right" valign="bottom">43
<hr></hr>
</td>
<td align="right" valign="bottom">11.0%
<hr></hr>
</td>
<td align="right" valign="bottom">15.9%
<hr></hr>
</td>
<td align="right" valign="bottom">107
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">71
<hr></hr>
</td>
<td align="right" valign="bottom">10.7%
<hr></hr>
</td>
<td align="right" valign="bottom">14.7%
<hr></hr>
</td>
<td align="right" valign="bottom">34
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">11.6%
<hr></hr>
</td>
<td align="right" valign="bottom">9.0%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">574
<hr></hr>
</td>
<td align="right" valign="bottom">118
<hr></hr>
</td>
<td align="right" valign="bottom">456
<hr></hr>
</td>
<td align="right" valign="bottom">11.8%
<hr></hr>
</td>
<td align="right" valign="bottom">9.4%
<hr></hr>
</td>
<td align="right" valign="bottom">1060
<hr></hr>
</td>
<td align="right" valign="bottom">79
<hr></hr>
</td>
<td align="right" valign="bottom">981
<hr></hr>
</td>
<td align="right" valign="bottom">3.1%
<hr></hr>
</td>
<td align="right" valign="bottom">13.8%
<hr></hr>
</td>
<td align="right" valign="bottom">50
<hr></hr>
</td>
<td align="right" valign="bottom">14
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">8.6%
<hr></hr>
</td>
<td align="right" valign="bottom">13.3%
<hr></hr>
</td>
<td align="right" valign="bottom">110
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">97
<hr></hr>
</td>
<td align="right" valign="bottom">3.9%
<hr></hr>
</td>
<td align="right" valign="bottom">20.1%
<hr></hr>
</td>
<td align="right" valign="bottom">56
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">48
<hr></hr>
</td>
<td align="right" valign="bottom">4.9%
<hr></hr>
</td>
<td align="right" valign="bottom">28.9%
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">414
<hr></hr>
</td>
<td align="right" valign="bottom">69
<hr></hr>
</td>
<td align="right" valign="bottom">345
<hr></hr>
</td>
<td align="right" valign="bottom">6.9%
<hr></hr>
</td>
<td align="right" valign="bottom">7.1%
<hr></hr>
</td>
<td align="right" valign="bottom">1906
<hr></hr>
</td>
<td align="right" valign="bottom">29
<hr></hr>
</td>
<td align="right" valign="bottom">1877
<hr></hr>
</td>
<td align="right" valign="bottom">1.1%
<hr></hr>
</td>
<td align="right" valign="bottom">26.3%
<hr></hr>
</td>
<td align="right" valign="bottom">52
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">9.8%
<hr></hr>
</td>
<td align="right" valign="bottom">13.3%
<hr></hr>
</td>
<td align="right" valign="bottom">131
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">125
<hr></hr>
</td>
<td align="right" valign="bottom">1.8%
<hr></hr>
</td>
<td align="right" valign="bottom">25.9%
<hr></hr>
</td>
<td align="right" valign="bottom">50
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">38
<hr></hr>
</td>
<td align="right" valign="bottom">7.3%
<hr></hr>
</td>
<td align="right" valign="bottom">22.9%
<hr></hr>
</td>
</tr>
<tr>
<td align="left">12</td>
<td align="right">363</td>
<td align="right">115</td>
<td align="right">248</td>
<td align="right">11.5%</td>
<td align="right">5.1%</td>
<td align="right">2563</td>
<td align="right">15</td>
<td align="right">2548</td>
<td align="right">0.6%</td>
<td align="right">35.7%</td>
<td align="right">45</td>
<td align="right">13</td>
<td align="right">32</td>
<td align="right">8.0%</td>
<td align="right">11.9%</td>
<td align="right">95</td>
<td align="right">8</td>
<td align="right">87</td>
<td align="right">2.4%</td>
<td align="right">18.0%</td>
<td align="right">23</td>
<td align="right">2</td>
<td align="right">21</td>
<td align="right">1.2%</td>
<td align="right">12.7%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The distribution of pairs (total, positive and negative) in terms of the number of kernels that classify them correctly. Results shown for each corpus separately. Aggregated results are shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
. All but the PT kernel are considered (PT is extremely slow and provides below-average results).</p>
</table-wrap-foot>
</table-wrap>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Heatmap of success level correlation in CV and CL evaluations.</bold>
Cell values range from 2 (cyan) through 63 (white) to 1266 (magenta) pairs; hues are on a logarithmic scale.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-3"></graphic>
</fig>
<p>The 10–15 percentage point difference in F-score between CV and CL settings reported in [
<xref ref-type="bibr" rid="B14">14</xref>
] is most evident in the slightly better performance of classifiers on difficult pairs in the CV setting. For example, pairs not classified correctly by any kernel in the CL setting (CL00) are most likely to be correctly classified by a few CV classifiers (CV01–CV05), as shown in Figure
<xref ref-type="fig" rid="F3">3</xref>
. Not surprisingly, the pairs correctly classified by most classifiers correlate well between the CV and CL settings (see the upper right corner of Figure
<xref ref-type="fig" rid="F3">3</xref>
). The pairs that are difficult in both evaluation settings (D) are a reasonable target for further inspection, as improving kernels to perform better on them would benefit both scenarios; we attempt to characterize such pairs in the subsequent section.</p>
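<p>The joint distribution behind this heatmap can be reproduced directly from the per-pair success levels. The following sketch (Python) is an illustration of that computation, not the script used to produce Figure 3; the inputs success_cv and success_cl are assumptions for the example and map each pair to the number of kernels classifying it correctly in the CV and CL setting, respectively.</p>
<preformat>
# Sketch: joint success-level counts underlying the CV/CL heatmap.
# success_cv[p] and success_cl[p] are assumed to give, for pair p, the
# number of kernels that classify it correctly in the CV / CL setting.
import numpy as np

n_cv = max(success_cv.values()) + 1   # e.g. levels 0..13 in the CV setting
n_cl = max(success_cl.values()) + 1
counts = np.zeros((n_cv, n_cl), dtype=int)
for p in success_cv:
    counts[success_cv[p], success_cl[p]] += 1

# Hues in the figure are on a logarithmic scale, e.g.:
log_counts = np.log1p(counts)
</preformat>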
<p>In order to better identify pairs that are difficult or easy to classify correctly, we took, for each corpus, the most difficult and the easiest ∼10% of pairs. To this end, we cut off the set of pairs at the success level that yields a subset as close as possible to 10% of the corpus. Ultimately, we define more universal difficulty classes as the intersection of the respective difficulty classes in the CV and CL settings, e.g. D=D
<sub>CV</sub>
∩D
<sub>CL</sub>
. When the ground truth is considered known, we may further define the intuitive subclasses negative difficult (ND), positive difficult (PD), negative easy (NE) and positive easy (PE).</p>
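<p>A minimal sketch of this selection procedure is given below (Python). The inputs success_cv, success_cl and gold_label are assumptions made for the example (per-pair success levels and gold-standard class labels); the sketch illustrates the ∼10% cut-off and the intersection-based class definitions, not the exact implementation used here.</p>
<preformat>
# Sketch of the ~10% cut-off and the difficulty classes (assumed inputs).
def cutoff_subset(success, fraction=0.10, hardest=True):
    """Pairs at the extreme success levels whose total size is as close
    as possible to `fraction` of all pairs."""
    target = fraction * len(success)
    levels = sorted(set(success.values()), reverse=not hardest)
    chosen = set()
    for level in levels:
        extended = chosen | {p for p, s in success.items() if s == level}
        if chosen and abs(len(extended) - target) > abs(len(chosen) - target):
            break   # adding another level would move us away from ~10%
        chosen = extended
    return chosen

d_cv = cutoff_subset(success_cv, hardest=True)     # D_CV: ~10% most difficult (CV)
d_cl = cutoff_subset(success_cl, hardest=True)     # D_CL: ~10% most difficult (CL)
D = d_cv.intersection(d_cl)                        # D = D_CV ∩ D_CL
ND = {p for p in D if gold_label[p] == 0}          # negative difficult
PD = {p for p in D if gold_label[p] == 1}          # positive difficult
</preformat>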
<p>We investigated whether and to what extent these classes of pairs overlap depending on the evaluation setting (see Table
<xref ref-type="table" rid="T3">3</xref>
). We used the
<italic>χ</italic>
<sup>2</sup>
-test to check whether the overlap between the two sets is significantly larger than would be expected if the sets were drawn at random. A p-value lower than 0.001 is considered significant. There are only a few cases where the correlation is not significant; we discuss these cases separately for (1) classes where the ground truth is known (e.g., PD for HPRD50), and (2) classes where the ground truth is unknown (e.g., D for LLL).</p>
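<p>For illustration, the overlap test for one corpus and one difficulty class can be computed as follows (Python/SciPy). The 2×2 contingency table counts pairs that fall in both sets, in exactly one of them, or in neither; the variable names are assumptions for this example and the snippet is not part of our pipeline.</p>
<preformat>
# Sketch of the overlap significance test for one corpus (assumed names).
from scipy.stats import chi2_contingency

def overlap_p_value(all_pairs, set_cv, set_cl):
    both    = len(set_cv.intersection(set_cl))
    only_cv = len(set_cv) - both
    only_cl = len(set_cl) - both
    neither = len(all_pairs) - both - only_cv - only_cl
    table = [[both, only_cv],
             [only_cl, neither]]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

# A p-value below 0.001 is taken as evidence that the CV and CL sets
# overlap more than expected by chance.
</preformat>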
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>The overlap of the pairs that are the most difficult and the easiest to classify correctly by the collection of kernels using cross-validation (CV) and cross-learning (CL) settings</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th colspan="3" align="center" valign="bottom">
<bold>Difficulty class</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>Corpus</bold>
<hr></hr>
</th>
<th align="right" valign="bottom"> 
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>Total</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="left">
<bold> Difficulty</bold>
</th>
<th align="center">
<bold>GT</bold>
</th>
<th align="left">
<bold> Class/setting</bold>
</th>
<th align="right">
<bold>AIMed</bold>
</th>
<th align="right">
<bold>BioInfer</bold>
</th>
<th align="right">
<bold>HPRD50</bold>
</th>
<th align="right">
<bold>IEPA</bold>
</th>
<th align="right">
<bold>LLL</bold>
</th>
<th align="right">
<bold>#</bold>
</th>
<th align="right">
<bold>%</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">difficult
<hr></hr>
</td>
<td align="center" valign="bottom">unknown
<hr></hr>
</td>
<td align="left" valign="bottom">D
<sub>CV</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">537
<hr></hr>
</td>
<td align="right" valign="bottom">1 077
<hr></hr>
</td>
<td align="right" valign="bottom">41
<hr></hr>
</td>
<td align="right" valign="bottom">82
<hr></hr>
</td>
<td align="right" valign="bottom">39
<hr></hr>
</td>
<td align="right" valign="bottom">1776
<hr></hr>
</td>
<td align="right" valign="bottom">10.4
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">D
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">628
<hr></hr>
</td>
<td align="right" valign="bottom">1 003
<hr></hr>
</td>
<td align="right" valign="bottom">35
<hr></hr>
</td>
<td align="right" valign="bottom">99
<hr></hr>
</td>
<td align="right" valign="bottom">37
<hr></hr>
</td>
<td align="right" valign="bottom">1802
<hr></hr>
</td>
<td align="right" valign="bottom">10.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">D =
<italic>D</italic>
<sub>CV</sub>
∩<italic>D</italic>
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">105
<hr></hr>
</td>
<td align="right" valign="bottom">530
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">28
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">671
<hr></hr>
</td>
<td align="right" valign="bottom">3.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">p-value
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−10</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−281</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>10<sup>−2</sup></bold>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−8</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>1.0</bold>
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">positive
<hr></hr>
</td>
<td align="left" valign="bottom">PD
<sub>CV</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">162
<hr></hr>
</td>
<td align="right" valign="bottom">281
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">512
<hr></hr>
</td>
<td align="right" valign="bottom">12.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">PD
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">142
<hr></hr>
</td>
<td align="right" valign="bottom">319
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">518
<hr></hr>
</td>
<td align="right" valign="bottom">12.3
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">PD =
<italic>PD</italic>
<sub>CV</sub>
∩<italic>PD</italic>
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
<td align="right" valign="bottom">111
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>190</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">4.5
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">p-value
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−60</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−95</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>10<sup>−1</sup></bold>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−7</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−6</sup>
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">negative
<hr></hr>
</td>
<td align="left" valign="bottom">ND
<sub>CV</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">463
<hr></hr>
</td>
<td align="right" valign="bottom">610
<hr></hr>
</td>
<td align="right" valign="bottom">37
<hr></hr>
</td>
<td align="right" valign="bottom">50
<hr></hr>
</td>
<td align="right" valign="bottom">39
<hr></hr>
</td>
<td align="right" valign="bottom">1199
<hr></hr>
</td>
<td align="right" valign="bottom">9.3
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">ND
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">557
<hr></hr>
</td>
<td align="right" valign="bottom">644
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">37
<hr></hr>
</td>
<td align="right" valign="bottom">28
<hr></hr>
</td>
<td align="right" valign="bottom">1298
<hr></hr>
</td>
<td align="right" valign="bottom">10.1
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">ND =
<italic>ND</italic>
<sub>CV</sub>
∩<italic>ND</italic>
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">184
<hr></hr>
</td>
<td align="right" valign="bottom">295
<hr></hr>
</td>
<td align="right" valign="bottom">12
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>521</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">4.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">p-value
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−76</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−204</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−6</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−15</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−4</sup>
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">easy
<hr></hr>
</td>
<td align="center" valign="bottom">unknown
<hr></hr>
</td>
<td align="left" valign="bottom">E
<sub>CV</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">2137
<hr></hr>
</td>
<td align="right" valign="bottom">1870
<hr></hr>
</td>
<td align="right" valign="bottom">85
<hr></hr>
</td>
<td align="right" valign="bottom">83
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">4211
<hr></hr>
</td>
<td align="right" valign="bottom">24.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">E
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">777
<hr></hr>
</td>
<td align="right" valign="bottom">2563
<hr></hr>
</td>
<td align="right" valign="bottom">45
<hr></hr>
</td>
<td align="right" valign="bottom">95
<hr></hr>
</td>
<td align="right" valign="bottom">73
<hr></hr>
</td>
<td align="right" valign="bottom">3558
<hr></hr>
</td>
<td align="right" valign="bottom">20.8
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">E =
<italic>E</italic>
<sub>CV</sub>
∩<italic>E</italic>
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">464
<hr></hr>
</td>
<td align="right" valign="bottom">1017
<hr></hr>
</td>
<td align="right" valign="bottom">23
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">1528
<hr></hr>
</td>
<td align="right" valign="bottom">8.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">p-value
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−45</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−184</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−7</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−3</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>1.0</bold>
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">positive
<hr></hr>
</td>
<td align="left" valign="bottom">PE
<sub>CV</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">104
<hr></hr>
</td>
<td align="right" valign="bottom">301
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">48
<hr></hr>
</td>
<td align="right" valign="bottom">36
<hr></hr>
</td>
<td align="right" valign="bottom">515
<hr></hr>
</td>
<td align="right" valign="bottom">12.3
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">PE
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">115
<hr></hr>
</td>
<td align="right" valign="bottom">364
<hr></hr>
</td>
<td align="right" valign="bottom">29
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">22
<hr></hr>
</td>
<td align="right" valign="bottom">557
<hr></hr>
</td>
<td align="right" valign="bottom">13.3
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">PE =
<italic>PE</italic>
<sub>CV</sub>
∩<italic>PE</italic>
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">49
<hr></hr>
</td>
<td align="right" valign="bottom">147
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">10
<hr></hr>
</td>
<td align="right" valign="bottom">7
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>219</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">5.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">p-value
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−59</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−136</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>10<sup>−3</sup></bold>
<hr></hr>
</td>
<td align="right" valign="bottom">10
<sup>−7</sup>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>10<sup>−2</sup></bold>
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">negative
<hr></hr>
</td>
<td align="left" valign="bottom">NE
<sub>CV</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">2105
<hr></hr>
</td>
<td align="right" valign="bottom">1752
<hr></hr>
</td>
<td align="right" valign="bottom">59
<hr></hr>
</td>
<td align="right" valign="bottom">94
<hr></hr>
</td>
<td align="right" valign="bottom">23
<hr></hr>
</td>
<td align="right" valign="bottom">4033
<hr></hr>
</td>
<td align="right" valign="bottom">31.3
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">NE
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">593
<hr></hr>
</td>
<td align="right" valign="bottom">2548
<hr></hr>
</td>
<td align="right" valign="bottom">32
<hr></hr>
</td>
<td align="right" valign="bottom">87
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">3281
<hr></hr>
</td>
<td align="right" valign="bottom">25.5
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">NE =
<italic>NE</italic>
<sub>CV</sub>
∩<italic>NE</italic>
<sub>CL</sub>
<hr></hr>
</td>
<td align="right" valign="bottom">440
<hr></hr>
</td>
<td align="right" valign="bottom">1014
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>1510</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">11.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="left"> </td>
<td align="center">p-value</td>
<td align="right">10
<sup>−88</sup>
</td>
<td align="right">10
<sup>−215</sup>
</td>
<td align="right">10
<sup>−12</sup>
</td>
<td align="right">10
<sup>−7</sup>
</td>
<td align="right">10
<sup>−5</sup>
</td>
<td align="right"> </td>
<td align="right"> </td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>We also indicate the size of each set, because it varies with the size of the success-level classes. Abbreviations D, E, PD, ND, PE, and NE refer to the sets of difficult (unknown class label), easy (unknown class label), positive difficult, negative difficult, positive easy and negative easy pairs, respectively; GT means ground truth. The number of pairs in the intersection of the CV and CL settings is highlighted in bold. We show the p-value of the
<italic>χ</italic>
<sup>2</sup>
-test of independence, rounded to the closest power of 10. Bold p-values indicate that the size of the overlap is too low to be significant.</p>
</table-wrap-foot>
</table-wrap>
<p>For case (1), the very few exceptions (PD and PE for HPRD50, and PE for LLL) account for a mere 1% of PD and 6% of PE pairs. We can also see that the larger a corpus, the better the CV and CL evaluations “agree” on the difficulty class of pairs: the strongest correlations are observed for BioInfer and AIMed.</p>
<p>Considering case (2), for LLL, the intersection of difficult pairs in CV and CL happens to be empty. It was shown in [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B14">14</xref>
] that kernels tend to preserve the distribution of positive/negative classes from training to test. LLL has a particularly high ratio of positive examples (50%, compared to the average of 25% in the other four corpora). Therefore, kernels find it easier to predict positive pairs for LLL in the CV evaluation, in contrast to CL: in the CV evaluation negative pairs are difficult, whereas in the CL evaluation positive ones are. These factors, together with the relatively small size of the LLL corpus (2% of all five corpora), explain the empty intersection.</p>
<p>We conclude that our method for identifying the difficult and easy pairs of each class finds meaningful subsets of pairs. We identified 521 ND (negative difficult), 190 PD (positive difficult), 1510 NE (negative easy) and 219 PE (positive easy) pairs.</p>
</sec>
<sec>
<title>How kernels perform on difficult and easy pairs</title>
<p>In Table
<xref ref-type="table" rid="T4">4</xref>
we show how the different kernels perform on the 521 ND pairs. We report the same results for the PD, NE, and PE pairs, as well as for all four pair classes in the CL setting (Tables
<xref ref-type="table" rid="T5">5</xref>
,
<xref ref-type="table" rid="T6">6</xref>
,
<xref ref-type="table" rid="T7">7</xref>
,
<xref ref-type="table" rid="T8">8</xref>
,
<xref ref-type="table" rid="T9">9</xref>
,
<xref ref-type="table" rid="T10">10</xref>
and
<xref ref-type="table" rid="T11">11</xref>
).</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption>
<p>Classification results on the 521 ND pairs with CV evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TN</bold>
</th>
<th align="center">
<bold>
<italic>e</italic>
</bold>
</th>
<th align="center">
<bold>TN/</bold>
<bold>
<italic>e</italic>
</bold>
</th>
<th align="center">
<bold>TN/ND</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">18.1
<hr></hr>
</td>
<td align="center" valign="bottom">305
<hr></hr>
</td>
<td align="center" valign="bottom">427
<hr></hr>
</td>
<td align="center" valign="bottom">0.71
<hr></hr>
</td>
<td align="center" valign="bottom">0.59
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">25.0
<hr></hr>
</td>
<td align="center" valign="bottom">203
<hr></hr>
</td>
<td align="center" valign="bottom">391
<hr></hr>
</td>
<td align="center" valign="bottom">0.52
<hr></hr>
</td>
<td align="center" valign="bottom">0.39
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">186
<hr></hr>
</td>
<td align="center" valign="bottom">382
<hr></hr>
</td>
<td align="center" valign="bottom">0.49
<hr></hr>
</td>
<td align="center" valign="bottom">0.36
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">25.3
<hr></hr>
</td>
<td align="center" valign="bottom">185
<hr></hr>
</td>
<td align="center" valign="bottom">389
<hr></hr>
</td>
<td align="center" valign="bottom">0.48
<hr></hr>
</td>
<td align="center" valign="bottom">0.36
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">PT
<hr></hr>
</td>
<td align="center" valign="bottom">27.9
<hr></hr>
</td>
<td align="center" valign="bottom">185
<hr></hr>
</td>
<td align="center" valign="bottom">376
<hr></hr>
</td>
<td align="center" valign="bottom">0.49
<hr></hr>
</td>
<td align="center" valign="bottom">0.36
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">24.4
<hr></hr>
</td>
<td align="center" valign="bottom">180
<hr></hr>
</td>
<td align="center" valign="bottom">394
<hr></hr>
</td>
<td align="center" valign="bottom">0.46
<hr></hr>
</td>
<td align="center" valign="bottom">0.35
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">24.9
<hr></hr>
</td>
<td align="center" valign="bottom">168
<hr></hr>
</td>
<td align="center" valign="bottom">391
<hr></hr>
</td>
<td align="center" valign="bottom">0.43
<hr></hr>
</td>
<td align="center" valign="bottom">0.32
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">28.0
<hr></hr>
</td>
<td align="center" valign="bottom">160
<hr></hr>
</td>
<td align="center" valign="bottom">375
<hr></hr>
</td>
<td align="center" valign="bottom">0.43
<hr></hr>
</td>
<td align="center" valign="bottom">0.30
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">24.6
<hr></hr>
</td>
<td align="center" valign="bottom">136
<hr></hr>
</td>
<td align="center" valign="bottom">393
<hr></hr>
</td>
<td align="center" valign="bottom">0.35
<hr></hr>
</td>
<td align="center" valign="bottom">0.26
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">36.6
<hr></hr>
</td>
<td align="center" valign="bottom">122
<hr></hr>
</td>
<td align="center" valign="bottom">330
<hr></hr>
</td>
<td align="center" valign="bottom">0.37
<hr></hr>
</td>
<td align="center" valign="bottom">0.23
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">24.8
<hr></hr>
</td>
<td align="center" valign="bottom">117
<hr></hr>
</td>
<td align="center" valign="bottom">392
<hr></hr>
</td>
<td align="center" valign="bottom">0.30
<hr></hr>
</td>
<td align="center" valign="bottom">0.22
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">30.4
<hr></hr>
</td>
<td align="center" valign="bottom">116
<hr></hr>
</td>
<td align="center" valign="bottom">363
<hr></hr>
</td>
<td align="center" valign="bottom">0.32
<hr></hr>
</td>
<td align="center" valign="bottom">0.22
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SpT</td>
<td align="center">46.4</td>
<td align="center">88</td>
<td align="center">279</td>
<td align="center">0.32</td>
<td align="center">0.17</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 521 ND pairs with CV evaluation (in decreasing order according to the number of successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TN is the number of correctly classified ND pairs;
<italic>e</italic>
is 521·(1−
<italic>r</italic>
), the expected number of negative class predictions projected onto the 521 ND pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption>
<p>Classification results on the 521 ND pairs with CL evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TN</bold>
</th>
<th align="center">
<bold>
<italic>e</italic>
</bold>
</th>
<th align="center">
<bold>TN/</bold>
<bold>
<italic>e</italic>
</bold>
</th>
<th align="center">
<bold>TN/#ND</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">288
<hr></hr>
</td>
<td align="center" valign="bottom">381
<hr></hr>
</td>
<td align="center" valign="bottom">0.76
<hr></hr>
</td>
<td align="center" valign="bottom">0.55
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">22.5
<hr></hr>
</td>
<td align="center" valign="bottom">279
<hr></hr>
</td>
<td align="center" valign="bottom">404
<hr></hr>
</td>
<td align="center" valign="bottom">0.69
<hr></hr>
</td>
<td align="center" valign="bottom">0.54
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">29.2
<hr></hr>
</td>
<td align="center" valign="bottom">231
<hr></hr>
</td>
<td align="center" valign="bottom">369
<hr></hr>
</td>
<td align="center" valign="bottom">0.63
<hr></hr>
</td>
<td align="center" valign="bottom">0.44
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">207
<hr></hr>
</td>
<td align="center" valign="bottom">381
<hr></hr>
</td>
<td align="center" valign="bottom">0.54
<hr></hr>
</td>
<td align="center" valign="bottom">0.40
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">29.9
<hr></hr>
</td>
<td align="center" valign="bottom">177
<hr></hr>
</td>
<td align="center" valign="bottom">365
<hr></hr>
</td>
<td align="center" valign="bottom">0.48
<hr></hr>
</td>
<td align="center" valign="bottom">0.34
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">24.5
<hr></hr>
</td>
<td align="center" valign="bottom">170
<hr></hr>
</td>
<td align="center" valign="bottom">393
<hr></hr>
</td>
<td align="center" valign="bottom">0.43
<hr></hr>
</td>
<td align="center" valign="bottom">0.33
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">157
<hr></hr>
</td>
<td align="center" valign="bottom">382
<hr></hr>
</td>
<td align="center" valign="bottom">0.41
<hr></hr>
</td>
<td align="center" valign="bottom">0.30
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">155
<hr></hr>
</td>
<td align="center" valign="bottom">381
<hr></hr>
</td>
<td align="center" valign="bottom">0.41
<hr></hr>
</td>
<td align="center" valign="bottom">0.30
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SpT
<hr></hr>
</td>
<td align="center" valign="bottom">42.1
<hr></hr>
</td>
<td align="center" valign="bottom">142
<hr></hr>
</td>
<td align="center" valign="bottom">302
<hr></hr>
</td>
<td align="center" valign="bottom">0.47
<hr></hr>
</td>
<td align="center" valign="bottom">0.27
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">26.8
<hr></hr>
</td>
<td align="center" valign="bottom">132
<hr></hr>
</td>
<td align="center" valign="bottom">381
<hr></hr>
</td>
<td align="center" valign="bottom">0.35
<hr></hr>
</td>
<td align="center" valign="bottom">0.25
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">28.6
<hr></hr>
</td>
<td align="center" valign="bottom">127
<hr></hr>
</td>
<td align="center" valign="bottom">372
<hr></hr>
</td>
<td align="center" valign="bottom">0.34
<hr></hr>
</td>
<td align="center" valign="bottom">0.24
<hr></hr>
</td>
</tr>
<tr>
<td align="left">kBSPS</td>
<td align="center">37.1</td>
<td align="center">120</td>
<td align="center">328</td>
<td align="center">0.37</td>
<td align="center">0.23</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 521 ND pairs with CL evaluation (in decreasing order according to the number of successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TN is the number of correctly classified ND pairs;
<italic>e</italic>
is 521·(1−
<italic>r</italic>
), the expected number of negative class predictions projected onto the 521 ND pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption>
<p>Classification results on the 190 PD pairs with CV evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TP</bold>
</th>
<th align="center">
<bold>
<italic>e</italic>
</bold>
</th>
<th align="center">
<bold>TP/e</bold>
</th>
<th align="center">
<bold>TP/#PD</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">SpT
<hr></hr>
</td>
<td align="center" valign="bottom">46.4
<hr></hr>
</td>
<td align="center" valign="bottom">71
<hr></hr>
</td>
<td align="center" valign="bottom">88
<hr></hr>
</td>
<td align="center" valign="bottom">0.81
<hr></hr>
</td>
<td align="center" valign="bottom">0.37
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">PT
<hr></hr>
</td>
<td align="center" valign="bottom">27.9
<hr></hr>
</td>
<td align="center" valign="bottom">33
<hr></hr>
</td>
<td align="center" valign="bottom">53
<hr></hr>
</td>
<td align="center" valign="bottom">0.62
<hr></hr>
</td>
<td align="center" valign="bottom">0.17
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">36.6
<hr></hr>
</td>
<td align="center" valign="bottom">22
<hr></hr>
</td>
<td align="center" valign="bottom">70
<hr></hr>
</td>
<td align="center" valign="bottom">0.31
<hr></hr>
</td>
<td align="center" valign="bottom">0.12
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">28.0
<hr></hr>
</td>
<td align="center" valign="bottom">19
<hr></hr>
</td>
<td align="center" valign="bottom">53
<hr></hr>
</td>
<td align="center" valign="bottom">0.36
<hr></hr>
</td>
<td align="center" valign="bottom">0.10
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">16
<hr></hr>
</td>
<td align="center" valign="bottom">51
<hr></hr>
</td>
<td align="center" valign="bottom">0.31
<hr></hr>
</td>
<td align="center" valign="bottom">0.08
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">25.3
<hr></hr>
</td>
<td align="center" valign="bottom">15
<hr></hr>
</td>
<td align="center" valign="bottom">48
<hr></hr>
</td>
<td align="center" valign="bottom">0.31
<hr></hr>
</td>
<td align="center" valign="bottom">0.08
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">30.4
<hr></hr>
</td>
<td align="center" valign="bottom">15
<hr></hr>
</td>
<td align="center" valign="bottom">58
<hr></hr>
</td>
<td align="center" valign="bottom">0.26
<hr></hr>
</td>
<td align="center" valign="bottom">0.08
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">24.4
<hr></hr>
</td>
<td align="center" valign="bottom">14
<hr></hr>
</td>
<td align="center" valign="bottom">46
<hr></hr>
</td>
<td align="center" valign="bottom">0.30
<hr></hr>
</td>
<td align="center" valign="bottom">0.07
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">18.1
<hr></hr>
</td>
<td align="center" valign="bottom">11
<hr></hr>
</td>
<td align="center" valign="bottom">34
<hr></hr>
</td>
<td align="center" valign="bottom">0.32
<hr></hr>
</td>
<td align="center" valign="bottom">0.06
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">25.0
<hr></hr>
</td>
<td align="center" valign="bottom">9
<hr></hr>
</td>
<td align="center" valign="bottom">47
<hr></hr>
</td>
<td align="center" valign="bottom">0.19
<hr></hr>
</td>
<td align="center" valign="bottom">0.05
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">24.6
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">47
<hr></hr>
</td>
<td align="center" valign="bottom">0.15
<hr></hr>
</td>
<td align="center" valign="bottom">0.04
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">24.9
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">47
<hr></hr>
</td>
<td align="center" valign="bottom">0.15
<hr></hr>
</td>
<td align="center" valign="bottom">0.04
<hr></hr>
</td>
</tr>
<tr>
<td align="left">combined</td>
<td align="center">24.8</td>
<td align="center">4</td>
<td align="center">47</td>
<td align="center">0.09</td>
<td align="center">0.02</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 190 PD pairs with CV evaluation (in decreasing order according to the number of successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TP is the number of correctly classified PD pairs;
<italic>e</italic>
is 190·
<italic>r</italic>
, the expected number of positive class predictions projected onto the 190 PD pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption>
<p>Classification results on the 190 PD pairs with CL evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TP</bold>
</th>
<th align="center">
<bold>
<italic>e</italic>
</bold>
</th>
<th align="center">
<bold>TP/e</bold>
</th>
<th align="center">
<bold>TP/#PD</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">SpT
<hr></hr>
</td>
<td align="center" valign="bottom">42.1
<hr></hr>
</td>
<td align="center" valign="bottom">53
<hr></hr>
</td>
<td align="center" valign="bottom">80
<hr></hr>
</td>
<td align="center" valign="bottom">0.66
<hr></hr>
</td>
<td align="center" valign="bottom">0.28
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">39
<hr></hr>
</td>
<td align="center" valign="bottom">51
<hr></hr>
</td>
<td align="center" valign="bottom">0.76
<hr></hr>
</td>
<td align="center" valign="bottom">0.21
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">29.2
<hr></hr>
</td>
<td align="center" valign="bottom">28
<hr></hr>
</td>
<td align="center" valign="bottom">55
<hr></hr>
</td>
<td align="center" valign="bottom">0.51
<hr></hr>
</td>
<td align="center" valign="bottom">0.15
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">29.9
<hr></hr>
</td>
<td align="center" valign="bottom">27
<hr></hr>
</td>
<td align="center" valign="bottom">57
<hr></hr>
</td>
<td align="center" valign="bottom">0.47
<hr></hr>
</td>
<td align="center" valign="bottom">0.14
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">26.8
<hr></hr>
</td>
<td align="center" valign="bottom">16
<hr></hr>
</td>
<td align="center" valign="bottom">51
<hr></hr>
</td>
<td align="center" valign="bottom">0.31
<hr></hr>
</td>
<td align="center" valign="bottom">0.08
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">28.6
<hr></hr>
</td>
<td align="center" valign="bottom">14
<hr></hr>
</td>
<td align="center" valign="bottom">54
<hr></hr>
</td>
<td align="center" valign="bottom">0.26
<hr></hr>
</td>
<td align="center" valign="bottom">0.07
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">37.1
<hr></hr>
</td>
<td align="center" valign="bottom">14
<hr></hr>
</td>
<td align="center" valign="bottom">70
<hr></hr>
</td>
<td align="center" valign="bottom">0.20
<hr></hr>
</td>
<td align="center" valign="bottom">0.07
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">9
<hr></hr>
</td>
<td align="center" valign="bottom">51
<hr></hr>
</td>
<td align="center" valign="bottom">0.18
<hr></hr>
</td>
<td align="center" valign="bottom">0.05
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">22.5
<hr></hr>
</td>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="center" valign="bottom">43
<hr></hr>
</td>
<td align="center" valign="bottom">0.16
<hr></hr>
</td>
<td align="center" valign="bottom">0.04
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">4
<hr></hr>
</td>
<td align="center" valign="bottom">51
<hr></hr>
</td>
<td align="center" valign="bottom">0.08
<hr></hr>
</td>
<td align="center" valign="bottom">0.02
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">51
<hr></hr>
</td>
<td align="center" valign="bottom">0.04
<hr></hr>
</td>
<td align="center" valign="bottom">0.01
<hr></hr>
</td>
</tr>
<tr>
<td align="left">lexical</td>
<td align="center">24.5</td>
<td align="center">1</td>
<td align="center">47</td>
<td align="center">0.02</td>
<td align="center">0.01</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 190 PD pairs with CL evaluation (in decreasing order according to the number of successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TP is the number of correctly classified PD pairs;
<italic>e</italic>
is 190·
<italic>r</italic>
, the expected number of positive class predictions projected onto the 190 PD pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption>
<p>Classification results on the 1510 NE pairs with CV evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TN</bold>
</th>
<th align="center">
<bold>FN</bold>
</th>
<th align="right">
<bold>
<italic>e</italic>
</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">25.3
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1129
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">24.9
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1134
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">18.1
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1237
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">24.8
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1135
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">24.6
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1138
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">24.4
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="center" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1142
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">36.6
<hr></hr>
</td>
<td align="center" valign="bottom">1509
<hr></hr>
</td>
<td align="center" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">957
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">30.4
<hr></hr>
</td>
<td align="center" valign="bottom">1508
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">1051
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">25.0
<hr></hr>
</td>
<td align="center" valign="bottom">1506
<hr></hr>
</td>
<td align="center" valign="bottom">4
<hr></hr>
</td>
<td align="right" valign="bottom">1133
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">PT
<hr></hr>
</td>
<td align="center" valign="bottom">27.9
<hr></hr>
</td>
<td align="center" valign="bottom">1505
<hr></hr>
</td>
<td align="center" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">1089
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">28.0
<hr></hr>
</td>
<td align="center" valign="bottom">1502
<hr></hr>
</td>
<td align="center" valign="bottom">8
<hr></hr>
</td>
<td align="right" valign="bottom">1088
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">1501
<hr></hr>
</td>
<td align="center" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">1108
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SpT</td>
<td align="center">46.4</td>
<td align="center">1484</td>
<td align="center">26</td>
<td align="right">810</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 1510 NE pairs with CV evaluation (in decreasing order according to the successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TN/FN is the number of correctly/incorrectly classified NE pairs;
<italic>e</italic>
is 1510·(1−
<italic>r</italic>
), the expected number of negative class prediction projected onto the 1510 NE pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T9">
<label>Table 9</label>
<caption>
<p>Classification results on the 1510 NE pairs with CL evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TN</bold>
</th>
<th align="right">
<bold>FN</bold>
</th>
<th align="right">
<bold>
<italic>e</italic>
</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">28.6
<hr></hr>
</td>
<td align="center" valign="bottom">1510
<hr></hr>
</td>
<td align="right" valign="bottom">0
<hr></hr>
</td>
<td align="right" valign="bottom">1078
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">26.8
<hr></hr>
</td>
<td align="center" valign="bottom">1505
<hr></hr>
</td>
<td align="right" valign="bottom">5
<hr></hr>
</td>
<td align="right" valign="bottom">1105
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">1504
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">1104
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">29.9
<hr></hr>
</td>
<td align="center" valign="bottom">1504
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">1059
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">24.5
<hr></hr>
</td>
<td align="center" valign="bottom">1501
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">1140
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">37.1
<hr></hr>
</td>
<td align="center" valign="bottom">1494
<hr></hr>
</td>
<td align="right" valign="bottom">16
<hr></hr>
</td>
<td align="right" valign="bottom">950
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">22.5
<hr></hr>
</td>
<td align="center" valign="bottom">1491
<hr></hr>
</td>
<td align="right" valign="bottom">19
<hr></hr>
</td>
<td align="right" valign="bottom">1171
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">1490
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">1109
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">29.2
<hr></hr>
</td>
<td align="center" valign="bottom">1489
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">1069
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">1484
<hr></hr>
</td>
<td align="right" valign="bottom">26
<hr></hr>
</td>
<td align="right" valign="bottom">1104
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">1483
<hr></hr>
</td>
<td align="right" valign="bottom">27
<hr></hr>
</td>
<td align="right" valign="bottom">1103
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SpT</td>
<td align="center">42.1</td>
<td align="center">1429</td>
<td align="right">81</td>
<td align="right">874</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 1510 NE pairs with CL evaluation (in decreasing order according to the successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TN/FN is the number of correctly/incorrectly classified NE pairs;
<italic>e</italic>
is 1510·(1−
<italic>r</italic>
), the expected number of negative class prediction projected onto the 1510 NE pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T10">
<label>Table 10</label>
<caption>
<p>Classification results on the 219 PE pairs with CV evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TP</bold>
</th>
<th align="right">
<bold>FP</bold>
</th>
<th align="right">
<bold>
<italic>e</italic>
</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">24.8
<hr></hr>
</td>
<td align="center" valign="bottom">218
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">54
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">25.3
<hr></hr>
</td>
<td align="center" valign="bottom">218
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">55
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SpT
<hr></hr>
</td>
<td align="center" valign="bottom">46.4
<hr></hr>
</td>
<td align="center" valign="bottom">218
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">102
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">36.6
<hr></hr>
</td>
<td align="center" valign="bottom">217
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">80
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">30.4
<hr></hr>
</td>
<td align="center" valign="bottom">216
<hr></hr>
</td>
<td align="right" valign="bottom">3
<hr></hr>
</td>
<td align="right" valign="bottom">67
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">24.6
<hr></hr>
</td>
<td align="center" valign="bottom">213
<hr></hr>
</td>
<td align="right" valign="bottom">6
<hr></hr>
</td>
<td align="right" valign="bottom">54
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">PT
<hr></hr>
</td>
<td align="center" valign="bottom">27.9
<hr></hr>
</td>
<td align="center" valign="bottom">210
<hr></hr>
</td>
<td align="right" valign="bottom">9
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">24.4
<hr></hr>
</td>
<td align="center" valign="bottom">208
<hr></hr>
</td>
<td align="right" valign="bottom">11
<hr></hr>
</td>
<td align="right" valign="bottom">53
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">24.9
<hr></hr>
</td>
<td align="center" valign="bottom">206
<hr></hr>
</td>
<td align="right" valign="bottom">13
<hr></hr>
</td>
<td align="right" valign="bottom">55
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">28.0
<hr></hr>
</td>
<td align="center" valign="bottom">205
<hr></hr>
</td>
<td align="right" valign="bottom">14
<hr></hr>
</td>
<td align="right" valign="bottom">61
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">25.0
<hr></hr>
</td>
<td align="center" valign="bottom">204
<hr></hr>
</td>
<td align="right" valign="bottom">15
<hr></hr>
</td>
<td align="right" valign="bottom">55
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SST
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">201
<hr></hr>
</td>
<td align="right" valign="bottom">18
<hr></hr>
</td>
<td align="right" valign="bottom">58
<hr></hr>
</td>
</tr>
<tr>
<td align="left">edit</td>
<td align="center">18.1</td>
<td align="center">192</td>
<td align="right">27</td>
<td align="right">40</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 219 PE pairs with CV evaluation (in decreasing order according to the successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TP/FP is the number of correctly/incorrectly classified PE pairs;
<italic>e</italic>
is 219·
<italic>r</italic>
, the expected number of positive class prediction projected onto the 219 PE pairs.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T11">
<label>Table 11</label>
<caption>
<p>Classification results on the 219 PE pairs with CL evaluation</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Kernel</bold>
</th>
<th align="center">
<bold>
<italic>r</italic>
</bold>
</th>
<th align="center">
<bold>TP</bold>
</th>
<th align="right">
<bold>FP</bold>
</th>
<th align="right">
<bold>
<italic>e</italic>
</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="center" valign="bottom">37.1
<hr></hr>
</td>
<td align="center" valign="bottom">218
<hr></hr>
</td>
<td align="right" valign="bottom">1
<hr></hr>
</td>
<td align="right" valign="bottom">81
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">combined
<hr></hr>
</td>
<td align="center" valign="bottom">26.8
<hr></hr>
</td>
<td align="center" valign="bottom">217
<hr></hr>
</td>
<td align="right" valign="bottom">2
<hr></hr>
</td>
<td align="right" valign="bottom">59
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">shallow
<hr></hr>
</td>
<td align="center" valign="bottom">28.6
<hr></hr>
</td>
<td align="center" valign="bottom">205
<hr></hr>
</td>
<td align="right" valign="bottom">14
<hr></hr>
</td>
<td align="right" valign="bottom">63
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SL
<hr></hr>
</td>
<td align="center" valign="bottom">29.9
<hr></hr>
</td>
<td align="center" valign="bottom">202
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">65
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">syntactic
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">202
<hr></hr>
</td>
<td align="right" valign="bottom">17
<hr></hr>
</td>
<td align="right" valign="bottom">59
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">lexical
<hr></hr>
</td>
<td align="center" valign="bottom">24.5
<hr></hr>
</td>
<td align="center" valign="bottom">196
<hr></hr>
</td>
<td align="right" valign="bottom">23
<hr></hr>
</td>
<td align="right" valign="bottom">54
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="center" valign="bottom">26.9
<hr></hr>
</td>
<td align="center" valign="bottom">194
<hr></hr>
</td>
<td align="right" valign="bottom">25
<hr></hr>
</td>
<td align="right" valign="bottom">59
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">cosine
<hr></hr>
</td>
<td align="center" valign="bottom">26.6
<hr></hr>
</td>
<td align="center" valign="bottom">181
<hr></hr>
</td>
<td align="right" valign="bottom">38
<hr></hr>
</td>
<td align="right" valign="bottom">58
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">SpT
<hr></hr>
</td>
<td align="center" valign="bottom">42.1
<hr></hr>
</td>
<td align="center" valign="bottom">177
<hr></hr>
</td>
<td align="right" valign="bottom">42
<hr></hr>
</td>
<td align="right" valign="bottom">92
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">edit
<hr></hr>
</td>
<td align="center" valign="bottom">22.5
<hr></hr>
</td>
<td align="center" valign="bottom">154
<hr></hr>
</td>
<td align="right" valign="bottom">65
<hr></hr>
</td>
<td align="right" valign="bottom">49
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">ST
<hr></hr>
</td>
<td align="center" valign="bottom">29.2
<hr></hr>
</td>
<td align="center" valign="bottom">126
<hr></hr>
</td>
<td align="right" valign="bottom">93
<hr></hr>
</td>
<td align="right" valign="bottom">64
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SST</td>
<td align="center">26.9</td>
<td align="center">123</td>
<td align="right">96</td>
<td align="right">59</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification results on the 219 PE pairs with CL evaluation (in decreasing order according to the successfully classified pairs). Ratio (
<italic>r</italic>
) refers to the distribution of positive classes predicted by the kernel measured across the 5 corpora; TP/FP is the number of correctly/incorrectly classified PE pairs;
<italic>e</italic>
is 219·
<italic>r</italic>
, the expected number of positive class prediction projected onto the 219 PE pairs.</p>
</table-wrap-foot>
</table-wrap>
<p>On difficult pairs (ND&PD), the measured number of correct classifications (TN for ND pairs, TP for PD pairs) is smaller than expected based on the class distribution of the kernels’ predictions. This can be attributed to the difficulty of these pairs. For easy pairs (PE&NE), the same tendency can be observed in the opposite direction.</p>
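<p>To make the comparison with the expected counts concrete, the following minimal sketch (not the original analysis code) recomputes <italic>e</italic> and the reported ratios for the edit kernel’s CV figures on ND pairs from Table 4; note that <italic>r</italic> is reported as a percentage in the tables and is used as a fraction here.</p>
<preformat>
# Minimal sketch (not the authors' analysis code): recompute the expected
# count e and the ratios reported in Tables 4-11 from a kernel's overall
# positive-prediction ratio r. Values below are the edit kernel's CV figures
# for the 521 ND pairs (Table 4); r is given in percent in the tables.
n_nd = 521      # number of negative difficult (ND) pairs
r = 0.181       # fraction of all pairs the edit kernel predicts as positive
tn = 305        # ND pairs the edit kernel classifies correctly (as negative)

e = n_nd * (1 - r)           # expected number of negative predictions on ND pairs
print(round(e))              # 427, as in Table 4
print(round(tn / e, 2))      # 0.71  (TN/e)
print(round(tn / n_nd, 2))   # 0.59  (TN/#ND)
# For positive pairs (Tables 6, 7, 10, 11) the expectation is e = N * r instead.
</preformat>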
<p>The difference in performance between CV and CL settings reported in [
<xref ref-type="bibr" rid="B14">14</xref>
] cannot be observed on ND pairs: kernels tend to build more general models in the CL setting and identify ND pairs with greater success on average. For PD pairs, kernels produce equally low results in both settings. On the other hand, kernels perform far better on easy pairs (both PE&NE) in the CV than in the CL setting. This shows that the more general CL models do not work as well on easy pairs as the rather corpus-specific CV models; that is, the smaller variability in the training examples is also reflected in the performance of the learnt model.</p>
<p>As for individual kernels, the edit kernel shows the best performance on ND pairs, both in terms of TNs and relative to its expected performance. This can be attributed to the low probability of the positive class in edit’s predictions, which is also manifested in its below-average performance on positive pairs (PD&PE) and its very good results on NE pairs. SpT, which exhibits by far the highest positive class ratio, performs relatively well on PD pairs, both in terms of TPs and relative to its expected performance (esp. in CV); this kernel shows an analogous performance pattern on PD and NE pairs. As for the top performing kernels (APG, SL, kBSPS; [
<xref ref-type="bibr" rid="B14">14</xref>
]), APG performs equally well on all pair subsets (above average or among the best), except on positive pairs in the CL setting; SL is always above average, except on ND pairs in CV; kBSPS, however, works particularly well on easy pairs and rather poorly on difficult ones (esp. on NDs).</p>
<p>We observed that on difficult (D) pairs, some kernels perform better regardless of the class label: SST (CL and CV) and ST (CL only). However, this advantage cannot easily be exploited unless difficult pairs are identified in advance. Therefore, we next investigate whether difficulty classes can be predicted from obvious surface features alone.</p>
</sec>
<sec>
<title>Relation between sentence length, entity distance and pair difficulty</title>
<p>In Figure
<xref ref-type="fig" rid="F4">4</xref>
we show the characteristics of pair difficulty in terms of the average sentence length, the average distance between entities, and the length of the shortest paths in the dependency graph and the syntax tree. It can be observed that positive pairs are more difficult to classify in longer sentences, and negative pairs in shorter ones. This correlates with the average length of sentences containing positive/negative pairs being 27.6 and 37.2 words, respectively; these numbers coincide with the average length of neutral sentences. This is also in accordance with the distribution of positive and negative pairs in terms of sentence length. Positive pairs occur more often in shorter sentences with fewer proteins (see Figures
<xref ref-type="fig" rid="F5">5</xref>
and
<xref ref-type="fig" rid="F6">6</xref>
), and most of the analyzed classifiers fail to capture the characteristics of the rare positive pairs in longer sentences. Long sentences typically have a more complicated structure; thus deep parsers are also prone to produce more erroneous parses, which makes the PPI relation extraction task especially difficult.</p>
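<p>A minimal sketch of the surface features discussed here (sentence length in words, word distance between the two entities of a pair, and the number of protein mentions in the sentence); the tokenization and distance conventions below are assumptions for illustration and may differ from those used in our experiments.</p>
<preformat>
# Minimal sketch (assumption: whitespace tokenization; conventions may differ
# from those used in the experiments) of the surface features discussed above.
def surface_features(sentence, e1_index, e2_index, protein_indices):
    tokens = sentence.split()
    return {
        "sentence_length": len(tokens),                   # length in words
        "entity_distance": abs(e1_index - e2_index) - 1,  # words between the two mentions
        "num_proteins": len(protein_indices),             # protein mentions in the sentence
    }

sent = "ENT1 binds ENT2 in the presence of ENT3 and calcium"
print(surface_features(sent, e1_index=0, e2_index=2, protein_indices=[0, 2, 7]))
# -> {'sentence_length': 10, 'entity_distance': 1, 'num_proteins': 3}
</preformat>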
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Characteristics of pairs by difficulty class.</bold>
Characteristics of pairs by difficulty class (average sentence length in words, average word distance between entities, average distance in the dependency graph (DG) and syntax tree (ST) shortest path). ND – negative difficult, NN – negative neutral, NE – negative easy, PD – positive difficult, PN – positive neutral, PE – positive easy.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-4"></graphic>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>The number of positive and negative pairs vs. the length of the sentence containing the pair.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-5"></graphic>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption>
<p>The positive ground truth rate vs. the length of the sentence containing the pair.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-6"></graphic>
</fig>
<p>The distance in words between the entities of a pair seems to be more independent of the difficulty of the pair (see Figure
<xref ref-type="fig" rid="F6">6</xref>
). The entities in NE pairs are closer to each other than those in neutral or more difficult ones, while for positive pairs no such tendency can be observed: the distances in both PE and PD pairs are shorter than in neutral ones. On the other hand, one can also observe at this level that the entities of negative pairs are farther apart (9.67 words) than those of positive ones (7.15). At the dependency tree level, the difference is smaller: 4.56 (negative) versus 4.15 (positive).</p>
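<p>The dependency-level distances reported above correspond to shortest path lengths between the two entity tokens in the sentence’s dependency graph. A minimal sketch using networkx (an illustration only, not the parsing pipeline used in our experiments):</p>
<preformat>
# Minimal sketch (illustration only, not the parsing pipeline used in the
# experiments): the dependency-level distance between two entities is the
# length of the shortest path between their tokens in the dependency graph.
import networkx as nx

# Toy dependency edges (head, dependent) for "ENT1 interacts directly with ENT2".
edges = [("interacts", "ENT1"), ("interacts", "directly"),
         ("interacts", "with"), ("with", "ENT2")]

graph = nx.Graph(edges)  # undirected, since only the path length matters here
print(nx.shortest_path_length(graph, "ENT1", "ENT2"))  # -> 3 edges
</preformat>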
<p>We conclude that, according to all three distance measures (word, dependency tree, and syntax tree distance), the farther apart the entities of a negative pair are located, the more difficult the pair is to classify. We also found that the entities of positive pairs are typically closer together than those of negative pairs.</p>
<p>Note that similar characteristics were observed in the BioNLP’09 event extraction task regarding the size of the minimal subgraph of the dependency graph that includes all triggers and arguments. It was shown in [
<xref ref-type="bibr" rid="B33">33</xref>
] that the size of this subgraph correlates with the class of the event: positive instances typically occur in smaller subgraphs. For the same dataset, in [
<xref ref-type="bibr" rid="B34">34</xref>
] it was shown that the distance between the trigger and the potential arguments is much smaller for positive than for negative instances.</p>
<p>Next, we looked into the relationship between pair difficulty and the number of entities in a sentence. In general, long sentences have more protein mentions, and the number of candidate pairs increases quadratically with the number of mentions. We investigated the class distribution of pairs depending on the number of proteins in the sentence (see Figure
<xref ref-type="fig" rid="F7">7</xref>
). We can see that the more protein mentions a sentence contains, the lower the ratio of positive pairs. This is consistent with the previous experiment on PD pairs: in long sentences there are only a few positive pairs, and those are difficult to classify.</p>
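<p>The quadratic growth mentioned above is simply the number of unordered mention pairs, n·(n−1)/2 for n protein mentions in a sentence, as the following minimal illustration shows.</p>
<preformat>
# Minimal illustration: the number of candidate pairs per sentence grows
# quadratically with the number of protein mentions, n * (n - 1) / 2.
from itertools import combinations

for n in (2, 4, 8):
    mentions = ["P%d" % i for i in range(n)]
    pairs = list(combinations(mentions, 2))
    assert len(pairs) == n * (n - 1) // 2
    print(n, len(pairs))   # 2 -> 1, 4 -> 6, 8 -> 28
</preformat>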
<fig id="F7" position="float">
<label>Figure 7</label>
<caption>
<p>Class distribution of pairs depending on the number of proteins in the sentence.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-7"></graphic>
</fig>
<p>Finally, to predict the difficulty class of pairs based on their surface features, we applied a decision tree classifier; the results are shown in Table
<xref ref-type="table" rid="T12">12</xref>
. We found that predicting the difficult (D) class is particularly hard, with a recall of 20.8 and an F-score of 28.2, indicating that difficult pairs share very few characteristics.</p>
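<p>This experiment used Weka’s <monospace>J48</monospace> classifier; purely as an illustration of the setup (not the actual configuration, features, or data), a comparable decision tree over such surface features could be trained as follows.</p>
<preformat>
# Illustration only: a decision tree over simple surface features predicting
# the difficulty class (D / N / E). The experiment in the paper uses Weka's
# J48; the scikit-learn classifier, feature rows, and labels below are
# hypothetical stand-ins, not the actual configuration or data.
from sklearn.tree import DecisionTreeClassifier

# toy rows: [sentence length, word distance between entities, protein mentions]
X = [[27, 5, 2], [41, 14, 7], [19, 3, 2], [55, 21, 9]]
y = ["E", "D", "E", "D"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[33, 8, 4]]))  # predicted difficulty class for a new pair
</preformat>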
<table-wrap position="float" id="T12">
<label>Table 12</label>
<caption>
<p>Classification of difficulty classes based on pair surface features by decision tree</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left" valign="bottom"> 
<hr></hr>
</th>
<th colspan="3" align="center" valign="bottom">
<bold>Performance</bold>
<hr></hr>
</th>
<th colspan="3" align="center" valign="bottom">
<bold>Confusion matrix</bold>
<hr></hr>
</th>
<td> </td>
</tr>
<tr>
<th align="left">
<bold>Difficulty class</bold>
</th>
<th align="center">
<bold>P</bold>
</th>
<th align="center">
<bold>R</bold>
</th>
<th align="center">
<bold>F</bold>
<sub>
<bold>1</bold>
</sub>
</th>
<th align="right">
<bold>D</bold>
</th>
<th align="right">
<bold>N</bold>
</th>
<th align="right">
<bold>E</bold>
</th>
<th align="right">
<bold>Total</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">difficult (D)
<hr></hr>
</td>
<td align="center" valign="bottom">43.5
<hr></hr>
</td>
<td align="center" valign="bottom">20.8
<hr></hr>
</td>
<td align="center" valign="bottom">28.2
<hr></hr>
</td>
<td align="right" valign="bottom">148
<hr></hr>
</td>
<td align="right" valign="bottom">543
<hr></hr>
</td>
<td align="right" valign="bottom">20
<hr></hr>
</td>
<td align="right" valign="bottom">711
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">neutral (N)
<hr></hr>
</td>
<td align="center" valign="bottom">92.0
<hr></hr>
</td>
<td align="center" valign="bottom">96.2
<hr></hr>
</td>
<td align="center" valign="bottom">94.1
<hr></hr>
</td>
<td align="right" valign="bottom">178
<hr></hr>
</td>
<td align="right" valign="bottom">14 090
<hr></hr>
</td>
<td align="right" valign="bottom">372
<hr></hr>
</td>
<td align="right" valign="bottom">14 640
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">easy (E)
<hr></hr>
</td>
<td align="center" valign="bottom">72.6
<hr></hr>
</td>
<td align="center" valign="bottom">60.0
<hr></hr>
</td>
<td align="center" valign="bottom">65.7
<hr></hr>
</td>
<td align="right" valign="bottom">14
<hr></hr>
</td>
<td align="right" valign="bottom">678
<hr></hr>
</td>
<td align="right" valign="bottom">1 037
<hr></hr>
</td>
<td align="right" valign="bottom">1 729
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Total</td>
<td align="center">88.0</td>
<td align="center">89.4</td>
<td align="center">88.5</td>
<td align="right">
<italic>N/A</italic>
</td>
<td align="right">
<italic>N/A</italic>
</td>
<td align="right">
<italic>N/A</italic>
</td>
<td align="right"> </td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Classification by the Weka
<monospace>J48</monospace>
classifier. Confusion matrix columns correspond to predicted classes.</p>
</table-wrap-foot>
</table-wrap>
<p>Still, we found a number of correlations between pair difficulty and simple surface features that cannot be exploited by the kernels, as they use different feature sets. Later on, we will show that such features already suffice to build a classifier that is almost
<italic>on par</italic>
with the state-of-the-art, without using any sophisticated (and costly to compute) kernels.</p>
</sec>
<sec>
<title>Semantic errors in annotation</title>
<p>For some of the very hardest pairs (60 PD and 60 ND), we manually investigated whether their difficulty is actually caused by annotation errors. We identified 23 PD and 28 ND pairs that we considered incorrectly annotated (for the list of pair identifiers, see Table
<xref ref-type="table" rid="T13">13</xref>
). Since the selection was drawn from the most difficult pairs, the relatively large number of incorrect annotations does not necessarily invalidate the entire experiment, though it does raise the question of whether a re-annotation is necessary (see also [
<xref ref-type="bibr" rid="B35">35</xref>
]).</p>
<table-wrap position="float" id="T13">
<label>Table 13</label>
<caption>
<p>Incorrectly annotated protein pairs selected from the very hardest positive and negative pairs</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Pair ID</bold>
</th>
<th align="center">
<bold>GT</bold>
</th>
<th align="center">
<bold>Type of error</bold>
</th>
<th align="left">
<bold>Sentence text</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">B.d267.s0.p14
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">However, a number of mammalian DNA repair proteins lack NLS clusters; these proteins include ERCC1, ERCC2 (XPD), mouse RAD51, and the
<inline-formula>
<mml:math id="M1" name="1471-2105-14-12-i1" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">HHR23B</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
/p58 and
<inline-formula>
<mml:math id="M2" name="1471-2105-14-12-i2" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">HHR23B</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
subunits of XPC.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d418.s0.p0
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Membranous staining and concomitant cytoplasmic localization of E-cadherin,
<inline-formula>
<mml:math id="M3" name="1471-2105-14-12-i3" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="M4" name="1471-2105-14-12-i4" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">gamma-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
were seen in one case with abnormal beta-catenin immunoreactivity.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d418.s0.p1
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Membranous staining and concomitant cytoplasmic localization of
<inline-formula>
<mml:math id="M5" name="1471-2105-14-12-i5" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">E-cadherin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, alpha-catenin and
<inline-formula>
<mml:math id="M6" name="1471-2105-14-12-i6" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">gamma-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
were seen in one case with abnormal beta-catenin immunoreactivity.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d506.s0.p8
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">enumeration
<hr></hr>
</td>
<td align="left" valign="bottom">Quantitation of the appearance of X22 banding in primary cultures of myotubes indicates that it precedes that of other myofibrillar proteins and that assembly takes place in the following order:
<inline-formula>
<mml:math id="M7" name="1471-2105-14-12-i7" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">X22</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, titin,
<inline-formula>
<mml:math id="M8" name="1471-2105-14-12-i8" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">myosin heavy chain</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, actin, and desmin.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p15
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions, integrins, cadherins, alpha-catenin, beta-catenin, plakoglobin, vinculin and alpha-actinin appeared to accumulate at cell-cell borders, whereas the focal contact proteins,
<inline-formula>
<mml:math id="M9" name="1471-2105-14-12-i9" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">paxillin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="M10" name="1471-2105-14-12-i10" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">talin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p14
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions,
<inline-formula>
<mml:math id="M11" name="1471-2105-14-12-i11" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">integrins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, cadherins,
<inline-formula>
<mml:math id="M12" name="1471-2105-14-12-i12" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, beta-catenin, plakoglobin, vinculin and alpha-actinin appeared to accumulate at cell-cell borders, whereas the focal contact proteins, paxillin and talin, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d594.s0.p0
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">The clone contains an open reading frame of 139 amino acid residues which shows greater than 40% sequence identity in a 91 amino acid overlap to animal actin-depolymerizing factors (
<inline-formula>
<mml:math id="M13" name="1471-2105-14-12-i13" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">ADF</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
),
<inline-formula>
<mml:math id="M14" name="1471-2105-14-12-i14" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cofilin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and destrin.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d296.s2.p20
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">In normal livers, E-cad,
<inline-formula>
<mml:math id="M15" name="1471-2105-14-12-i15" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and beta-catenin, but not CD44s, CD44v5,
<inline-formula>
<mml:math id="M16" name="1471-2105-14-12-i16" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">CD44v6</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, CD44v7-8, and CD44v10, were expressed at the cell membrane of normal intrahepatic bile ducts.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d296.s2.p25
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">In normal livers, E-cad,
<inline-formula>
<mml:math id="M17" name="1471-2105-14-12-i17" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and beta-catenin, but not CD44s,
<inline-formula>
<mml:math id="M18" name="1471-2105-14-12-i18" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">CD44v5</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, CD44v6, CD44v7-8, and CD44v10, were expressed at the cell membrane of normal intrahepatic bile ducts.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d541.s0.p0
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Since both
<inline-formula>
<mml:math id="M19" name="1471-2105-14-12-i19" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">caldesmon</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="M20" name="1471-2105-14-12-i20" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">profilin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
have been found enriched in ruffling membranes of animal cells, our in vitro findings may be relevant to the regulation of actin filaments in living cells.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d546.s0.p20
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Specific antibodies to
<inline-formula>
<mml:math id="M21" name="1471-2105-14-12-i21" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">myosin heavy chain</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
isoforms (SM1, SM2,
<inline-formula>
<mml:math id="M22" name="1471-2105-14-12-i22" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">SMemb</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
), caldesmon, and alpha-smooth muscle actin and cDNAs for SMemb were used.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d28.s234.p1
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">coreference
<hr></hr>
</td>
<td align="left" valign="bottom">We have identified a new TNF-related ligand, designated human
<inline-formula>
<mml:math id="M23" name="1471-2105-14-12-i23" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">GITR</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
ligand (
<inline-formula>
<mml:math id="M24" name="1471-2105-14-12-i24" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">hGITRL</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
), and its human receptor (hGITR), an ortholog of the recently discovered murine glucocorticoid-induced TNFR-related (mGITR) protein [4].
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d765.s0.p14
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">enumeration
<hr></hr>
</td>
<td align="left" valign="bottom">To determine the relationship between cell cycle regulation and differentiation, the spatiotemporal expression of cyclin A, cyclin B1, cyclin D1, the
<inline-formula>
<mml:math id="M25" name="1471-2105-14-12-i25" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cyclin-dependent kinase inhibitors</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
(CKIs) p27 and
<inline-formula>
<mml:math id="M26" name="1471-2105-14-12-i26" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">p57</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, and markers of differentiating podocytes in developing human kidneys was investigated by immunohistochemistry.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d296.s2.p23
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">In normal livers,
<inline-formula>
<mml:math id="M27" name="1471-2105-14-12-i27" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">E-cad</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
,
<inline-formula>
<mml:math id="M28" name="1471-2105-14-12-i28" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and beta-catenin, but not CD44s, CD44v5, CD44v6, CD44v7-8, and CD44v10, were expressed at the cell membrane of normal intrahepatic bile ducts.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d267.s0.p18
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">However, a number of mammalian DNA repair proteins lack NLS clusters; these proteins include ERCC1, ERCC2 (XPD), mouse RAD51, and the HHR23B/
<inline-formula>
<mml:math id="M29" name="1471-2105-14-12-i29" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">p58</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="M30" name="1471-2105-14-12-i30" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">HHR23A</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
subunits of XPC.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p35
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions,
<inline-formula>
<mml:math id="M31" name="1471-2105-14-12-i31" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">integrins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
,
<inline-formula>
<mml:math id="M32" name="1471-2105-14-12-i32" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cadherins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, alpha-catenin, beta-catenin, plakoglobin, vinculin and alpha-actinin appeared to accumulate at cell-cell borders, whereas the focal contact proteins, paxillin and talin, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d765.s0.p10
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">enumeration
<hr></hr>
</td>
<td align="left" valign="bottom">To determine the relationship between cell cycle regulation and differentiation, the spatiotemporal expression of cyclin A, cyclin B1, cyclin D1, the cyclin-dependent kinase inhibitors (
<inline-formula>
<mml:math id="M33" name="1471-2105-14-12-i33" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">CKIs</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
) p27 and
<inline-formula>
<mml:math id="M34" name="1471-2105-14-12-i34" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">p57</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, and markers of differentiating podocytes in developing human kidneys was investigated by immunohistochemistry.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p34
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions,
<inline-formula>
<mml:math id="M35" name="1471-2105-14-12-i35" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">integrins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, cadherins, alpha-catenin, beta-catenin, plakoglobin,
<inline-formula>
<mml:math id="M36" name="1471-2105-14-12-i36" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">vinculin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and alpha-actinin appeared to accumulate at cell-cell borders, whereas the focal contact proteins, paxillin and talin, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d506.s0.p4
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">enumeration
<hr></hr>
</td>
<td align="left" valign="bottom">Quantitation of the appearance of X22 banding in primary cultures of myotubes indicates that it precedes that of other myofibrillar proteins and that assembly takes place in the following order: X22, titin,
<inline-formula>
<mml:math id="M37" name="1471-2105-14-12-i37" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">myosin heavy chain</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
,
<inline-formula>
<mml:math id="M38" name="1471-2105-14-12-i38" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, and desmin.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p7
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions,
<inline-formula>
<mml:math id="M39" name="1471-2105-14-12-i39" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">integrins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, cadherins, alpha-catenin,
<inline-formula>
<mml:math id="M40" name="1471-2105-14-12-i40" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">beta-catenin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, plakoglobin, vinculin and alpha-actinin appeared to accumulate at cell-cell borders, whereas the focal contact proteins, paxillin and talin, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d506.s0.p11
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">enumeration
<hr></hr>
</td>
<td align="left" valign="bottom">Quantitation of the appearance of X22 banding in primary cultures of myotubes indicates that it precedes that of other myofibrillar proteins and that assembly takes place in the following order: X22,
<inline-formula>
<mml:math id="M41" name="1471-2105-14-12-i41" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">titin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
,
<inline-formula>
<mml:math id="M42" name="1471-2105-14-12-i42" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">myosin heavy chain</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, actin, and desmin.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p29
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions,
<inline-formula>
<mml:math id="M43" name="1471-2105-14-12-i43" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">integrins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, cadherins, alpha-catenin, beta-catenin,
<inline-formula>
<mml:math id="M44" name="1471-2105-14-12-i44" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">plakoglobin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, vinculin and alpha-actinin appeared to accumulate at cell-cell borders, whereas the focal contact proteins, paxillin and talin, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d833.s0.p32
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="center" valign="bottom">functional
<hr></hr>
</td>
<td align="left" valign="bottom">Within 1 hour of raising the concentration of calcium ions,
<inline-formula>
<mml:math id="M45" name="1471-2105-14-12-i45" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">integrins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, cadherins, alpha-catenin, beta-catenin, plakoglobin, vinculin and
<inline-formula>
<mml:math id="M46" name="1471-2105-14-12-i46" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-actinin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
appeared to accumulate at cell-cell borders, whereas the focal contact proteins, paxillin and talin, did not.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d60.s528.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">The
<inline-formula>
<mml:math id="M47" name="1471-2105-14-12-i47" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">v-Raf</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
proteins purified from cells infected with EC12 or 22W viruses activated
<inline-formula>
<mml:math id="M48" name="1471-2105-14-12-i48" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">MAP kinase</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
kinase from skeletal muscle in vitro.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d180.s0.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">
<disp-formula>
<mml:math id="M49" name="1471-2105-14-12-i49" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">DR3</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula>
signal transduction is mediated by a complex of intracellular signaling molecules including
<inline-formula>
<mml:math id="M50" name="1471-2105-14-12-i50" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">TRADD</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, TRAF2, FADD, and FLICE.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d114.s961.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">
<disp-formula>
<mml:math id="M51" name="1471-2105-14-12-i51" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">Syntrophin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula>
binds to an alternatively spliced exon of
<inline-formula>
<mml:math id="M52" name="1471-2105-14-12-i52" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">dystrophin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d93.s0.p9
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Because
<inline-formula>
<mml:math id="M53" name="1471-2105-14-12-i53" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">histone</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
H3 shares many structural features with histone H4 and is intimately associated with
<inline-formula>
<mml:math id="M54" name="1471-2105-14-12-i54" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">H4</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
in the assembled nucleosome, we asked whether H3 has similar functions.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d749.s0.p2
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Three actin-associated proteins, actin-binding protein,
<inline-formula>
<mml:math id="M55" name="1471-2105-14-12-i55" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">gelsolin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, and profilin, influence gelation, solation, and polymerization, respectively, of
<inline-formula>
<mml:math id="M56" name="1471-2105-14-12-i56" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
in vitro.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d639.s0.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">The main inhibitory action of p27, a cyclin-dependent kinase inhibitor (
<inline-formula>
<mml:math id="M57" name="1471-2105-14-12-i57" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">CDKI</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
), arises from its binding with the cyclin E/
<inline-formula>
<mml:math id="M58" name="1471-2105-14-12-i58" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cyclin-dependent kinase 2</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
(Cdk2) complex that results in G(1)-S arrest.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d334.s0.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">In extracts from mouse brain,
<inline-formula>
<mml:math id="M59" name="1471-2105-14-12-i59" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">profilin I</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and profilin II can form complexes with regulators of endocytosis, synaptic vesicle recycling and
<inline-formula>
<mml:math id="M60" name="1471-2105-14-12-i60" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
assembly.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d141.s1189.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">The cyclin-dependent kinase
<inline-formula>
<mml:math id="M61" name="1471-2105-14-12-i61" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">Cdk2</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
associates with
<inline-formula>
<mml:math id="M62" name="1471-2105-14-12-i62" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cyclins A</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, D, and E and has been implicated in the control of the G1 to S phase transition in mammals.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d485.s0.p2
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">PF4-dependent downregulation of cyclin E-cdk2 activity was associated with increased binding of the
<inline-formula>
<mml:math id="M63" name="1471-2105-14-12-i63" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cyclin-dependent kinase inhibitor</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, p21(Cip1/WAF1), to the
<inline-formula>
<mml:math id="M64" name="1471-2105-14-12-i64" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cyclin E</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
-cdk2 complex.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d157.s1329.p4
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Deletion analysis and binding studies demonstrate that a third enzyme, protein kinase C (
<inline-formula>
<mml:math id="M65" name="1471-2105-14-12-i65" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">PKC</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
), binds
<inline-formula>
<mml:math id="M66" name="1471-2105-14-12-i66" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">AKAP79</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
at a site distinct from those bound by PKA or CaN.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d60.s529.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Furthermore, a bacterially expressed
<inline-formula>
<mml:math id="M67" name="1471-2105-14-12-i67" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">v-Raf</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
fusion protein (glutathione S-transferase-p3722W) also activated
<inline-formula>
<mml:math id="M68" name="1471-2105-14-12-i68" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">MAP kinase</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
kinase in vitro.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d199.s1701.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">
<disp-formula>
<mml:math id="M69" name="1471-2105-14-12-i69" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">Sos</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula>
in complex with a previously identified 90-kDa protein and designated protein
<inline-formula>
<mml:math id="M70" name="1471-2105-14-12-i70" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">80K-H</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d161.s1355.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">
<disp-formula>
<mml:math id="M71" name="1471-2105-14-12-i71" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">SHPTP2</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula>
associates with the
<inline-formula>
<mml:math id="M72" name="1471-2105-14-12-i72" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">platelet-derived growth factor</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
(PDGF) receptor after ligand stimulation, and binding of SHPTP2 to this receptor promotes tyrosine phosphorylation of SHPTP2.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d357.s0.p1
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">
<disp-formula>
<mml:math id="M73" name="1471-2105-14-12-i73" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">Integrin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula>
(beta) chains, for example, interact with actin-binding proteins (e.g.
<inline-formula>
<mml:math id="M74" name="1471-2105-14-12-i74" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">talin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and filamin), which form mechanical links to the cytoskeleton.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d195.s1663.p2
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Intriguingly, NR1-
<inline-formula>
<mml:math id="M75" name="1471-2105-14-12-i75" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">calmodulin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
binding is directly antagonized by Ca2+/
<inline-formula>
<mml:math id="M76" name="1471-2105-14-12-i76" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">alpha-actinin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d151.s1288.p1
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Immunoprecipitation assays also show a weak substoichiometric association of the
<inline-formula>
<mml:math id="M77" name="1471-2105-14-12-i77" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">TATA-binding protein</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
(TBP) with
<inline-formula>
<mml:math id="M78" name="1471-2105-14-12-i78" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">PTF</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, consistent with the previous report of a PTF-related complex (SNAPc) containing substoichiometric levels of TBP and a component (SNAPc43) identical in sequence to the PTF gamma reported here.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d485.s0.p4
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">PF4-dependent downregulation of cyclin E-cdk2 activity was associated with increased binding of the
<inline-formula>
<mml:math id="M79" name="1471-2105-14-12-i79" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cyclin-dependent kinase inhibitor</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, p21(Cip1/WAF1), to the cyclin E-
<inline-formula>
<mml:math id="M80" name="1471-2105-14-12-i80" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cdk2</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
complex.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d814.s0.p26
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">We have shown that the
<inline-formula>
<mml:math id="M81" name="1471-2105-14-12-i81" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">FH proteins</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
Bni1p and Bnr1p are potential targets of the Rho family small GTP-binding proteins and bind to an actin-binding protein,
<inline-formula>
<mml:math id="M82" name="1471-2105-14-12-i82" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">profilin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, at their proline-rich FH1 domains to regulate reorganization of the actin cytoskeleton in the yeast Saccharomyces cerevisiae.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d14.s0.p4
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">T
<hr></hr>
</td>
<td align="left" valign="bottom">Actin-binding proteins such as
<inline-formula>
<mml:math id="M83" name="1471-2105-14-12-i83" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">profilin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and gelsolin bind to phosphatidylinositol (PI) 4,5-bisphosphate (PI 4,5-P2) and regulate the concentration of monomeric
<inline-formula>
<mml:math id="M84" name="1471-2105-14-12-i84" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d39.s340.p0
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">Chloramphenicol acetyltransferase assays in F9 cells showed that
<inline-formula>
<mml:math id="M85" name="1471-2105-14-12-i85" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">PS1</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
suppresses transactivation by
<inline-formula>
<mml:math id="M86" name="1471-2105-14-12-i86" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">c-Jun</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
/c-Jun but not by c-Jun/c-Fos heterodimers, consistent with the reported function of QM/Jif-1.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d307.s0.p4
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">In Acanthamoeba
<inline-formula>
<mml:math id="M87" name="1471-2105-14-12-i87" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
polymerization is regulated, at least in part, by profilin, which binds to
<inline-formula>
<mml:math id="M88" name="1471-2105-14-12-i88" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
monomers, and by capping protein, which both nucleates polymerization and blocks monomer addition at the ’barbed’ end of the filament.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d35.s4.p9
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">We conclude that Aip1p is a
<inline-formula>
<mml:math id="M89" name="1471-2105-14-12-i89" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cofilin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
-associated protein that enhances the filament disassembly activity of cofilin and restricts
<inline-formula>
<mml:math id="M90" name="1471-2105-14-12-i90" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">cofilin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
localization to cortical actin patches.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">L.d35.s1.p1
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">Our data demonstrate that the
<inline-formula>
<mml:math id="M91" name="1471-2105-14-12-i91" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">CtsR</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
protein acts as a global repressor of the clpC operon, as well as other class III heat shock genes, by preventing unstressed transcription from either the sigmaB- or
<inline-formula>
<mml:math id="M92" name="1471-2105-14-12-i92" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">sigmaA</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
-dependent promoter and might be inactivated or dissociate under inducing stress conditions.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">B.d14.s1.p2
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">These studies suggest that profilin and
<inline-formula>
<mml:math id="M93" name="1471-2105-14-12-i93" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">gelsolin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
may control the generation of 3-OH phosphorylated phosphoinositides, which in turn may regulate the
<inline-formula>
<mml:math id="M94" name="1471-2105-14-12-i94" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">actin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
polymerization.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">I.d11.s28.p1
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">coreference
<hr></hr>
</td>
<td align="left" valign="bottom">The
<inline-formula>
<mml:math id="M95" name="1471-2105-14-12-i95" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">phospholipase C</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
inhibitor U 71322 prevented the activation of phospholipase C by
<disp-formula>
<mml:math id="M96" name="1471-2105-14-12-i96" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">A beta P</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</disp-formula>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">L.d13.s0.p1
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">Production of
<inline-formula>
<mml:math id="M97" name="1471-2105-14-12-i97" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">sigmaK</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
about 1 h earlier than normal does affect Spo0A, which when phosphorylated is an activator of
<inline-formula>
<mml:math id="M98" name="1471-2105-14-12-i98" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">sigE</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
transcription.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">A.d78.s669.p2
<hr></hr>
</td>
<td align="center" valign="bottom">F
<hr></hr>
</td>
<td align="center" valign="bottom">indirect
<hr></hr>
</td>
<td align="left" valign="bottom">Our data suggest that
<inline-formula>
<mml:math id="M99" name="1471-2105-14-12-i99" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">TR6</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
inhibits the interactions of LIGHT with HVEM / /
<inline-formula>
<mml:math id="M100" name="1471-2105-14-12-i100" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">TR2</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
and LTbetaR, thereby suppressing LIGHT-mediated HT29 cell death.
<hr></hr>
</td>
</tr>
<tr>
<td align="left">B.d223.s0.p9</td>
<td align="center">F</td>
<td align="center">functional</td>
<td align="left">Furthermore, the deletion of
<inline-formula>
<mml:math id="M101" name="1471-2105-14-12-i101" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">SJL1</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
suppresses the temperature-sensitive growth defect of sac6, a mutant in yeast
<inline-formula>
<mml:math id="M102" name="1471-2105-14-12-i102" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext mathvariant="bold">fimbrin</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>ENT</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
, supporting a role for synaptojanin family members in actin function.</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Pair id abbreviations: A – AIMed; B – BioInfer; I – IEPA; L – LLL; ground truth (GT): T (true), F (false); error types: indirect – no direct interaction between the entities is described; functional – only a functional similarity between the entities is described; enumeration – the entities are merely listed together in an enumeration; coreference – the two entities refer to the same protein by different names. The entities of the pair are highlighted in bold typeface.</p>
</table-wrap-foot>
</table-wrap>
<p>We investigated whether kernels (we used only APG and SL) could benefit from re-annotation by resetting the ground truth (GT) value of the above 51 pairs and re-running the experiments. Recall that only 0.3% of the GT values were changed, most of them in the BioInfer (36) and AIMed (12) corpora. We analyzed the performance change both with the original model and with a re-trained model on the re-annotated corpora (see Table
<xref ref-type="table" rid="T14">14</xref>
). Using the original model we observed a slight performance improvement (F-score gain 0.2–0.6). With the re-trained model the performance of APG and SL could be improved further on both corpora (F-score gain 0.25–1.0). This shows that re-annotation of the corpora yields a performance gain even when only a small fraction of the pairs is affected.</p>
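<p>As a minimal, purely illustrative sketch (not the evaluation code used in this study), the effect of flipping the ground truth of a handful of re-annotated pairs on the F-score can be checked as follows; the pair ids, labels and predictions below are hypothetical.</p>
<preformat>
# Sketch: measure how the F-score changes when the ground truth (GT) of a few
# re-annotated pairs is flipped while the classifier output stays fixed
# (the "Modified" setting). All ids, labels and predictions are hypothetical.

def f_score(gold, pred):
    """Binary F1 over aligned lists of booleans (True = interacting pair)."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

ids  = ["pair1", "pair2", "pair3", "pair4", "pair5"]
gold = [True, True, False, False, True]    # original ground truth
pred = [True, False, False, True, True]    # fixed kernel predictions

reannotated = {"pair2", "pair4"}           # pairs whose GT is flipped
gold_mod = [not g if i in reannotated else g for i, g in zip(ids, gold)]

print("original GT :", round(f_score(gold, pred), 3))   # 0.667
print("modified GT :", round(f_score(gold_mod, pred), 3))  # 1.0
</preformat>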
<table-wrap position="float" id="T14">
<label>Table 14</label>
<caption>
<p>The effect on F-score when changing the ground truth of incorrectly annotated pairs with APG and SL kernels</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
<col align="right"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left" valign="bottom"> 
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>AIMed</bold>
<hr></hr>
</th>
<th colspan="5" align="center" valign="bottom">
<bold>BioInfer</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="left">
<bold>  Kernel</bold>
</th>
<th align="right">
<bold>Original</bold>
</th>
<th align="right">
<bold>Modified</bold>
</th>
<th align="right">
<bold>Retrained</bold>
</th>
<th align="right">
<bold>
<italic>Δ</italic>
</bold>
<sub>
<bold>m-o</bold>
</sub>
</th>
<th align="right">
<bold>
<italic>Δ</italic>
</bold>
<sub>
<bold>r-m</bold>
</sub>
</th>
<th align="right">
<bold>Original</bold>
</th>
<th align="right">
<bold>Modified</bold>
</th>
<th align="right">
<bold>Retrained</bold>
</th>
<th align="right">
<bold>
<italic>Δ</italic>
</bold>
<sub>
<bold>m-o</bold>
</sub>
</th>
<th align="right">
<bold>
<italic>Δ</italic>
</bold>
<sub>
<bold>r-m</bold>
</sub>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">APG (setting A)
<hr></hr>
</td>
<td align="right" valign="bottom">56.18
<hr></hr>
</td>
<td align="right" valign="bottom">56.61
<hr></hr>
</td>
<td align="right" valign="bottom">56.14
<hr></hr>
</td>
<td align="right" valign="bottom">0.43
<hr></hr>
</td>
<td align="right" valign="bottom">−0.47
<hr></hr>
</td>
<td align="right" valign="bottom">60.66
<hr></hr>
</td>
<td align="right" valign="bottom">60.87
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>61.19</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">0.21
<hr></hr>
</td>
<td align="right" valign="bottom">0.32
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG (setting B)
<hr></hr>
</td>
<td align="right" valign="bottom">55.29
<hr></hr>
</td>
<td align="right" valign="bottom">55.73
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>56.72</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">0.44
<hr></hr>
</td>
<td align="right" valign="bottom">0.99
<hr></hr>
</td>
<td align="right" valign="bottom">60.61
<hr></hr>
</td>
<td align="right" valign="bottom">60.83
<hr></hr>
</td>
<td align="right" valign="bottom">60.94
<hr></hr>
</td>
<td align="right" valign="bottom">0.22
<hr></hr>
</td>
<td align="right" valign="bottom">0.11
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG (setting C)
<hr></hr>
</td>
<td align="right" valign="bottom">53.20
<hr></hr>
</td>
<td align="right" valign="bottom">53.66
<hr></hr>
</td>
<td align="right" valign="bottom">53.96
<hr></hr>
</td>
<td align="right" valign="bottom">0.46
<hr></hr>
</td>
<td align="right" valign="bottom">0.30
<hr></hr>
</td>
<td align="right" valign="bottom">59.91
<hr></hr>
</td>
<td align="right" valign="bottom">60.36
<hr></hr>
</td>
<td align="right" valign="bottom">60.88
<hr></hr>
</td>
<td align="right" valign="bottom">0.45
<hr></hr>
</td>
<td align="right" valign="bottom">0.52
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG (setting D)
<hr></hr>
</td>
<td align="right" valign="bottom">52.30
<hr></hr>
</td>
<td align="right" valign="bottom">52.77
<hr></hr>
</td>
<td align="right" valign="bottom">52.99
<hr></hr>
</td>
<td align="right" valign="bottom">0.47
<hr></hr>
</td>
<td align="right" valign="bottom">0.22
<hr></hr>
</td>
<td align="right" valign="bottom">59.42
<hr></hr>
</td>
<td align="right" valign="bottom">59.90
<hr></hr>
</td>
<td align="right" valign="bottom">60.20
<hr></hr>
</td>
<td align="right" valign="bottom">0.48
<hr></hr>
</td>
<td align="right" valign="bottom">0.30
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG (avg)
<hr></hr>
</td>
<td align="right" valign="bottom">54.24
<hr></hr>
</td>
<td align="right" valign="bottom">54.69
<hr></hr>
</td>
<td align="right" valign="bottom">54.95
<hr></hr>
</td>
<td align="right" valign="bottom">0.45
<hr></hr>
</td>
<td align="right" valign="bottom">0.26
<hr></hr>
</td>
<td align="right" valign="bottom">60.15
<hr></hr>
</td>
<td align="right" valign="bottom">60.60
<hr></hr>
</td>
<td align="right" valign="bottom">60.80
<hr></hr>
</td>
<td align="right" valign="bottom">0.34
<hr></hr>
</td>
<td align="right" valign="bottom">0.31
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SL</td>
<td align="right">54.48</td>
<td align="right">55.06</td>
<td align="right">
<bold>55.57</bold>
</td>
<td align="right">0.58</td>
<td align="right">0.51</td>
<td align="right">59.99</td>
<td align="right">60.46</td>
<td align="right">
<bold>60.71</bold>
</td>
<td align="right">0.47</td>
<td align="right">0.25</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Modified – using the original model with modified ground truth; retrained – results of a model retrained on the modified ground truth;
<italic>Δ</italic>
<sub>m-o</sub>
– difference between modified and original;
<italic>Δ</italic>
<sub>r-m</sub>
– difference between retrained and modified.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Similarity of kernel methods</title>
<p>Classifier similarity is a key factor when constructing ensemble classifiers. We define the similarity of two kernels as the fraction of pairs to which both kernels assign the same label, i.e., the number of shared annotations divided by the total number of annotations. Hierarchical clustering with this similarity measure reveals that kernels using the same parsing information group together almost perfectly, i.e., they classify pairs much more similarly to each other than to kernels using different parsing information (see Figure
<xref ref-type="fig" rid="F8">8</xref>
). Syntax-tree-based kernels form a clearly separated cluster. Kim’s kernels form a proper sub-cluster within the dependency-based kernels. SL, the only kernel that uses neither dependency nor syntax data, is grouped with the dependency-based kernels; interestingly, the outlier in this cluster is kBSPS rather than SL. The two best kernels according to [
<xref ref-type="bibr" rid="B14">14</xref>
], APG and SL, are the most similar to each other, agreeing on 81% of the benchmark pairs.</p>
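<p>The agreement-based similarity and the hierarchical clustering described above can be reproduced, in outline, with the following Python sketch; it uses SciPy instead of R’s
<monospace>hclust</monospace>
, and the kernel names and the 0/1 prediction matrix are made-up toy values.</p>
<preformat>
# Sketch: pairwise kernel similarity = fraction of benchmark pairs on which two
# kernels output the same label, followed by average-linkage hierarchical
# clustering (analogous to R's hclust). The prediction matrix is a toy example.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

kernels = ["APG", "SL", "SpT", "kBSPS"]     # hypothetical subset of kernels
preds = np.array([                          # rows: kernels, columns: pairs
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
])

n = len(kernels)
similarity = np.array([[np.mean(preds[i] == preds[j]) for j in range(n)]
                       for i in range(n)])

# Turn similarity into a distance matrix and cluster the kernels.
dist = 1.0 - similarity
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist), method="average")
dendrogram(Z, labels=kernels, no_plot=True)  # set no_plot=False to draw it
print(np.round(similarity, 2))
</preformat>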
<fig id="F8" position="float">
<label>Figure 8</label>
<caption>
<p>
<bold>Similarity of kernels as dendrogram and heat map.</bold>
Colors below the dendrogram indicate the parsing information used by a kernel. Similarity of kernel outputs ranges from full agreement (red) to 33% disagreement (yellow) on the five benchmark corpora. Clustering is performed with R’s
<monospace>hclust</monospace>
(
<ext-link ext-link-type="uri" xlink:href="http://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html">http://stat.ethz.ch/R-manual/R-devel/library/stats/html/hclust.html</ext-link>
).</p>
</caption>
<graphic xlink:href="1471-2105-14-12-8"></graphic>
</fig>
<p>Clearly, such characteristics can be exploited in building ensembles, as they allow a rational choice of base classifiers; we report on using such a strategy below.</p>
</sec>
<sec>
<title>Feature analysis</title>
<p>To assess the importance of the aforementioned features we constructed a feature space representation of all pairs. We derived surface features from sentences and pairs (see Table
<xref ref-type="table" rid="T15">15</xref>
), including tokens on the dependency graph (the same holds for dependency trees) and syntax tree shortest path, thereby also incorporating parsing information. We then performed feature selection by information gain, using each difficulty class as the label. The ten most relevant features of the difficult (D) and easy (E) classes are tabulated in Table
<xref ref-type="table" rid="T16">16</xref>
based on an independent feature analysis for each class. Indicative features of the D-class correlate negatively with the class label: sentence length, the entropy of POS labels along the syntax tree shortest path, the number of dependency labels of type
<italic>dep</italic>
(dependent – fall-back dependency label assigned by the Stanford Parser when no specific label could be retrieved), number of proteins in sentence. The importance of feature
<italic>dep</italic>
suggests that pairs in sentences having more specific dependency labels are more difficult to correctly predict. For the E class, the entropy of edge labels in the entire syntax tree and dependency graph, and the sentence length correlate positively, while frequency of
<italic>nn</italic>
,
<italic>appos</italic>
,
<italic>conj_and, dep, det</italic>
, etc. correlate negatively.</p>
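<p>For illustration, the following sketch shows the kind of information-gain ranking described above on hypothetical feature vectors and difficulty labels; it is not the exact feature-selection code used in our experiments, and the feature names are placeholders.</p>
<preformat>
# Illustrative sketch: rank (hypothetical) surface features by information gain
# with respect to a binary difficulty label (1 = difficult, 0 = easy).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, bins=10):
    """IG(label; feature), with the numeric feature discretized into equal-width bins."""
    binned = np.digitize(feature, np.histogram_bin_edges(feature, bins=bins))
    h_after = 0.0
    for b in np.unique(binned):
        mask = binned == b
        h_after += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_after

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)                      # placeholder difficulty labels
features = {
    "sentence_length_char": rng.normal(150, 40, size=500), # placeholder feature values
    "num_proteins":         rng.poisson(4, size=500).astype(float),
}
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(ranking)
</preformat>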
<table-wrap position="float" id="T15">
<label>Table 15</label>
<caption>
<p>Surface and parsing features generated from sentence text used for training non-kernel based classifiers</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="center">
<bold>Feature type</bold>
</th>
<th align="left">
<bold>Feature</bold>
</th>
<th align="left">
<bold>Example</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center" valign="bottom">surface
<hr></hr>
</td>
<td align="left" valign="bottom">distance (word/char)
<hr></hr>
</td>
<td align="left" valign="bottom">sentence length in characters
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">entity distance in words
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">count
<hr></hr>
</td>
<td align="left" valign="bottom">number of proteins in sentence
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">negation clues (s/b/w/a)
<hr></hr>
</td>
<td align="left" valign="bottom">negation word before entities
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">hedge clues (s/b/w/a)
<hr></hr>
</td>
<td align="left" valign="bottom">hedge word after entities
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">enumeration clues (b)
<hr></hr>
</td>
<td align="left" valign="bottom">comma between entities
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">interaction word clues (s/b/w/a)
<hr></hr>
</td>
<td align="left" valign="bottom">interaction word in sentence
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">entity modifier (a)
<hr></hr>
</td>
<td align="left" valign="bottom">-ing word after first entity
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">parsing
<hr></hr>
</td>
<td align="left" valign="bottom">distance (graph)
<hr></hr>
</td>
<td align="left" valign="bottom">length of syntax tree shortest path
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">occurrence features (entire graph)
<hr></hr>
</td>
<td align="left" valign="bottom">number of
<italic>conj</italic>
constituents in the syntax tree
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">occurrence features (shortest path)
<hr></hr>
</td>
<td align="left" valign="bottom">number of
<italic>conj</italic>
constituents along the shortest path in the syntax tree
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">frequency features (entire graph)
<hr></hr>
</td>
<td align="left" valign="bottom">relative frequency of
<italic>conj</italic>
labels over the dependency graph
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">frequency features (shortest path)
<hr></hr>
</td>
<td align="left" valign="bottom">relative frequency of
<italic>conj</italic>
labels over the shortest path relations
<hr></hr>
</td>
</tr>
<tr>
<td align="center"> </td>
<td align="left">entropy</td>
<td align="left">Kullback–Leibler divergence of constituent types in the entire syntax tree</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Features may refer to both sentence- and pair-level characteristics. Parsing features were generated from both syntax and dependency parses. The scope of a feature is typically the whole sentence (s), before the entities (b), between the entities (w), or after the entities (a).</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T16">
<label>Table 16</label>
<caption>
<p>The ten most important features related to difficult (D) and easy (E) classes measured by information gain</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="center"></col>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left" valign="bottom"> 
<hr></hr>
</th>
<th colspan="3" align="center" valign="bottom">
<bold>Difficult (D)</bold>
<hr></hr>
</th>
<th colspan="3" align="center" valign="bottom">
<bold>Easy (E)</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="center">
<bold>Rank</bold>
</th>
<th align="left">
<bold>  Feature name</bold>
</th>
<th align="center">
<bold>±</bold>
</th>
<th align="center">
<bold>IG</bold>
</th>
<th align="left">
<bold>  Feature name</bold>
</th>
<th align="center">
<bold>±</bold>
</th>
<th align="center">
<bold>IG</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center" valign="bottom">1
<hr></hr>
</td>
<td align="left" valign="bottom">sentence length (char)
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0089
<hr></hr>
</td>
<td align="left" valign="bottom">label entropy in ST
<hr></hr>
</td>
<td align="center" valign="bottom">+
<hr></hr>
</td>
<td align="center" valign="bottom">0.110
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="left" valign="bottom">label entropy in ST (SP)
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0086
<hr></hr>
</td>
<td align="left" valign="bottom">sentence length (char)
<hr></hr>
</td>
<td align="center" valign="bottom">+
<hr></hr>
</td>
<td align="center" valign="bottom">0.090
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">3
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>dep</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0079
<hr></hr>
</td>
<td align="left" valign="bottom">label entropy in DG
<hr></hr>
</td>
<td align="center" valign="bottom">+
<hr></hr>
</td>
<td align="center" valign="bottom">0.089
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">4
<hr></hr>
</td>
<td align="left" valign="bottom"># of proteins in sentence
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0078
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>nn</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.081
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">5
<hr></hr>
</td>
<td align="left" valign="bottom">sentence length (word)
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0069
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>appos</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.079
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">6
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>conj_and</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0069
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>conj_and</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.076
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">7
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>prep_with</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0066
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>dep</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.073
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">8
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>prep_with</italic>
occurrence in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0066
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>det</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.069
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">9
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>nsubjpass</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.0059
<hr></hr>
</td>
<td align="left" valign="bottom">
<italic>amod</italic>
frequency in DG
<hr></hr>
</td>
<td align="center" valign="bottom">
<hr></hr>
</td>
<td align="center" valign="bottom">0.063
<hr></hr>
</td>
</tr>
<tr>
<td align="center">10</td>
<td align="left">
<italic>prep_in</italic>
frequency in DG</td>
<td align="center"></td>
<td align="center">0.0057</td>
<td align="left">
<italic>dobj</italic>
frequency in DG</td>
<td align="center"></td>
<td align="center">0.062</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>IG – information gain; ST – syntax tree; DG – dependency graph; SP – shortest path. Italic typesetting indicates parse tree labels. The ± column indicates whether the feature correlates positively or negatively with the class label.</p>
</table-wrap-foot>
</table-wrap>
<p>This experiment suggests that pairs in longer sentences tend to be more distant from each other and more likely to be negative, and thus easier to predict. Several dependency labels are correlated with positive pairs; their absence therefore renders a pair easier to classify (as negative).</p>
</sec>
<sec>
<title>Non-kernel based classifiers</title>
<p>We also compared kernel based classifiers with some linear, non-kernel based classifiers as implemented in Weka [
<xref ref-type="bibr" rid="B36">36</xref>
]. We used the surface feature space created for feature analysis (see Table
<xref ref-type="table" rid="T15">15</xref>
). We ran experiments with 9 different methods (decision trees (
<monospace>J48</monospace>
,
<monospace>LADTree</monospace>
,
<monospace>RandomForest</monospace>
),
<italic>k</italic>
-NN (
<monospace>KStar</monospace>
), rule learners (
<monospace>JRip</monospace>
,
<monospace>PART</monospace>
), Bayesian (
<monospace>NaiveBayes</monospace>
,
<monospace>BayesNet</monospace>
) and regression methods (
<monospace>Logistic</monospace>
).) We found that the best surface-based classifier, BayesNet, is on par with or better than all kernel based classifiers except APG, SL and kBSPS (see Figure
<xref ref-type="fig" rid="F9">9</xref>
). On the larger corpora, BayesNet attains an F-score of 43.4 on AIMed and of 54.6 on BioInfer, scores that are surpassed only by the above three kernels. On the smaller corpora, which are easier to classify because they contain a larger share of positive examples, the advantage of kernel based approaches shrinks further.</p>
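<p>As an illustration only (our experiments used Weka, not the code below), a comparable comparison of simple feature-based classifiers could be set up, e.g., with scikit-learn; the feature matrix and labels are placeholders, and a naive Bayes classifier stands in for BayesNet, which has no direct scikit-learn counterpart.</p>
<preformat>
# Illustrative sketch (not the Weka setup used in the paper): 10-fold
# cross-validation of a few non-kernel classifiers on a surface-feature matrix.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))      # placeholder surface-feature vectors
y = rng.integers(0, 2, size=200)    # placeholder interaction labels

classifiers = {
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(n_estimators=100),
    "naive Bayes":   GaussianNB(),
    "logistic":      LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    f1 = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"{name:13s} mean F-score: {f1.mean():.3f}")
</preformat>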
<fig id="F9" position="float">
<label>Figure 9</label>
<caption>
<p>
<bold>Comparison of some non-kernel based and kernel based classifiers in terms of F-score (CV evaluation).</bold>
The first nine are non-kernel based classifiers; the last four are kernel based classifiers.</p>
</caption>
<graphic xlink:href="1471-2105-14-12-9"></graphic>
</fig>
</sec>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>In this paper we performed a thorough instance-level comparison of kernel based approaches for binary relation (PPI) extraction on benchmark corpora.</p>
<p>First, we proposed a method for identifying different difficulty classes of protein pairs independently of the evaluation setting. Protein interactions are expressed at the linguistic level in diverse ways; this linguistic complexity influences the ability of automated methods to classify the pairs correctly. We hypothesized that the linguistic complexity of expressing an interaction correlates with classification performance in general, that is, that there are protein pairs on which kernels tend to err independently of the applied evaluation setting (CV or CL). Difficulty classes of protein pairs were defined based on how successfully the kernels classify them. We showed that difficulty classes correlate with certain surface features of the pair and of the sentence containing the pair, especially word distance and the shortest path length between the two proteins in the dependency graph and in the syntax tree. Using these and other surface features, we built linear classifiers that yield results comparable to many of the much more sophisticated kernels. Similar vector space classifiers have been used previously for PPI extraction by [
<xref ref-type="bibr" rid="B31">31</xref>
], however, without an in-depth comparison with existing kernels and in a different evaluation setting. These observations suggest that PPI extraction performance depends far more on the feature set than on the similarity function encoded in kernels, and that future research in the field should focus on finding more expressive features rather than more complex kernel functions. It should also be noted, however, that ever larger feature sets require considerably more computational resources and increase the runtime, especially for large-scale experiments. Since the size of the currently available training corpora does not keep up with the linguistic diversity, we see two possible solutions. The first, computationally more economical strategy focuses on decreasing the linguistic variability using graph rewriting rules at the parse level (see, for instance, [
<xref ref-type="bibr" rid="B37">37</xref>
,
<xref ref-type="bibr" rid="B38">38</xref>
]). The second is to extend the available training corpora, e.g. by converting certain PPI-specific event-level annotations (e.g. regulation, phosphorylation) in event databases, such as the GENIA event data [
<xref ref-type="bibr" rid="B39">39</xref>
], into PPI annotations. As an existing example, BioInfer originally also contains richer event information and was transformed into a PPI corpus using some simplifying rules [
<xref ref-type="bibr" rid="B8">8</xref>
].</p>
<p>Second, we built an ensemble by combining three kernels with a simple majority voting scheme. We chose kBSPS, SL and APG because these show above-average results across various evaluation settings, yet still exhibit considerable disagreement at the instance level (see Figure
<xref ref-type="fig" rid="F8">8</xref>
). Combining them leads to a performance improvement of more than 2 percentage points in F-score over the best member’s performance (see Table
<xref ref-type="table" rid="T17">17</xref>
). We also observed a performance increase when combining other kernels, but the results were not on par with those of the better performing kernels, showing that a detailed comparison of kernels in terms of their false positives and false negatives is very helpful for choosing base classifiers for ensembles. Furthermore, we expect that an even higher performance gain can be achieved by employing more sophisticated ensemble construction methods, such as bagging or stacking [
<xref ref-type="bibr" rid="B40">40</xref>
,
<xref ref-type="bibr" rid="B41">41</xref>
]. An alternative approach by [
<xref ref-type="bibr" rid="B42">42</xref>
] was to build a meta-classifier: they classified dependency trees into five different classes depending on the relative position of the verb and the two proteins and learnt a separate classifier for each of these classes.</p>
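<p>A minimal sketch of this simple majority voting scheme over three classifiers’ binary decisions is given below; the decision vectors are hypothetical placeholders, not actual kernel outputs.</p>
<preformat>
# Illustrative sketch: simple majority vote over the binary decisions of
# three classifiers on the same candidate pairs (placeholder data).
import numpy as np

def majority_vote(predictions):
    """predictions: sequence of equal-length 0/1 decision vectors.
    A pair is labelled positive if more than half of the classifiers say so."""
    votes = np.asarray(predictions).sum(axis=0)
    return (2 * votes > len(predictions)).astype(int)

# Hypothetical decisions of three base classifiers on six candidate pairs.
apg   = [1, 0, 1, 1, 0, 1]
sl    = [1, 0, 0, 1, 0, 1]
kbsps = [0, 1, 1, 1, 0, 0]
print(majority_vote([apg, sl, kbsps]))   # -> [1 0 1 1 0 1]
</preformat>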
<table-wrap position="float" id="T17">
<label>Table 17</label>
<caption>
<p>Results of some simple majority vote ensembles and comparison with best single methods in terms of F-score</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Combination</bold>
</th>
<th align="left">
<bold>Corpus</bold>
</th>
<th align="center">
<bold>P</bold>
</th>
<th align="center">
<bold>R</bold>
</th>
<th align="center">
<bold>F</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<italic>Single best</italic>
<hr></hr>
</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="left" valign="bottom">AIMed
<hr></hr>
</td>
<td align="center" valign="bottom">59.9
<hr></hr>
</td>
<td align="center" valign="bottom">53.6
<hr></hr>
</td>
<td align="center" valign="bottom">56.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="left" valign="bottom">BioInfer
<hr></hr>
</td>
<td align="center" valign="bottom">60.2
<hr></hr>
</td>
<td align="center" valign="bottom">61.3
<hr></hr>
</td>
<td align="center" valign="bottom">60.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="left" valign="bottom">HPRD50
<hr></hr>
</td>
<td align="center" valign="bottom">60.0
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>88.4</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">70.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG
<hr></hr>
</td>
<td align="left" valign="bottom">IEPA
<hr></hr>
</td>
<td align="center" valign="bottom">66.6
<hr></hr>
</td>
<td align="center" valign="bottom">82.6
<hr></hr>
</td>
<td align="center" valign="bottom">73.1
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">kBSPS
<hr></hr>
</td>
<td align="left" valign="bottom">LLL
<hr></hr>
</td>
<td align="center" valign="bottom">69.9
<hr></hr>
</td>
<td align="center" valign="bottom">95.9
<hr></hr>
</td>
<td align="center" valign="bottom">79.3
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG+SL+kBSPS
<hr></hr>
</td>
<td align="left" valign="bottom">AIMed
<hr></hr>
</td>
<td align="center" valign="bottom">58.0
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>61.1</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>58.9</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">BioInfer
<hr></hr>
</td>
<td align="center" valign="bottom">60.3
<hr></hr>
</td>
<td align="center" valign="bottom">66.4
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>63.0</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">HPRD50
<hr></hr>
</td>
<td align="center" valign="bottom">67.6
<hr></hr>
</td>
<td align="center" valign="bottom">76.9
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>71.4</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">IEPA
<hr></hr>
</td>
<td align="center" valign="bottom">68.6
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>85.3</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>75.4</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">LLL
<hr></hr>
</td>
<td align="center" valign="bottom">71.7
<hr></hr>
</td>
<td align="center" valign="bottom">94.5
<hr></hr>
</td>
<td align="center" valign="bottom">80.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">APG+SL+BayesNet
<hr></hr>
</td>
<td align="left" valign="bottom">AIMed
<hr></hr>
</td>
<td align="center" valign="bottom">55.9
<hr></hr>
</td>
<td align="center" valign="bottom">60.3
<hr></hr>
</td>
<td align="center" valign="bottom">57.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">BioInfer
<hr></hr>
</td>
<td align="center" valign="bottom">58.6
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>68.8</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>63.1</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">HPRD50
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>68.4</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">69.8
<hr></hr>
</td>
<td align="center" valign="bottom">67.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">IEPA
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>71.1</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">79.9
<hr></hr>
</td>
<td align="center" valign="bottom">74.5
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">LLL
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>74.3</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">92.9
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>80.8</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">All 13 kernels
<hr></hr>
</td>
<td align="left" valign="bottom">AIMed
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>67.5</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">35.8
<hr></hr>
</td>
<td align="center" valign="bottom">46.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">BioInfer
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>61.5</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">56.5
<hr></hr>
</td>
<td align="center" valign="bottom">58.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">HPRD50
<hr></hr>
</td>
<td align="center" valign="bottom">65.4
<hr></hr>
</td>
<td align="center" valign="bottom">69.3
<hr></hr>
</td>
<td align="center" valign="bottom">66.1
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="left" valign="bottom">IEPA
<hr></hr>
</td>
<td align="center" valign="bottom">70.5
<hr></hr>
</td>
<td align="center" valign="bottom">78.8
<hr></hr>
</td>
<td align="center" valign="bottom">73.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="left">LLL</td>
<td align="center">69.6</td>
<td align="center">
<bold>98.7</bold>
</td>
<td align="center">79.5</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Best values are typeset in bold.</p>
</table-wrap-foot>
</table-wrap>
<p>Third, the identification of difficult protein pairs was found to be highly useful for spotting likely incorrect annotations in the benchmark corpora. We deemed 45% of the 120 manually checked difficult pairs to be incorrectly annotated. We also showed that even very few re-annotated pairs (below 1% of the total) influence the kernels’ performance: the re-trained models could generalize the information beyond the affected pairs and showed a systematic performance gain over the original model. Since our method for finding incorrect annotations is fully automatic, it could be used to decrease the workload of curators during corpus revision.</p>
<p>Overall, we showed that 1–2% of PPI instances are misclassified by all 13 kernels we considered, independent of which evaluation setting (and hence which training set) was used. A far larger share, 19–30% of PPI instances, is misclassified by the majority of these kernels. We also showed that, although a number of features correlate with the “difficulty” of instances, simple combinations of them are not able to tell apart true and false protein pairs. These observations lower the hope that novel types of kernels (using the same input representation) will be able to achieve a breakthrough in PPI extraction performance.</p>
<p>We conclude that one should be rather pessimistic about expecting breakthroughs from kernel-based methods for PPI extraction. Current methods do not seem to do very well in capturing the characteristics shared by positive PPI pairs, which must also be attributed to the heterogeneity of the (still very few) available corpora. We see three main possibilities to escape this situation, some of which have already proven successful in other domains or in other extraction tasks (see references below). For each of the three directions we give examples below, drawn from the 120 examined difficult cases.</p>
<p>The first is to switch focus to more specific forms of interactions, such as regulation, phosphorylation, or complex-building [
<xref ref-type="bibr" rid="B43">43</xref>
,
<xref ref-type="bibr" rid="B44">44</xref>
]. Among the difficult cases it can be observed that incorrectly classified indirect PPIs (e.g. B.d14.s1.p2, A.d78.s669.p2) tend to be regulatory relationships. As other types of PPIs may be less affected by this issue, the move from generic PPIs to more specific relations should allow higher performance for those PPI subtypes. Looking at such more crisply defined problems will likely lead to more homogeneous data and thus raise the chances that classifiers find the characteristics shared by positive and by negative instances, respectively.</p>
<p>Second, we believe that advances could be achieved if methods considered additional background knowledge, for instance by adding it as features of the pair. This encompasses detailed knowledge on the proteins under consideration (such as their function, participation in protein families, evolutionary relationships, etc.) and on the semantics of the terms surrounding them. For instance, some false positive pairs were found to contain two proteins that have nearly identical functional properties or that are orthologs. As such co-occurrences are less likely to describe actual interactions, a more informed approach could benefit from taking these aspects into consideration.</p>
<p>Third, pattern-based methods, which are capable of capturing even exotic instances, might be worth looking into again. Even early pattern-based methods are only slightly worse than machine learning approaches [
<xref ref-type="bibr" rid="B28">28</xref>
,
<xref ref-type="bibr" rid="B45">45</xref>
], although those did not fully leverage advances which the NLP community has made especially in terms of telling apart “good” patterns from bad ones [
<xref ref-type="bibr" rid="B46">46</xref>
,
<xref ref-type="bibr" rid="B47">47</xref>
]. Many difficult false positives turned out to be misinterpreted linguistic constructs such as enumerations and coreferences. Such constructs might be dealt with more appropriately by using linguistic/syntactic patterns. Note, however, that some other pairs found in sentences with such constructs (e.g. B.d765.s0.p10, A.d28.s234.p1) were correctly classified by all kernel methods in our assessment. Combining intelligent pattern selection with semi-supervised methods for pattern generation [
<xref ref-type="bibr" rid="B38">38</xref>
,
<xref ref-type="bibr" rid="B48">48</xref>
] seems especially promising.</p>
</sec>
<sec>
<title>Abbreviations</title>
<p>PPI: Protein–protein interaction; SVM: Support vector machine; RLS: Regularized least squares; POS-tag: Part-of-speech tag; NLP: Natural language processing; SL: Shallow linguistic kernel; ST: Subtree kernel; SST: Subset tree kernel; PT: Partial tree kernel; SpT: Spectrum tree kernel; edit: Edit distance kernel; cosine: Cosine similarity kernel; kBSPS: k-band shortest path spectrum kernel; APG: All-paths graph kernel; CV: Cross-validation; CL: Cross-learning; T: True; F: False; GT: Ground truth; TP: True positive; TN: True negative; FP: False positive; FN: False negative; D: Difficult; N: Neutral; E: Easy; ND: Negative difficult; PD: Positive difficult; NE: Negative easy; PE: Positive easy; dep: dependent; nn: noun compound modifier; appos: appositional modifier; conj: conjunct.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors’ contributions</title>
<p>Conceived and designed the experiments: DT, IS, UL. Performed the experiments: DT, IS, PT. Analyzed the data: DT, IS, PT. Wrote the paper: DT, IS, PT, UL. All authors read and approved the final manuscript.</p>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>D Tikk was supported by the Alexander von Humboldt Foundation. I Solt was supported by TÁMOP-4.2.2.B-10/1–2010-0009. PT was supported by the German Ministry for Education and Research (BMBF grant no 0315417B). A part of this work was done while D. Tikk was with the Budapest University of Technology and Economics (Hungary).</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Blaschke</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Andrade</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Ouzounis</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
<article-title>Automatic extraction of biological information from scientific text: protein-protein interactions</article-title>
<source>Proc Int Conf Intell Syst Mol Biol</source>
<year>1999</year>
<volume>7</volume>
<fpage>60</fpage>
<lpage>67</lpage>
<pub-id pub-id-type="pmid">10786287</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Ono</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hishigaki</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Tanigami</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Takagi</surname>
<given-names>T</given-names>
</name>
<article-title>Automated extraction of information on protein–protein interactions from the biological literature</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<issue>2</issue>
<fpage>155</fpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/17.2.155</pub-id>
<pub-id pub-id-type="pmid">11238071</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Marcotte</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Xenarios</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Eisenberg</surname>
<given-names>D</given-names>
</name>
<article-title>Mining literature for protein–protein interactions</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<issue>4</issue>
<fpage>359</fpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/17.4.359</pub-id>
<pub-id pub-id-type="pmid">11301305</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Huang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Payan</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M</given-names>
</name>
<article-title>Discovering patterns to extract protein–protein interactions from full texts</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<issue>18</issue>
<fpage>3604</fpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bth451</pub-id>
<pub-id pub-id-type="pmid">15284092</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Cohen</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Hersh</surname>
<given-names>WR</given-names>
</name>
<article-title>A survey of current work in biomedical text mining</article-title>
<source>Brief Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>57</fpage>
<pub-id pub-id-type="doi">10.1093/bib/6.1.57</pub-id>
<pub-id pub-id-type="pmid">15826357</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hirschman</surname>
<given-names>L</given-names>
</name>
<article-title>Linking genes to literature: text mining, information extraction, and retrieval applications for biology</article-title>
<source>Genome Biol</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl 2</issue>
<fpage>S8</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2008-9-s2-s8</pub-id>
<pub-id pub-id-type="pmid">18834499</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Zhou</surname>
<given-names>D</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y</given-names>
</name>
<article-title>Extracting interactions between proteins from the literature</article-title>
<source>J Biomed Inform</source>
<year>2008</year>
<volume>41</volume>
<issue>2</issue>
<fpage>393</fpage>
<lpage>407</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.jbi.2007.11.008">http://dx.doi.org/10.1016/j.jbi.2007.11.008</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1016/j.jbi.2007.11.008</pub-id>
<pub-id pub-id-type="pmid">18207462</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Airola</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Heimonen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
<article-title>Comparative analysis of five protein-protein interaction corpora</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl 3</issue>
<fpage>S6</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-9-S3-S6">http://dx.doi.org/10.1186/1471-2105-9-S3-S6</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-S3-S6</pub-id>
<pub-id pub-id-type="pmid">18426551</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Sarawagi</surname>
<given-names>S</given-names>
</name>
<article-title>Information extraction</article-title>
<source>Found Trends Databases</source>
<year>2008</year>
<volume>1</volume>
<fpage>261</fpage>
<lpage>377</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dl.acm.org/citation.cfm?id=1498844.1498845">http://dl.acm.org/citation.cfm?id=1498844.1498845</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="other">
<name>
<surname>Haussler</surname>
<given-names>D</given-names>
</name>
<article-title>Convolution kernels on discrete structures</article-title>
<comment>Tech. Rep. UCS-CRL-99-10, University of California at Santa Cruz, Santa Cruz, CA, USA 1999</comment>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="book">
<name>
<surname>Schölkopf</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>A</given-names>
</name>
<source>Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond</source>
<year>2002</year>
<publisher-name>Cambridge, MA, USA: MIT Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Arighi</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Krallinger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wilbur</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hirschman</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>C</given-names>
</name>
<article-title>Overview of the BioCreative III workshop</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<issue>Suppl 8</issue>
<fpage>S1</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/12/S8/S1">http://www.biomedcentral.com/1471-2105/12/S8/S1</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-S8-S1</pub-id>
<pub-id pub-id-type="pmid">22151647</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="other">
<name>
<surname>Kim</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bossy</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<article-title>Overview of BioNLP shared task 2011</article-title>
<source>Proceedings of the BioNLP Shared Task 2011 Workshop, Association for Computational Linguistics</source>
<year>2011</year>
<fpage>1</fpage>
<lpage>6</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.aclweb.org/anthology/W11-1801">http://www.aclweb.org/anthology/W11-1801</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Tikk</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Palaga</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hakenberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Leser</surname>
<given-names>U</given-names>
</name>
<article-title>A comprehensive benchmark of kernel methods to extract protein–protein interactions from literature</article-title>
<source>PLoS Comput Biol</source>
<year>2010</year>
<volume>6</volume>
<issue>7</issue>
<fpage>e1000837</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000837">http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000837</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000837</pub-id>
<pub-id pub-id-type="pmid">20617200</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J</given-names>
</name>
<article-title>Kernel approaches for genic interaction extraction</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>118</fpage>
<lpage>126</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btm544">http://dx.doi.org/10.1093/bioinformatics/btm544</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btm544</pub-id>
<pub-id pub-id-type="pmid">18003645</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Fayruzov</surname>
<given-names>T</given-names>
</name>
<name>
<surname>De Cock</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cornelis</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hoste</surname>
<given-names>V</given-names>
</name>
<article-title>Linguistic feature analysis for protein interaction extraction</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>374</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/10/374">http://www.biomedcentral.com/1471-2105/10/374</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-374</pub-id>
<pub-id pub-id-type="pmid">19909518</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="book">
<name>
<surname>Giuliano</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lavelli</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Romano</surname>
<given-names>L</given-names>
</name>
<article-title>Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature</article-title>
<source>Proc. of the 11st Conf. of the European Chapter of the ACL (EACL’06)</source>
<year>2006</year>
<publisher-name>Trento: The Association for Computer Linguistics</publisher-name>
<fpage>401</fpage>
<lpage>408</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://acl.ldc.upenn.edu/E/E06/E06-1051.pdf">http://acl.ldc.upenn.edu/E/E06/E06-1051.pdf</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="book">
<name>
<surname>Vishwanathan</surname>
<given-names>SVN</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>AJ</given-names>
</name>
<article-title>Fast kernels for string and tree matching</article-title>
<source>Proc. of Neural Information Processing Systems (NIPS’02)</source>
<year>2002</year>
<publisher-name>Vancouver, BC, Canada</publisher-name>
<fpage>569</fpage>
<lpage>576</lpage>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="book">
<name>
<surname>Collins</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Duffy</surname>
<given-names>N</given-names>
</name>
<article-title>Convolution kernels for natural language</article-title>
<source>Proc. of Neural Information Processing Systems (NIPS’01)</source>
<year>2001</year>
<publisher-name>Vancouver, BC, Canada</publisher-name>
<fpage>625</fpage>
<lpage>632</lpage>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="book">
<name>
<surname>Moschitti</surname>
<given-names>A</given-names>
</name>
<article-title>Efficient convolution kernels for dependency and constituent syntactic trees</article-title>
<source>Proc. of The 17th European Conf. on Machine Learning</source>
<year>2006</year>
<publisher-name>Berlin, Germany</publisher-name>
<fpage>318</fpage>
<lpage>329</lpage>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Kuboyama</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hirata</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kashima</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Aoki-Kinoshita</surname>
<given-names>KF</given-names>
</name>
<name>
<surname>Yasuda</surname>
<given-names>H</given-names>
</name>
<article-title>A spectrum tree kernel</article-title>
<source>Inf Media Technol</source>
<year>2007</year>
<volume>2</volume>
<fpage>292</fpage>
<lpage>299</lpage>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="book">
<name>
<surname>Erkan</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Özgür</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Radev</surname>
<given-names>DR</given-names>
</name>
<article-title>Semi-supervised classification for extracting protein interaction sentences using dependency parsing</article-title>
<source>Proc. of the 2007 Joint Conf. on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</source>
<year>2007</year>
<publisher-name>Prague, Czech Republic</publisher-name>
<fpage>228</fpage>
<lpage>237</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.aclweb.org/anthology/D/D07/D07-1024">http://www.aclweb.org/anthology/D/D07/D07-1024</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Airola</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Björne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pahikkala</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
<article-title>All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<issue>Suppl 11</issue>
<fpage>S2</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-9-S11-S2">http://dx.doi.org/10.1186/1471-2105-9-S11-S2</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-S11-S2</pub-id>
<pub-id pub-id-type="pmid">19025688</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="book">
<name>
<surname>Joachims</surname>
<given-names>T</given-names>
</name>
<source>Making Large-Scale Support Vector Machine Learning Practical, Advances in Kernel Methods: Support Vector Learning</source>
<year>1999</year>
<publisher-name>Cambridge, MA: MIT Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="other">
<name>
<surname>Chang</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>CJ</given-names>
</name>
<article-title>LIBSVM: a library for support vector machines</article-title>
<year>2001</year>
<comment>Software available at,
<ext-link ext-link-type="uri" xlink:href="http://www.csie.ntu.edu.tw/~cjlin/libsvm">http://www.csie.ntu.edu.tw/∼cjlin/libsvm</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Bunescu</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kate</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Marcotte</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Mooney</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Ramani</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>YW</given-names>
</name>
<article-title>Comparative experiments on learning information extractors for proteins and their interactions</article-title>
<source>Artif Intell Med</source>
<year>2005</year>
<volume>33</volume>
<issue>2</issue>
<fpage>139</fpage>
<lpage>155</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.artmed.2004.07.016">http://dx.doi.org/10.1016/j.artmed.2004.07.016</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1016/j.artmed.2004.07.016</pub-id>
<pub-id pub-id-type="pmid">15811782</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ginter</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Heimonen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bjorne</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Boberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jarvinen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Salakoski</surname>
<given-names>T</given-names>
</name>
<article-title>BioInfer: a corpus for information extraction in the biomedical domain</article-title>
<source>BMC Bioinformatics</source>
<year>2007</year>
<volume>8</volume>
<fpage>50</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-8-50</pub-id>
<pub-id pub-id-type="pmid">17291334</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Fundel</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Küffner</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zimmer</surname>
<given-names>R</given-names>
</name>
<article-title>RelEx – relation extraction using dependency parse trees</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<issue>3</issue>
<fpage>365</fpage>
<lpage>371</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btl616">http://dx.doi.org/10.1093/bioinformatics/btl616</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btl616</pub-id>
<pub-id pub-id-type="pmid">17142812</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Ding</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Berleant</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Nettleton</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wurtele</surname>
<given-names>E</given-names>
</name>
<article-title>Mining Medline: abstracts, sentences, or phrases?</article-title>
<source>Pac Symp Biocomput</source>
<year>2002</year>
<volume>7</volume>
<fpage>326</fpage>
<lpage>337</lpage>
<pub-id pub-id-type="pmid">11928487</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="book">
<name>
<surname>Nedellec</surname>
<given-names>C</given-names>
</name>
<article-title>Learning language in logic-genic interaction extraction challenge</article-title>
<source>Proc. of the ICML05 workshop: Learning Language in Logic (LLL’05), Volume 18</source>
<year>2005</year>
<publisher-name>Bonn, Germany</publisher-name>
<fpage>97</fpage>
<lpage>99</lpage>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="book">
<name>
<surname>Miwa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sætre</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Miyao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<article-title>A rich feature vector for protein-protein interaction extraction from multiple corpora</article-title>
<source>Proc. of the 2009 Conf. on Empirical Methods in Natural Language Processing (EMNLP’09)</source>
<year>2009</year>
<publisher-name>Stroudsburg: ACL</publisher-name>
<fpage>121</fpage>
<lpage>130</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://portal.acm.org/citation.cfm?id=1699510.1699527">http://portal.acm.org/citation.cfm?id=1699510.1699527</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>S</given-names>
</name>
<article-title>Walk-weighted subsequence kernels for protein-protein interaction extraction</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<fpage>107</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/11/107">http://www.biomedcentral.com/1471-2105/11/107</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-107</pub-id>
<pub-id pub-id-type="pmid">20184736</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Van Landeghem</surname>
<given-names>S</given-names>
</name>
<name>
<surname>De Baets</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Saeys</surname>
<given-names>Y</given-names>
</name>
<article-title>High-precision bio-molecular event extraction from text using parallel binary classifiers</article-title>
<source>Comput Intell</source>
<year>2011</year>
<volume>27</volume>
<issue>4</issue>
<fpage>645</fpage>
<lpage>664</lpage>
<pub-id pub-id-type="doi">10.1111/j.1467-8640.2011.00403.x</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Buyko</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Faessler</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Wermter</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>U</given-names>
</name>
<article-title>Syntactic simplification and semantic enrichment–trimming dependency graphs for event extraction</article-title>
<source>Comput Intell</source>
<year>2011</year>
<volume>27</volume>
<issue>4</issue>
<fpage>610</fpage>
<lpage>644</lpage>
<pub-id pub-id-type="doi">10.1111/j.1467-8640.2011.00402.x</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Cusick</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Smolyar</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Venkatesan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Carvunis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Simonis</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rual</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Borick</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Braun</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dreze</surname>
<given-names>M</given-names>
</name>
<etal></etal>
<article-title>Literature-curated protein interaction datasets</article-title>
<source>Nat Methods</source>
<year>2008</year>
<volume>6</volume>
<fpage>39</fpage>
<lpage>46</lpage>
<pub-id pub-id-type="pmid">19116613</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="book">
<name>
<surname>Witten</surname>
<given-names>IH</given-names>
</name>
<name>
<surname>Frank</surname>
<given-names>E</given-names>
</name>
<source>Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition</source>
<year>2005</year>
<publisher-name>San Francisco: Morgan Kaufmann</publisher-name>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="book">
<name>
<surname>Miwa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Pyysalo</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hara</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<article-title>Evaluating dependency representations for event extraction</article-title>
<source>Proc. of the 23rd Int. Conf. on Computational Linguistics (Coling’10)</source>
<year>2010</year>
<publisher-name>Beijing, China</publisher-name>
<fpage>779</fpage>
<lpage>787</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.aclweb.org/anthology/C10-1088">http://www.aclweb.org/anthology/C10-1088</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="book">
<name>
<surname>Thomas</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pietschmann</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Solt</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Tikk</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Leser</surname>
<given-names>U</given-names>
</name>
<article-title>Not all links are equal: exploiting dependency types for the extraction of protein-protein interactions from text</article-title>
<source>Proc. of BioNLP’11</source>
<year>2011</year>
<publisher-name>Portland: ACL</publisher-name>
<fpage>1</fpage>
<lpage>9</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.aclweb.org/anthology/W11-0201">http://www.aclweb.org/anthology/W11-0201</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<article-title>Corpus annotation for mining biomedical events from literature</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>10</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/9/10">http://www.biomedcentral.com/1471-2105/9/10</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-10</pub-id>
<pub-id pub-id-type="pmid">18182099</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Breiman</surname>
<given-names>L</given-names>
</name>
<article-title>Bagging predictors</article-title>
<source>Mach Learn</source>
<year>1996</year>
<volume>24</volume>
<fpage>123</fpage>
<lpage>140</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://portal.acm.org/citation.cfm?id=231986.231989">http://portal.acm.org/citation.cfm?id=231986.231989</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Wolpert</surname>
<given-names>D</given-names>
</name>
<article-title>Stacked generalization</article-title>
<source>Neural Netw</source>
<year>1992</year>
<volume>5</volume>
<issue>2</issue>
<fpage>241</fpage>
<lpage>259</lpage>
<pub-id pub-id-type="doi">10.1016/S0893-6080(05)80023-1</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Bui</surname>
<given-names>QC</given-names>
</name>
<name>
<surname>Katrenko</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sloot</surname>
<given-names>PMA</given-names>
</name>
<article-title>A hybrid approach to extract protein-protein interactions</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<issue>2</issue>
<fpage>259</fpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/content/early/2010/11/08/bioinformatics.btq620.abstract">http://bioinformatics.oxfordjournals.org/content/early/2010/11/08/bioinformatics.btq620.abstract</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq620</pub-id>
<pub-id pub-id-type="pmid">21062765</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Koike</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kobayashi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Takagi</surname>
<given-names>T</given-names>
</name>
<article-title>Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<issue>6A</issue>
<fpage>1231</fpage>
<lpage>1243</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/12799355">http://www.ncbi.nlm.nih.gov/pubmed/12799355</ext-link>
]</comment>
<pub-id pub-id-type="pmid">12799355</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<name>
<surname>Miwa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Saetre</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J</given-names>
</name>
<article-title>Event extraction with complex event classification using rich features</article-title>
<source>J Bioinform Comput Biol</source>
<year>2010</year>
<volume>8</volume>
<fpage>131</fpage>
<lpage>146</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/20183879">http://www.ncbi.nlm.nih.gov/pubmed/20183879</ext-link>
]</comment>
<pub-id pub-id-type="doi">10.1142/S0219720010004586</pub-id>
<pub-id pub-id-type="pmid">20183879</pub-id>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<name>
<surname>Plake</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Schiemann</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Pankalla</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hakenberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Leser</surname>
<given-names>U</given-names>
</name>
<article-title>AliBaba: PubMed as a graph</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<issue>19</issue>
<fpage>2444</fpage>
<lpage>2445</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btl408</pub-id>
<pub-id pub-id-type="pmid">16870931</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="other">
<name>
<surname>Banko</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cafarella</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Soderland</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Broadhead</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Etzioni</surname>
<given-names>O</given-names>
</name>
<article-title>Open information extraction from the web</article-title>
<source>Proc. of IJCAI’07</source>
<year>2007</year>
<fpage>2670</fpage>
<lpage>2676</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://turing.cs.washington.edu/papers/ijcai07.pdf">http://turing.cs.washington.edu/papers/ijcai07.pdf</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="other">
<name>
<surname>Xu</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Uszkoreit</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<article-title>A seed-driven bottom-up machine learning framework for extracting relations of various complexity</article-title>
<source>ACL’07</source>
<year>2007</year>
<fpage>584</fpage>
<lpage>591</lpage>
<pub-id pub-id-type="pmid">23744907</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="book">
<name>
<surname>Liu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Komandur</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Verspoor</surname>
<given-names>K</given-names>
</name>
<article-title>From graphs to events: a subgraph matching approach for information extraction from biomedical text</article-title>
<source>Proc. of BioNLP’11</source>
<year>2011</year>
<publisher-name>Portland, OR, USA</publisher-name>
<fpage>164</fpage>
<lpage>172</lpage>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.aclweb.org/anthology/W11-1826">http://www.aclweb.org/anthology/W11-1826</ext-link>
]</comment>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000328 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000328 | SxmlIndent | more
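
If the Dilib toolchain is not at hand, the record can also be inspected with standard Unix tools. The following is only a minimal sketch, assuming that HfdSelect emits one well-formed XML record: xmllint stands in for SxmlIndent, and the second pipeline simply lists the PubMed identifiers that appear in the record's reference list.

# Pretty-print the record with libxml2 instead of SxmlIndent
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000328 | xmllint --format - | more

# List the PubMed identifiers cited in the record's reference list
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000328 \
       | grep -o '<pub-id pub-id-type="pmid">[0-9]*' \
       | grep -o '[0-9]*$'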

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3680070
   |texte=   A detailed error analysis of 13 kernel methods for protein–protein interaction extraction
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:23323857" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 
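
The same pipeline can be wrapped in a plain sh loop to regenerate pages for several records at once. This is only a sketch; the identifier list (here just the PMID of this record) is purely illustrative and would normally be extended with further keys from the RBID index.

# Hypothetical batch run: regenerate a wiki page for each listed PubMed identifier
for pmid in 23323857; do
    HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i -Sk "pubmed:$pmid" \
           | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd \
           | NlmPubMed2Wicri -a TelematiV1
done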

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024