Server on medical data and libraries in the Maghreb (final version)


Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis

Internal identifier: 000029 (Pmc/Corpus); previous: 000028; next: 000030

Authors: Emna Harigua-Souiai; Isidro Cortes-Ciriano; Nathan Desdouits; Thérèse E. Malliavin; Ikram Guizani; Michael Nilges; Arnaud Blondel; Guillaume Bouvier

Source: BMC Bioinformatics, 2015

RBID : PMC:4381396

Abstract

Background

Identifying druggable cavities on a protein surface is a crucial step in structure-based drug design. The cavities have to present suitable size and shape, as well as appropriate chemical complementarity with ligands.

Results

We present a novel cavity prediction method that analyzes results of virtual screening of specific ligands or fragment libraries by means of Self-Organizing Maps. We demonstrate the method with two thoroughly studied proteins where it successfully identified their active sites (AS) and relevant secondary binding sites (BS). Moreover, known active ligands mapped the AS better than inactive ones. Interestingly, docking a naive fragment library brought even more insight. We then systematically applied the method to the 102 targets from the DUD-E database, where it showed a 90% identification rate of the AS among the first three consensual clusters of the SOM, and in 82% of the cases as the first one. Further analysis by chemical decomposition of the fragments improved BS prediction. Chemical substructures that are representative of the active ligands preferentially mapped in the AS.
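
The Results paragraph above describes mapping docked ligand positions onto a Self-Organizing Map and ranking the resulting clusters. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract does not state the pose representation, map size, or training schedule, so the 3x3 grid, the linear decay, the use of 3D pose centroids, and the synthetic `poses` data are all assumptions made for illustration. The function names `train_som`, `best_matching_unit`, and `occupancy` are hypothetical.

```python
import math
import random


def best_matching_unit(grid, point):
    """Return the grid coordinate of the neuron whose prototype is closest to point."""
    return min(grid, key=lambda rc: math.dist(grid[rc], point))


def train_som(points, rows=3, cols=3, epochs=30, seed=0):
    """Fit a small rectangular SOM to the points; return {(row, col): prototype}."""
    rng = random.Random(seed)
    # Assumption: prototypes start at randomly chosen input points.
    grid = {(r, c): list(rng.choice(points)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(points)
        rng.shuffle(order)
        for point in order:
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress)  # learning rate decays linearly
            radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - progress)  # shrinking neighbourhood
            br, bc = best_matching_unit(grid, point)
            for (r, c), proto in grid.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                # Gaussian neighbourhood: neurons near the winner move more.
                pull = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
                for k in range(len(proto)):
                    proto[k] += pull * (point[k] - proto[k])
            step += 1
    return grid


def occupancy(grid, points):
    """Count how many points fall on each neuron (its best-matching unit)."""
    counts = {rc: 0 for rc in grid}
    for point in points:
        counts[best_matching_unit(grid, point)] += 1
    return counts


# Synthetic stand-in for docking output: centroids of poses around two sites.
data_rng = random.Random(1)


def blob(center, n, spread=1.0):
    return [tuple(c + data_rng.gauss(0.0, spread) for c in center) for _ in range(n)]


poses = blob((0.0, 0.0, 0.0), 60) + blob((10.0, 10.0, 10.0), 20)
som = train_som(poses)
counts = occupancy(som, poses)

# Rank neurons by how many poses they attract; print the three most populated.
ranked = sorted(counts, key=counts.get, reverse=True)
for rc in ranked[:3]:
    print(rc, counts[rc], [round(x, 2) for x in som[rc]])
```

In this sketch the most populated neurons stand in for candidate binding sites. The published method additionally groups neurons into consensus clusters, which is not reproduced here.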

Conclusion

The new approach provides valuable information both on relevant BSs and on chemical features promoting bioactivity.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0518-z) contains supplementary material, which is available to authorized users.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381396
DOI: 10.1186/s12859-015-0518-z
PubMed: 25888251
PubMed Central: 4381396

Links to Exploration step

PMC:4381396

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis</title>
<author>
<name sortKey="Harigua Souiai, Emna" sort="Harigua Souiai, Emna" uniqKey="Harigua Souiai E" first="Emna" last="Harigua-Souiai">Emna Harigua-Souiai</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2298 7385</institution-id>
<institution-id institution-id-type="GRID">grid.418517.e</institution-id>
<institution>Laboratory of Molecular Epidemiology and Experimental Pathology – LR11IPT04,</institution>
<institution>Institut Pasteur de Tunis, Université Tunis el Manar – Tunisia,</institution>
</institution-wrap>
13, Place Pasteur, Tunis, 1002 Tunisia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff3">University of Carthage, Faculty of sciences of Bizerte – Tunisia, Jarzouna, 7021 Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cortes Ciriano, Isidro" sort="Cortes Ciriano, Isidro" uniqKey="Cortes Ciriano I" first="Isidro" last="Cortes-Ciriano">Isidro Cortes-Ciriano</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Desdouits, Nathan" sort="Desdouits, Nathan" uniqKey="Desdouits N" first="Nathan" last="Desdouits">Nathan Desdouits</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Malliavin, Therese E" sort="Malliavin, Therese E" uniqKey="Malliavin T" first="Thérèse E" last="Malliavin">Thérèse E. Malliavin</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Guizani, Ikram" sort="Guizani, Ikram" uniqKey="Guizani I" first="Ikram" last="Guizani">Ikram Guizani</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2298 7385</institution-id>
<institution-id institution-id-type="GRID">grid.418517.e</institution-id>
<institution>Laboratory of Molecular Epidemiology and Experimental Pathology – LR11IPT04,</institution>
<institution>Institut Pasteur de Tunis, Université Tunis el Manar – Tunisia,</institution>
</institution-wrap>
13, Place Pasteur, Tunis, 1002 Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nilges, Michael" sort="Nilges, Michael" uniqKey="Nilges M" first="Michael" last="Nilges">Michael Nilges</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Blondel, Arnaud" sort="Blondel, Arnaud" uniqKey="Blondel A" first="Arnaud" last="Blondel">Arnaud Blondel</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bouvier, Guillaume" sort="Bouvier, Guillaume" uniqKey="Bouvier G" first="Guillaume" last="Bouvier">Guillaume Bouvier</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25888251</idno>
<idno type="pmc">4381396</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381396</idno>
<idno type="RBID">PMC:4381396</idno>
<idno type="doi">10.1186/s12859-015-0518-z</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000029</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000029</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis</title>
<author>
<name sortKey="Harigua Souiai, Emna" sort="Harigua Souiai, Emna" uniqKey="Harigua Souiai E" first="Emna" last="Harigua-Souiai">Emna Harigua-Souiai</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2298 7385</institution-id>
<institution-id institution-id-type="GRID">grid.418517.e</institution-id>
<institution>Laboratory of Molecular Epidemiology and Experimental Pathology – LR11IPT04,</institution>
<institution>Institut Pasteur de Tunis, Université Tunis el Manar – Tunisia,</institution>
</institution-wrap>
13, Place Pasteur, Tunis, 1002 Tunisia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff3">University of Carthage, Faculty of sciences of Bizerte – Tunisia, Jarzouna, 7021 Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cortes Ciriano, Isidro" sort="Cortes Ciriano, Isidro" uniqKey="Cortes Ciriano I" first="Isidro" last="Cortes-Ciriano">Isidro Cortes-Ciriano</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Desdouits, Nathan" sort="Desdouits, Nathan" uniqKey="Desdouits N" first="Nathan" last="Desdouits">Nathan Desdouits</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Malliavin, Therese E" sort="Malliavin, Therese E" uniqKey="Malliavin T" first="Thérèse E" last="Malliavin">Thérèse E. Malliavin</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Guizani, Ikram" sort="Guizani, Ikram" uniqKey="Guizani I" first="Ikram" last="Guizani">Ikram Guizani</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2298 7385</institution-id>
<institution-id institution-id-type="GRID">grid.418517.e</institution-id>
<institution>Laboratory of Molecular Epidemiology and Experimental Pathology – LR11IPT04,</institution>
<institution>Institut Pasteur de Tunis, Université Tunis el Manar – Tunisia,</institution>
</institution-wrap>
13, Place Pasteur, Tunis, 1002 Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nilges, Michael" sort="Nilges, Michael" uniqKey="Nilges M" first="Michael" last="Nilges">Michael Nilges</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Blondel, Arnaud" sort="Blondel, Arnaud" uniqKey="Blondel A" first="Arnaud" last="Blondel">Arnaud Blondel</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bouvier, Guillaume" sort="Bouvier, Guillaume" uniqKey="Bouvier G" first="Guillaume" last="Bouvier">Guillaume Bouvier</name>
<affiliation>
<nlm:aff id="Aff1">Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Identifying druggable cavities on a protein surface is a crucial step in structure-based drug design. The cavities have to present suitable size and shape, as well as appropriate chemical complementarity with ligands.</p>
</sec>
<sec>
<title>Results</title>
<p>We present a novel cavity prediction method that analyzes results of virtual screening of specific ligands or fragment libraries by means of Self-Organizing Maps. We demonstrate the method with two thoroughly studied proteins where it successfully identified their active sites (AS) and relevant secondary binding sites (BS). Moreover, known active ligands mapped the AS better than inactive ones. Interestingly, docking a naive fragment library brought even more insight. We then systematically applied the method to the 102 targets from the DUD-E database, where it showed a 90% identification rate of the AS among the first three consensual clusters of the SOM, and in 82% of the cases as the first one. Further analysis by chemical decomposition of the fragments improved BS prediction. Chemical substructures that are representative of the active ligands preferentially mapped in the AS.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The new approach provides valuable information both on relevant BSs and on chemical features promoting bioactivity.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-015-0518-z) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Liang, J" uniqKey="Liang J">J Liang</name>
</author>
<author>
<name sortKey="Woodward, C" uniqKey="Woodward C">C Woodward</name>
</author>
<author>
<name sortKey="Edelsbrunner, H" uniqKey="Edelsbrunner H">H Edelsbrunner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="An, J" uniqKey="An J">J An</name>
</author>
<author>
<name sortKey="Totrov, M" uniqKey="Totrov M">M Totrov</name>
</author>
<author>
<name sortKey="Abagyan, R" uniqKey="Abagyan R">R Abagyan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soga, S" uniqKey="Soga S">S Soga</name>
</author>
<author>
<name sortKey="Shirai, H" uniqKey="Shirai H">H Shirai</name>
</author>
<author>
<name sortKey="Kobori, M" uniqKey="Kobori M">M Kobori</name>
</author>
<author>
<name sortKey="Hirayama, N" uniqKey="Hirayama N">N Hirayama</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cheng, Ac" uniqKey="Cheng A">AC Cheng</name>
</author>
<author>
<name sortKey="Coleman, Rg" uniqKey="Coleman R">RG Coleman</name>
</author>
<author>
<name sortKey="Smyth, Kt" uniqKey="Smyth K">KT Smyth</name>
</author>
<author>
<name sortKey="Cao, Q" uniqKey="Cao Q">Q Cao</name>
</author>
<author>
<name sortKey="Soulard, P" uniqKey="Soulard P">P Soulard</name>
</author>
<author>
<name sortKey="Caffrey, Dr" uniqKey="Caffrey D">DR Caffrey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Halgren, Ta" uniqKey="Halgren T">TA Halgren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="L Pez, G" uniqKey="L Pez G">G López</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
<author>
<name sortKey="Tress, Ml" uniqKey="Tress M">ML Tress</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Capra, Ja" uniqKey="Capra J">JA Capra</name>
</author>
<author>
<name sortKey="Singh, M" uniqKey="Singh M">M Singh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Capra, Ja" uniqKey="Capra J">JA Capra</name>
</author>
<author>
<name sortKey="Laskowski, Ra" uniqKey="Laskowski R">RA Laskowski</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
<author>
<name sortKey="Singh, M" uniqKey="Singh M">M Singh</name>
</author>
<author>
<name sortKey="Funkhouser, Ta" uniqKey="Funkhouser T">TA Funkhouser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mayrose, I" uniqKey="Mayrose I">I Mayrose</name>
</author>
<author>
<name sortKey="Graur, D" uniqKey="Graur D">D Graur</name>
</author>
<author>
<name sortKey="Ben Tal, N" uniqKey="Ben Tal N">N Ben-Tal</name>
</author>
<author>
<name sortKey="Pupko, T" uniqKey="Pupko T">T Pupko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghersi, D" uniqKey="Ghersi D">D Ghersi</name>
</author>
<author>
<name sortKey="Sanchez, R" uniqKey="Sanchez R">R Sanchez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Levitt, Dg" uniqKey="Levitt D">DG Levitt</name>
</author>
<author>
<name sortKey="Banaszak, Lj" uniqKey="Banaszak L">LJ Banaszak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laskowski, Ra" uniqKey="Laskowski R">RA Laskowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hendlich, M" uniqKey="Hendlich M">M Hendlich</name>
</author>
<author>
<name sortKey="Rippmann, F" uniqKey="Rippmann F">F Rippmann</name>
</author>
<author>
<name sortKey="Barnickel, G" uniqKey="Barnickel G">G Barnickel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dundas, J" uniqKey="Dundas J">J Dundas</name>
</author>
<author>
<name sortKey="Ouyang, Z" uniqKey="Ouyang Z">Z Ouyang</name>
</author>
<author>
<name sortKey="Tseng, J" uniqKey="Tseng J">J Tseng</name>
</author>
<author>
<name sortKey="Binkowski, A" uniqKey="Binkowski A">A Binkowski</name>
</author>
<author>
<name sortKey="Turpaz, Y" uniqKey="Turpaz Y">Y Turpaz</name>
</author>
<author>
<name sortKey="Liang, J" uniqKey="Liang J">J Liang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kawabata, T" uniqKey="Kawabata T">T Kawabata</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goodford, Pj" uniqKey="Goodford P">PJ Goodford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ruppert, J" uniqKey="Ruppert J">J Ruppert</name>
</author>
<author>
<name sortKey="Welch, W" uniqKey="Welch W">W Welch</name>
</author>
<author>
<name sortKey="Jain, An" uniqKey="Jain A">AN Jain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harris, R" uniqKey="Harris R">R Harris</name>
</author>
<author>
<name sortKey="Olson, Aj" uniqKey="Olson A">AJ Olson</name>
</author>
<author>
<name sortKey="Goodsell, Ds" uniqKey="Goodsell D">DS Goodsell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laurie, At" uniqKey="Laurie A">AT Laurie</name>
</author>
<author>
<name sortKey="Jackson, Rm" uniqKey="Jackson R">RM Jackson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, W" uniqKey="Yu W">W Yu</name>
</author>
<author>
<name sortKey="Lakkaraju, S" uniqKey="Lakkaraju S">S Lakkaraju</name>
</author>
<author>
<name sortKey="Raman, Ep" uniqKey="Raman E">EP Raman</name>
</author>
<author>
<name sortKey="Mackerell J, Alexanderd" uniqKey="Mackerell J A">AD MacKerell Jr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brenke, R" uniqKey="Brenke R">R Brenke</name>
</author>
<author>
<name sortKey="Kozakov, D" uniqKey="Kozakov D">D Kozakov</name>
</author>
<author>
<name sortKey="Chuang, G Y" uniqKey="Chuang G">G-Y Chuang</name>
</author>
<author>
<name sortKey="Beglov, D" uniqKey="Beglov D">D Beglov</name>
</author>
<author>
<name sortKey="Hall, D" uniqKey="Hall D">D Hall</name>
</author>
<author>
<name sortKey="Landon, Mr" uniqKey="Landon M">MR Landon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ngan, C H" uniqKey="Ngan C">C-H Ngan</name>
</author>
<author>
<name sortKey="Hall, Dr" uniqKey="Hall D">DR Hall</name>
</author>
<author>
<name sortKey="Zerbe, B" uniqKey="Zerbe B">B Zerbe</name>
</author>
<author>
<name sortKey="Grove, Le" uniqKey="Grove L">LE Grove</name>
</author>
<author>
<name sortKey="Kozakov, D" uniqKey="Kozakov D">D Kozakov</name>
</author>
<author>
<name sortKey="Vajda, S" uniqKey="Vajda S">S Vajda</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, B" uniqKey="Huang B">B Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bowman, Al" uniqKey="Bowman A">AL Bowman</name>
</author>
<author>
<name sortKey="Lerner, Mg" uniqKey="Lerner M">MG Lerner</name>
</author>
<author>
<name sortKey="Carlson, Ha" uniqKey="Carlson H">HA Carlson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meagher, Kl" uniqKey="Meagher K">KL Meagher</name>
</author>
<author>
<name sortKey="Lerner, Mg" uniqKey="Lerner M">MG Lerner</name>
</author>
<author>
<name sortKey="Carlson, Ha" uniqKey="Carlson H">HA Carlson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Glinca, S" uniqKey="Glinca S">S Glinca</name>
</author>
<author>
<name sortKey="Klebe, G" uniqKey="Klebe G">G Klebe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghersi, D" uniqKey="Ghersi D">D Ghersi</name>
</author>
<author>
<name sortKey="Sanchez, R" uniqKey="Sanchez R">R Sanchez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Morris, Gm" uniqKey="Morris G">GM Morris</name>
</author>
<author>
<name sortKey="Huey, R" uniqKey="Huey R">R Huey</name>
</author>
<author>
<name sortKey="Lindstrom, W" uniqKey="Lindstrom W">W Lindstrom</name>
</author>
<author>
<name sortKey="Sanner, Mf" uniqKey="Sanner M">MF Sanner</name>
</author>
<author>
<name sortKey="Belew, Rk" uniqKey="Belew R">RK Belew</name>
</author>
<author>
<name sortKey="Goodsell, Ds" uniqKey="Goodsell D">DS Goodsell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kohonen, T" uniqKey="Kohonen T">T Kohonen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahony, S" uniqKey="Mahony S">S Mahony</name>
</author>
<author>
<name sortKey="Hendrix, D" uniqKey="Hendrix D">D Hendrix</name>
</author>
<author>
<name sortKey="Golden, A" uniqKey="Golden A">A Golden</name>
</author>
<author>
<name sortKey="Smith, Tj" uniqKey="Smith T">TJ Smith</name>
</author>
<author>
<name sortKey="Rokhsar, Ds" uniqKey="Rokhsar D">DS Rokhsar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahony, S" uniqKey="Mahony S">S Mahony</name>
</author>
<author>
<name sortKey="Benos, Pv" uniqKey="Benos P">PV Benos</name>
</author>
<author>
<name sortKey="Smith, Tj" uniqKey="Smith T">TJ Smith</name>
</author>
<author>
<name sortKey="Golden, A" uniqKey="Golden A">A Golden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hasegawa, K" uniqKey="Hasegawa K">K Hasegawa</name>
</author>
<author>
<name sortKey="Funatsu, K" uniqKey="Funatsu K">K Funatsu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roche, O" uniqKey="Roche O">O Roche</name>
</author>
<author>
<name sortKey="Trube, G" uniqKey="Trube G">G Trube</name>
</author>
<author>
<name sortKey="Zuegge, J" uniqKey="Zuegge J">J Zuegge</name>
</author>
<author>
<name sortKey="Pflimlin, P" uniqKey="Pflimlin P">P Pflimlin</name>
</author>
<author>
<name sortKey="Alanine, A" uniqKey="Alanine A">A Alanine</name>
</author>
<author>
<name sortKey="Schneider, G" uniqKey="Schneider G">G Schneider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bouvier, G" uniqKey="Bouvier G">G Bouvier</name>
</author>
<author>
<name sortKey="Evrard Todeschi, N" uniqKey="Evrard Todeschi N">N Evrard-Todeschi</name>
</author>
<author>
<name sortKey="Girault, J P" uniqKey="Girault J">J-P Girault</name>
</author>
<author>
<name sortKey="Bertho, G" uniqKey="Bertho G">G Bertho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reker, D" uniqKey="Reker D">D Reker</name>
</author>
<author>
<name sortKey="Rodrigues, T" uniqKey="Rodrigues T">T Rodrigues</name>
</author>
<author>
<name sortKey="Schneider, P" uniqKey="Schneider P">P Schneider</name>
</author>
<author>
<name sortKey="Schneider, G" uniqKey="Schneider G">G Schneider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Digles, D" uniqKey="Digles D">D Digles</name>
</author>
<author>
<name sortKey="Ecker, Gf" uniqKey="Ecker G">GF Ecker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bouvier, G" uniqKey="Bouvier G">G Bouvier</name>
</author>
<author>
<name sortKey="Duclert Savatier, N" uniqKey="Duclert Savatier N">N Duclert-Savatier</name>
</author>
<author>
<name sortKey="Desdouits, N" uniqKey="Desdouits N">N Desdouits</name>
</author>
<author>
<name sortKey="Meziane Cherif, D" uniqKey="Meziane Cherif D">D Meziane-Cherif</name>
</author>
<author>
<name sortKey="Blondel, A" uniqKey="Blondel A">A Blondel</name>
</author>
<author>
<name sortKey="Courvalin, P" uniqKey="Courvalin P">P Courvalin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miri, L" uniqKey="Miri L">L Miri</name>
</author>
<author>
<name sortKey="Bouvier, G" uniqKey="Bouvier G">G Bouvier</name>
</author>
<author>
<name sortKey="Kettani, A" uniqKey="Kettani A">A Kettani</name>
</author>
<author>
<name sortKey="Mikou, A" uniqKey="Mikou A">A Mikou</name>
</author>
<author>
<name sortKey="Wakrim, L" uniqKey="Wakrim L">L Wakrim</name>
</author>
<author>
<name sortKey="Nilges, M" uniqKey="Nilges M">M Nilges</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nivaskumar, M" uniqKey="Nivaskumar M">M Nivaskumar</name>
</author>
<author>
<name sortKey="Bouvier, G" uniqKey="Bouvier G">G Bouvier</name>
</author>
<author>
<name sortKey="Campos, M" uniqKey="Campos M">M Campos</name>
</author>
<author>
<name sortKey="Nadeau, N" uniqKey="Nadeau N">N Nadeau</name>
</author>
<author>
<name sortKey="Yu, X" uniqKey="Yu X">X Yu</name>
</author>
<author>
<name sortKey="Egelman, Eh" uniqKey="Egelman E">EH Egelman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spill, Yg" uniqKey="Spill Y">YG Spill</name>
</author>
<author>
<name sortKey="Bouvier, G" uniqKey="Bouvier G">G Bouvier</name>
</author>
<author>
<name sortKey="Nilges, M" uniqKey="Nilges M">M Nilges</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mysinger, Mm" uniqKey="Mysinger M">MM Mysinger</name>
</author>
<author>
<name sortKey="Carchia, M" uniqKey="Carchia M">M Carchia</name>
</author>
<author>
<name sortKey="Irwin, Jj" uniqKey="Irwin J">JJ Irwin</name>
</author>
<author>
<name sortKey="Shoichet, Bk" uniqKey="Shoichet B">BK Shoichet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bursulaya, Bd" uniqKey="Bursulaya B">BD Bursulaya</name>
</author>
<author>
<name sortKey="Totrov, M" uniqKey="Totrov M">M Totrov</name>
</author>
<author>
<name sortKey="Abagyan, R" uniqKey="Abagyan R">R Abagyan</name>
</author>
<author>
<name sortKey="Brooks Iii, Cl" uniqKey="Brooks Iii C">CL Brooks III</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sousa, Sf" uniqKey="Sousa S">SF Sousa</name>
</author>
<author>
<name sortKey="Fernandes, Pa" uniqKey="Fernandes P">PA Fernandes</name>
</author>
<author>
<name sortKey="Ramos, Mj" uniqKey="Ramos M">MJ Ramos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warren, Gl" uniqKey="Warren G">GL Warren</name>
</author>
<author>
<name sortKey="Andrews, Cw" uniqKey="Andrews C">CW Andrews</name>
</author>
<author>
<name sortKey="Capelli, A M" uniqKey="Capelli A">A-M Capelli</name>
</author>
<author>
<name sortKey="Clarke, B" uniqKey="Clarke B">B Clarke</name>
</author>
<author>
<name sortKey="Lalonde, J" uniqKey="Lalonde J">J LaLonde</name>
</author>
<author>
<name sortKey="Lambert, Mh" uniqKey="Lambert M">MH Lambert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moitessier, N" uniqKey="Moitessier N">N Moitessier</name>
</author>
<author>
<name sortKey="Englebienne, P" uniqKey="Englebienne P">P Englebienne</name>
</author>
<author>
<name sortKey="Lee, D" uniqKey="Lee D">D Lee</name>
</author>
<author>
<name sortKey="Lawandi, J" uniqKey="Lawandi J">J Lawandi</name>
</author>
<author>
<name sortKey="Corbeil, Cr" uniqKey="Corbeil C">CR Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Plewczynski, D" uniqKey="Plewczynski D">D Plewczynski</name>
</author>
<author>
<name sortKey="La Niewski, M" uniqKey="La Niewski M">M Łaźniewski</name>
</author>
<author>
<name sortKey="Augustyniak, R" uniqKey="Augustyniak R">R Augustyniak</name>
</author>
<author>
<name sortKey="Ginalski, K" uniqKey="Ginalski K">K Ginalski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ewing, Tj" uniqKey="Ewing T">TJ Ewing</name>
</author>
<author>
<name sortKey="Makino, S" uniqKey="Makino S">S Makino</name>
</author>
<author>
<name sortKey="Skillman, Ag" uniqKey="Skillman A">AG Skillman</name>
</author>
<author>
<name sortKey="Kuntz, Id" uniqKey="Kuntz I">ID Kuntz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trott, O" uniqKey="Trott O">O Trott</name>
</author>
<author>
<name sortKey="Olson, Aj" uniqKey="Olson A">AJ Olson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Glem, Rc" uniqKey="Glem R">RC Glem</name>
</author>
<author>
<name sortKey="Bender, A" uniqKey="Bender A">A Bender</name>
</author>
<author>
<name sortKey="Arnby, Ch" uniqKey="Arnby C">CH Arnby</name>
</author>
<author>
<name sortKey="Carlsson, L" uniqKey="Carlsson L">L Carlsson</name>
</author>
<author>
<name sortKey="Boyer, S" uniqKey="Boyer S">S Boyer</name>
</author>
<author>
<name sortKey="Smith, J" uniqKey="Smith J">J Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rogers, D" uniqKey="Rogers D">D Rogers</name>
</author>
<author>
<name sortKey="Hahn, M" uniqKey="Hahn M">M Hahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bender, A" uniqKey="Bender A">A Bender</name>
</author>
<author>
<name sortKey="Jenkins, Jl" uniqKey="Jenkins J">JL Jenkins</name>
</author>
<author>
<name sortKey="Scheiber, J" uniqKey="Scheiber J">J Scheiber</name>
</author>
<author>
<name sortKey="Sukuru, Sck" uniqKey="Sukuru S">SCK Sukuru</name>
</author>
<author>
<name sortKey="Glick, M" uniqKey="Glick M">M Glick</name>
</author>
<author>
<name sortKey="Davies, Jw" uniqKey="Davies J">JW Davies</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Westen, Gjp" uniqKey="Van Westen G">GJP van Westen</name>
</author>
<author>
<name sortKey="Van Den Hoven, Oo" uniqKey="Van Den Hoven O">OO van den Hoven</name>
</author>
<author>
<name sortKey="Van Der Pijl, R" uniqKey="Van Der Pijl R">R van der Pijl</name>
</author>
<author>
<name sortKey="Mulder Krieger, T" uniqKey="Mulder Krieger T">T Mulder-Krieger</name>
</author>
<author>
<name sortKey="De Vries, H" uniqKey="De Vries H">H de Vries</name>
</author>
<author>
<name sortKey="Wegner, Jk" uniqKey="Wegner J">JK Wegner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cortes Ciriano, I" uniqKey="Cortes Ciriano I">I Cortes-Ciriano</name>
</author>
<author>
<name sortKey="Van Westen, Gj" uniqKey="Van Westen G">GJ van Westen</name>
</author>
<author>
<name sortKey="Lenselink, Eb" uniqKey="Lenselink E">EB Lenselink</name>
</author>
<author>
<name sortKey="Murrell, Ds" uniqKey="Murrell D">DS Murrell</name>
</author>
<author>
<name sortKey="Bender, A" uniqKey="Bender A">A Bender</name>
</author>
<author>
<name sortKey="Malliavin, T" uniqKey="Malliavin T">T Malliavin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, N" uniqKey="Huang N">N Huang</name>
</author>
<author>
<name sortKey="Shoichet, Bk" uniqKey="Shoichet B">BK Shoichet</name>
</author>
<author>
<name sortKey="Irwin, Jj" uniqKey="Irwin J">JJ Irwin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sarafianos, Sg" uniqKey="Sarafianos S">SG Sarafianos</name>
</author>
<author>
<name sortKey="Marchand, B" uniqKey="Marchand B">B Marchand</name>
</author>
<author>
<name sortKey="Das, K" uniqKey="Das K">K Das</name>
</author>
<author>
<name sortKey="Himmel, Dm" uniqKey="Himmel D">DM Himmel</name>
</author>
<author>
<name sortKey="Parniak, Ma" uniqKey="Parniak M">MA Parniak</name>
</author>
<author>
<name sortKey="Hughes, Sh" uniqKey="Hughes S">SH Hughes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mitchell, Ml" uniqKey="Mitchell M">ML Mitchell</name>
</author>
<author>
<name sortKey="Son, Jc" uniqKey="Son J">JC Son</name>
</author>
<author>
<name sortKey="Lee, Iy" uniqKey="Lee I">IY Lee</name>
</author>
<author>
<name sortKey="Lee, C K" uniqKey="Lee C">C-K Lee</name>
</author>
<author>
<name sortKey="Kim, Hs" uniqKey="Kim H">HS Kim</name>
</author>
<author>
<name sortKey="Guo, H" uniqKey="Guo H">H Guo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cowan Jacob, Sw" uniqKey="Cowan Jacob S">SW Cowan-Jacob</name>
</author>
<author>
<name sortKey="Fendrich, G" uniqKey="Fendrich G">G Fendrich</name>
</author>
<author>
<name sortKey="Floersheimer, A" uniqKey="Floersheimer A">A Floersheimer</name>
</author>
<author>
<name sortKey="Furet, P" uniqKey="Furet P">P Furet</name>
</author>
<author>
<name sortKey="Liebetanz, J" uniqKey="Liebetanz J">J Liebetanz</name>
</author>
<author>
<name sortKey="Rummel, G" uniqKey="Rummel G">G Rummel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Congreve, M" uniqKey="Congreve M">M Congreve</name>
</author>
<author>
<name sortKey="Carr, R" uniqKey="Carr R">R Carr</name>
</author>
<author>
<name sortKey="Murray, C" uniqKey="Murray C">C Murray</name>
</author>
<author>
<name sortKey="Jhoti, H" uniqKey="Jhoti H">H Jhoti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, B" uniqKey="Lee B">B Lee</name>
</author>
<author>
<name sortKey="Richards, Fm" uniqKey="Richards F">FM Richards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Desdouits, N" uniqKey="Desdouits N">N Desdouits</name>
</author>
<author>
<name sortKey="Nilges, M" uniqKey="Nilges M">M Nilges</name>
</author>
<author>
<name sortKey="Blondel, A" uniqKey="Blondel A">A Blondel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pettersen, Ef" uniqKey="Pettersen E">EF Pettersen</name>
</author>
<author>
<name sortKey="Goddard, Td" uniqKey="Goddard T">TD Goddard</name>
</author>
<author>
<name sortKey="Huang, Cc" uniqKey="Huang C">CC Huang</name>
</author>
<author>
<name sortKey="Couch, Gs" uniqKey="Couch G">GS Couch</name>
</author>
<author>
<name sortKey="Greenblatt, Dm" uniqKey="Greenblatt D">DM Greenblatt</name>
</author>
<author>
<name sortKey="Meng, Ec" uniqKey="Meng E">EC Meng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pedregosa, F" uniqKey="Pedregosa F">F Pedregosa</name>
</author>
<author>
<name sortKey="Varoquaux, G" uniqKey="Varoquaux G">G Varoquaux</name>
</author>
<author>
<name sortKey="Gramfort, A" uniqKey="Gramfort A">A Gramfort</name>
</author>
<author>
<name sortKey="Michel, V" uniqKey="Michel V">V Michel</name>
</author>
<author>
<name sortKey="Thirion, B" uniqKey="Thirion B">B Thirion</name>
</author>
<author>
<name sortKey="Grisel, O" uniqKey="Grisel O">O Grisel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwarz, G" uniqKey="Schwarz G">G Schwarz</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bauman, Jd" uniqKey="Bauman J">JD Bauman</name>
</author>
<author>
<name sortKey="Patel, D" uniqKey="Patel D">D Patel</name>
</author>
<author>
<name sortKey="Dharia, C" uniqKey="Dharia C">C Dharia</name>
</author>
<author>
<name sortKey="Fromer, Mw" uniqKey="Fromer M">MW Fromer</name>
</author>
<author>
<name sortKey="Ahmed, S" uniqKey="Ahmed S">S Ahmed</name>
</author>
<author>
<name sortKey="Frenkel, Y" uniqKey="Frenkel Y">Y Frenkel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schindler, T" uniqKey="Schindler T">T Schindler</name>
</author>
<author>
<name sortKey="Bornmann, W" uniqKey="Bornmann W">W Bornmann</name>
</author>
<author>
<name sortKey="Pellicena, P" uniqKey="Pellicena P">P Pellicena</name>
</author>
<author>
<name sortKey="Miller, Wt" uniqKey="Miller W">WT Miller</name>
</author>
<author>
<name sortKey="Clarkson, B" uniqKey="Clarkson B">B Clarkson</name>
</author>
<author>
<name sortKey="Kuriyan, J" uniqKey="Kuriyan J">J Kuriyan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dennis, S" uniqKey="Dennis S">S Dennis</name>
</author>
<author>
<name sortKey="Kortvelyesi, T" uniqKey="Kortvelyesi T">T Kortvelyesi</name>
</author>
<author>
<name sortKey="Vajda, S" uniqKey="Vajda S">S Vajda</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kortvelyesi, T" uniqKey="Kortvelyesi T">T Kortvelyesi</name>
</author>
<author>
<name sortKey="Silberstein, M" uniqKey="Silberstein M">M Silberstein</name>
</author>
<author>
<name sortKey="Dennis, S" uniqKey="Dennis S">S Dennis</name>
</author>
<author>
<name sortKey="Vajda, S" uniqKey="Vajda S">S Vajda</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ln" uniqKey="Johnson L">LN Johnson</name>
</author>
<author>
<name sortKey="Noble, Me" uniqKey="Noble M">ME Noble</name>
</author>
<author>
<name sortKey="Owen, Dj" uniqKey="Owen D">DJ Owen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Morris, Gm" uniqKey="Morris G">GM Morris</name>
</author>
<author>
<name sortKey="Goodsell, Ds" uniqKey="Goodsell D">DS Goodsell</name>
</author>
<author>
<name sortKey="Halliday, Rs" uniqKey="Halliday R">RS Halliday</name>
</author>
<author>
<name sortKey="Huey, R" uniqKey="Huey R">R Huey</name>
</author>
<author>
<name sortKey="Hart, We" uniqKey="Hart W">WE Hart</name>
</author>
<author>
<name sortKey="Belew, Rk" uniqKey="Belew R">RK Belew</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuntz, Id" uniqKey="Kuntz I">ID Kuntz</name>
</author>
<author>
<name sortKey="Blaney, Jm" uniqKey="Blaney J">JM Blaney</name>
</author>
<author>
<name sortKey="Oatley, Sj" uniqKey="Oatley S">SJ Oatley</name>
</author>
<author>
<name sortKey="Langridge, R" uniqKey="Langridge R">R Langridge</name>
</author>
<author>
<name sortKey="Ferrin, Te" uniqKey="Ferrin T">TE Ferrin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laskowski, Ra" uniqKey="Laskowski R">RA Laskowski</name>
</author>
<author>
<name sortKey="Luscombe, Nm" uniqKey="Luscombe N">NM Luscombe</name>
</author>
<author>
<name sortKey="Swindells, Mb" uniqKey="Swindells M">MB Swindells</name>
</author>
<author>
<name sortKey="Thornton, Jm" uniqKey="Thornton J">JM Thornton</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25888251</article-id>
<article-id pub-id-type="pmc">4381396</article-id>
<article-id pub-id-type="publisher-id">518</article-id>
<article-id pub-id-type="doi">10.1186/s12859-015-0518-z</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Harigua-Souiai</surname>
<given-names>Emna</given-names>
</name>
<address>
<email>emna.harigua@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cortes-Ciriano</surname>
<given-names>Isidro</given-names>
</name>
<address>
<email>isidro.cortes@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Desdouits</surname>
<given-names>Nathan</given-names>
</name>
<address>
<email>nathan@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Malliavin</surname>
<given-names>Thérèse E</given-names>
</name>
<address>
<email>terez@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guizani</surname>
<given-names>Ikram</given-names>
</name>
<address>
<email>ikram.guizani@pasteur.rns.tn</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nilges</surname>
<given-names>Michael</given-names>
</name>
<address>
<email>nilges@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Blondel</surname>
<given-names>Arnaud</given-names>
</name>
<address>
<email>ablondel@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Bouvier</surname>
<given-names>Guillaume</given-names>
</name>
<address>
<email>guillaume.bouvier@pasteur.fr</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015 France</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2298 7385</institution-id>
<institution-id institution-id-type="GRID">grid.418517.e</institution-id>
<institution>Laboratory of Molecular Epidemiology and Experimental Pathology – LR11IPT04,</institution>
<institution>Institut Pasteur de Tunis, Université Tunis el Manar – Tunisia,</institution>
</institution-wrap>
13, Place Pasteur, Tunis, 1002 Tunisia</aff>
<aff id="Aff3">
<label>3</label>
University of Carthage, Faculty of sciences of Bizerte – Tunisia, Jarzouna, 7021 Tunisia</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>21</day>
<month>3</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>16</volume>
<elocation-id>93</elocation-id>
<history>
<date date-type="received">
<day>1</day>
<month>9</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>2</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© Harigua-Souiai et al.; licensee BioMed Central. 2015</copyright-statement>
<license license-type="open-access">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0">http://creativecommons.org/licenses/by/4.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Identifying druggable cavities on a protein surface is a crucial step in structure-based drug design. The cavities have to present suitable size and shape, as well as appropriate chemical complementarity with ligands.</p>
</sec>
<sec>
<title>Results</title>
<p>We present a novel cavity prediction method that analyzes results of virtual screening of specific ligands or fragment libraries by means of Self-Organizing Maps. We demonstrate the method with two thoroughly studied proteins where it successfully identified their active sites (AS) and relevant secondary binding sites (BS). Moreover, known active ligands mapped the AS better than inactive ones. Interestingly, docking a naive fragment library brought even more insight. We then systematically applied the method to the 102 targets from the DUD-E database, where it showed a 90% identification rate of the AS among the first three consensual clusters of the SOM, and in 82% of the cases as the first one. Further analysis by chemical decomposition of the fragments improved BS prediction. Chemical substructures that are representative of the active ligands preferentially mapped in the AS.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The new approach provides valuable information both on relevant BSs and on chemical features promoting bioactivity.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-015-0518-z) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Self-organizing maps</kwd>
<kwd>Binding site</kwd>
<kwd>Chemical fingerprints</kwd>
<kwd>Chemical fragments</kwd>
<kwd>Virtual screening</kwd>
<kwd>Probe-mapping</kwd>
<kwd>Docking</kwd>
</kwd-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2015</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Identifying druggable cavities or pockets on a target protein is highly important for developing novel strategies in structure-based drug discovery. Binding sites (BSs), with or without a bound ligand, are usually described as cavities at the protein surface and display a wide variety of sizes and shapes [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR2">2</xref>
]. Consequently, in the context of drug discovery, refined criteria are necessary to discriminate potent binding pockets. The required properties, together referred to as “druggability”, are the subject of active research, and many scores have been elaborated to estimate them [
<xref ref-type="bibr" rid="CR3">3</xref>
-
<xref ref-type="bibr" rid="CR5">5</xref>
]. Protein-ligand interactions that promote binding appear to be mainly driven by cavity shape and size, as well as by chemical complementarity between the ligand and the protein atoms.</p>
<p>Existing methods and algorithms typically use evolutionary, geometrical, probe-mapping or energy-based principles for BS identification. Evolutionary methods [
<xref ref-type="bibr" rid="CR6">6</xref>
-
<xref ref-type="bibr" rid="CR8">8</xref>
] make use of structure and/or sequence alignments to identify BSs. They assume that conserved residues among one group of functionally related proteins would vary across different groups [
<xref ref-type="bibr" rid="CR9">9</xref>
] so they constitute an “evolutionary trace” of BSs. These approaches are limited by the fact that conserved features may not be correlated to protein activity but rather to stability or folding [
<xref ref-type="bibr" rid="CR10">10</xref>
]. Moreover, as a consequence of a low degree of sequence similarity or identity within a working dataset for a given protein query, the obtained results may be poor [
<xref ref-type="bibr" rid="CR10">10</xref>
]. Purely geometric methods [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR11">11</xref>
-
<xref ref-type="bibr" rid="CR15">15</xref>
] have the advantage of being fast. They assume that a BS is a cavity or a cleft in the receptor surface and do not model the ability of a detected cavity to bind a drug-like molecule. Consequently, they cannot distinguish different types of sites (e.g., hydrophobic versus polar). Finally, probe-mapping [
<xref ref-type="bibr" rid="CR16">16</xref>
-
<xref ref-type="bibr" rid="CR18">18</xref>
] and energy-based [
<xref ref-type="bibr" rid="CR2">2</xref>
,
<xref ref-type="bibr" rid="CR19">19</xref>
] methods are usually coupled (e.g., SILCS [
<xref ref-type="bibr" rid="CR20">20</xref>
]). They calculate the interaction energy between a probe and the target on a grid and thereby map energetically favorable areas for binding. Probes can be atoms (aliphatic carbon, aromatic carbon, hydrogen, oxygen, nitrogen, sulfur, etc.) [
<xref ref-type="bibr" rid="CR2">2</xref>
,
<xref ref-type="bibr" rid="CR18">18</xref>
] or functional groups (methyl, amine, hydroxyl, ketone groups, etc.) [
<xref ref-type="bibr" rid="CR19">19</xref>
,
<xref ref-type="bibr" rid="CR21">21</xref>
,
<xref ref-type="bibr" rid="CR22">22</xref>
]. Many studies combine two of these types of approaches (geometric, energy-based, probe-mapping or evolutionary methods) or couple them with other computational strategies. For instance, combining geometrical and energy-based principles through the “MetaPocket” server [
<xref ref-type="bibr" rid="CR23">23</xref>
], improved the accuracy of these methods. Bowman [
<xref ref-type="bibr" rid="CR24">24</xref>
] and Meagher [
<xref ref-type="bibr" rid="CR25">25</xref>
] used receptor flexibility to successfully identify pharmacophores, used as probes, that are highly present in known inhibitors of the targeted protein. In a recent work, Glinca and Klebe [
<xref ref-type="bibr" rid="CR26">26</xref>
] showed that the use of exposed physicochemical properties on cavities is more valuable than the use of sequence information in the classification of protein families with respect to inhibitor selectivity. This stresses the importance of considering protein-ligand interactions at the energetic level to assess a pocket’s “druggability”.</p>
<p>Probe-mapping and energy-based methods are the obvious way to model chemical complementarity between the ligand and the protein atoms. PocketFinder [
<xref ref-type="bibr" rid="CR2">2</xref>
], for example, evaluates a van der Waals potential over a grid and identifies all pockets with a volume larger than 100 Å
<sup>3</sup>
. In 80.9% of the cases, 50% of the ligand overlapped the largest pocket and 11.8% overlapped the second one. Q-SiteFinder [
<xref ref-type="bibr" rid="CR19">19</xref>
] uses GRID [
<xref ref-type="bibr" rid="CR16">16</xref>
] to calculate a van der Waals potential of a methyl probe. Probes with favorable interaction energies are clustered. Clusters are then ranked according to their total interaction energies and the top three are considered as binding pockets. In 90% of the cases, 25% of the active ligand atoms were within 1.6 Å of one of the top-ranked pockets. An algorithm similar to Q-SiteFinder, called SiteHound [
<xref ref-type="bibr" rid="CR27">27</xref>
], uses AutoGrid from the AutoDock 4 suite [
<xref ref-type="bibr" rid="CR28">28</xref>
] for grid calculation and pocket prediction instead of GRID [
<xref ref-type="bibr" rid="CR16">16</xref>
]. After docking a known ligand on 77 proteins, the authors found that in 95% of the cases, the ligand center fell within 10.0 Å of at least one of the first three predicted sites. The SiteHound success rate varied between 80 and 84% when the criterion was set to 15% or more of the ligand heavy atoms within a radius of 2.0 Å from one of the first three predicted sites. Another algorithm called FTSite [
<xref ref-type="bibr" rid="CR22">22</xref>
] performs a global search of the protein surface for regions that bind small organic probes by making use of a fast Fourier transform approach [
<xref ref-type="bibr" rid="CR21">21</xref>
]. FTSite was tested on the test set used by the Q-SiteFinder authors [
<xref ref-type="bibr" rid="CR19">19</xref>
] and achieved a success rate of 97% with the same parameter values (precision = 25%, radius = 1.6 Å) [
<xref ref-type="bibr" rid="CR22">22</xref>
].</p>
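<p>To make the distance-based success criteria quoted above concrete, the following minimal Python sketch scores a prediction as a success when at least a given fraction of ligand atoms lies within a cutoff radius of any point of one of the top-ranked pockets. It follows the criterion as worded in this section (25% of ligand atoms within 1.6 Å) and is not the code of any of the cited tools; the function names, the representation of pockets as lists of points and the default thresholds are illustrative assumptions.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def fraction_within(ligand_atoms, pocket_points, radius):
    """Fraction of ligand atoms lying within `radius` of any pocket point."""
    if not ligand_atoms:
        raise ValueError("ligand has no atoms")
    hits = sum(
        1
        for atom in ligand_atoms
        if any(math.dist(atom, point) <= radius for point in pocket_points)
    )
    return hits / len(ligand_atoms)


def is_site_found(ligand_atoms, top_pockets, radius=1.6, min_fraction=0.25):
    """True if any of the top-ranked pockets covers enough ligand atoms.

    `top_pockets` is a list of pockets, each a list of (x, y, z) points
    (hypothetical representation chosen for this sketch).
    """
    return any(
        fraction_within(ligand_atoms, pocket, radius) >= min_fraction
        for pocket in top_pockets
    )
```
]]></preformat>
<p>Changing the radius and the fraction (for instance 2.0 Å and 15%) gives the alternative criterion reported for SiteHound, which is why success rates obtained under different thresholds are not directly comparable.</p>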
<p>The present work introduces a new concept for the identification of BSs. It directly uses docking calculations, in combination with an analysis of the results by Self-Organizing Maps (SOMs) [
<xref ref-type="bibr" rid="CR29">29</xref>
]. The SOM algorithm has many applications and can be applied to virtually any type of data. For example, SOMBRERO [
<xref ref-type="bibr" rid="CR30">30</xref>
,
<xref ref-type="bibr" rid="CR31">31</xref>
] is a SOM-based algorithm that detects transcription factor BSs on DNA sequences. A spherical SOM (SSOM) appeared useful in mapping a protein surface onto a sphere to better characterize its active site [
<xref ref-type="bibr" rid="CR32">32</xref>
]. SOMs have a wide range of uses in virtual screening analysis and hit selection [
<xref ref-type="bibr" rid="CR33">33</xref>
-
<xref ref-type="bibr" rid="CR35">35</xref>
]. A recent paper used SOM as a tool for identifying macromolecular targets of de-novo designed chemical entities [
<xref ref-type="bibr" rid="CR36">36</xref>
]. Digles [
<xref ref-type="bibr" rid="CR37">37</xref>
] presents an interesting review of this specific application of SOMs. We have recently demonstrated the usefulness of SOMs in the analysis of molecular dynamics trajectories and ligand docking poses [
<xref ref-type="bibr" rid="CR38">38</xref>
-
<xref ref-type="bibr" rid="CR41">41</xref>
].</p>
<p>In a first step, we calibrated our method on two challenging targets from the “Database of Useful Decoys - Enhanced” (DUD-E) [
<xref ref-type="bibr" rid="CR42">42</xref>
]. The DUD-E database provides dedicated ligand libraries for each protein target to benchmark docking approaches, one with the known effectors, and one with a series of decoy compounds. We also used an additional “generic” library, the Enamine Golden Fragments (EGF) collection (www.enamine.net). It contains a moderate number of entities, 1500 fragments, presenting a wide chemical diversity (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1). The use of this library permits us to cover a large chemical space with a limited computational effort. Blind dockings of these databases were performed with two free software packages that have been extensively used and evaluated by others [
<xref ref-type="bibr" rid="CR43">43</xref>
-
<xref ref-type="bibr" rid="CR47">47</xref>
]: Dock [
<xref ref-type="bibr" rid="CR48">48</xref>
] and AutoDock Vina (ADvina) [
<xref ref-type="bibr" rid="CR49">49</xref>
], which are based on different search algorithms and scoring functions. We identified the combination of docking program and ligand library giving the best prediction rates in this analysis. Then, we used this combination to assess the method’s accuracy in BS identification on all targets in the DUD-E (102 proteins). Docking results were analyzed with an in-house version of the Self-Organizing Map (SOM) applied directly to the ligand atomic coordinates as descriptors. The resulting SOMs provide a simple and intuitive representation of the spatial distribution of the docking poses. BSs can be identified as zones of high docking pose density and homogeneity.</p>
<p>In addition, we tested whether the proposed approach could give some “a priori” information on the chemical nature of potential ligands. For that, we analyzed the chemical structure of the docked compounds from the naive fragment library with Morgan fingerprints, which provide a decomposition of the molecules into a set of “chemical features” [
<xref ref-type="bibr" rid="CR50">50</xref>
,
<xref ref-type="bibr" rid="CR51">51</xref>
]. Previous studies have shown the efficiency of circular fingerprints in drug discovery tasks, such as the search for ligand analogues or virtual screening [
<xref ref-type="bibr" rid="CR52">52</xref>
-
<xref ref-type="bibr" rid="CR54">54</xref>
]. With SOM, we analyzed how the geometrical centers of these chemical features are distributed in space upon docking. Interestingly, this analysis provided an even more accurate mapping of the BSs, thus enhancing the interpretability of the SOMs. Furthermore, the chemical features of the naive library that are also present in active ligands mapped preferentially in the active sites.</p>
</sec>
<sec id="Sec2">
<title>Methods</title>
<sec id="Sec3">
<title>Protein targets and ligand libraries</title>
<p>The DUD-E database [
<xref ref-type="bibr" rid="CR42">42</xref>
,
<xref ref-type="bibr" rid="CR55">55</xref>
] provides 102 targets ready for docking in pdb format. For each target, a definition of the active site (AS) is provided by means of the co-crystal 3D structure of the target with an active ligand. Prior to the assessment of the method’s accuracy in BS identification, we tuned the parameters of the approach to obtain the most accurate predictions on two specific targets.</p>
<p>The first target, the HIV-1 reverse-transcriptase, is a heterodimer with two structurally distinct subunits, p51 (429 AA) and p66 (553 AA) [
<xref ref-type="bibr" rid="CR56">56</xref>
]. The docking target site defined in the DUD-E is a sub-domain (272 AA) derived from the 3LAN PDB entry [
<xref ref-type="bibr" rid="CR57">57</xref>
]. It is composed of a part of p66 and a small portion of p51 and contains the active site, an allosteric site and many other pockets and cavities (Figure
<xref rid="Fig1" ref-type="fig">1</xref>
). The DUD-E provides 338 active molecules and 18880 decoys for HIV-RT.
<fig id="Fig1">
<label>Figure 1</label>
<caption>
<p>
<bold>Cavities detected with</bold>
<bold>
<italic>mkgrid</italic>
</bold>
<bold> and mapping of the docking outputs with SOMs for DUD-E active molecules docked with ADvina.</bold>
<bold>(a)</bold>
HIV-RT represented by ribbons and pink transparent surface, with cavities labeled (1, 2, 3, 4, 5, 7). Cavities 6, 8 and 9 are not visible on this 2D projection.
<bold>(b)</bold>
SOM representation of results. AS fits in the dark blue cavity (2) and BS2 in the big cyan cavity (3). Cavity (6), behind, is pointed out with an arrow.
<bold>(c)</bold>
ABL1 represented by ribbons and light blue transparent surface, with cavities labeled (1’,3’,4’,6’,7’ and 11’). The remaining cavities are not visible on this 2D projection.
<bold>(d)</bold>
SOM representation of results. AS fits in the dark blue cavity (1’) and BS2 is in cavity (6’).</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p>The second target is a sub-domain (221 AA) of the human tyrosine-protein kinase ABL1 (1130 AA) as defined in the DUD-E (PDB entry: 2HZI) [
<xref ref-type="bibr" rid="CR58">58</xref>
]. Similarly to the first target, this sub-domain contains the active site, a secondary BS and many other cavities (Figure
<xref rid="Fig1" ref-type="fig">1</xref>
). A library of 182 active molecules and 10745 decoys is provided for ABL1.</p>
<p>In addition to the specific libraries of active and non-active molecules provided by the DUD-E, we docked the Enamine Golden Fragment (EGF) library, composed of 1500 fragments (www.enamine.net). The EGF collection follows the “Rule of Three” [
<xref ref-type="bibr" rid="CR59">59</xref>
] with value ranges in slightly tighter intervals. In practice, these ranges are: (i) molecular weight within [150,300] Da; (ii) clogP within [−2,3]; (iii) fewer than 3 H-bond acceptors; (iv) fewer than 3 H-bond donors; (v) fewer than 3 rotatable bonds and (vi) polar surface area below 60 Å
<sup>2</sup>
.</p>
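<p>As an illustration only, the six ranges listed above can be written as a simple filter. The sketch below assumes that the descriptors have already been computed by some cheminformatics tool; the <italic>Fragment</italic> container and its field names are hypothetical, and the counts are read literally as strict inequalities, as worded in the text.</p>
<preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Precomputed descriptors for one fragment (hypothetical container)."""
    mol_weight: float          # Da
    clogp: float
    hbond_acceptors: int
    hbond_donors: int
    rotatable_bonds: int
    polar_surface_area: float  # square angstroms


def passes_egf_ranges(frag: Fragment) -> bool:
    """Check the six ranges quoted in the text (closed intervals, strict counts)."""
    return (
        150.0 <= frag.mol_weight <= 300.0
        and -2.0 <= frag.clogp <= 3.0
        and frag.hbond_acceptors < 3
        and frag.hbond_donors < 3
        and frag.rotatable_bonds < 3
        and frag.polar_surface_area < 60.0
    )
```
]]></preformat>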
</sec>
<sec id="Sec4">
<title>Cavity identification</title>
<p>An in-house program based on the Lee and Richards solvent-accessible surface calculation algorithm [
<xref ref-type="bibr" rid="CR60">60</xref>
], called
<italic>mkgrid</italic>
[
<xref ref-type="bibr" rid="CR61">61</xref>
] was used to detect cavities embedded in both protein targets. The method discretizes space on a 0.5 Å grid and calculates the solvent-accessible volume with a 1.4 Å radius probe sphere (also accessing interior cavities). Bulk solvent is defined with a 10 Å radius probe sphere. Cavities are defined as the volume accessible to the solvent, but not to bulk solvent. Remaining void grid points are clustered by connectivity and labeled according to their cluster number to identify individual cavities. Clusters having fewer than 96 points (12 Å
<sup>3</sup>
, about the volume of a water molecule) are discarded. The cavities were graphically inspected.</p>
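<p>The clustering and size filter described above can be sketched as follows. This is not the <italic>mkgrid</italic> implementation: it assumes that the cavity grid points are already available as integer grid indices and that connectivity means sharing a face (6-neighbour connectivity), which the text does not specify. With a 0.5 Å spacing each point represents 0.125 Å<sup>3</sup>, so the 96-point threshold corresponds to 12 Å<sup>3</sup>.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import deque

GRID_SPACING = 0.5   # angstroms, from the text
MIN_POINTS = 96      # 96 * 0.5**3 = 12 cubic angstroms

# Face-sharing neighbours only (an assumption of this sketch).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_cavity_points(points, min_points=MIN_POINTS):
    """Group integer grid indices into connected clusters, dropping small ones.

    Returns the retained clusters sorted from largest to smallest.
    """
    remaining = set(points)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = [seed]
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                neighbour = (i + di, j + dj, k + dk)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.append(neighbour)
                    queue.append(neighbour)
        if len(cluster) >= min_points:
            clusters.append(cluster)
    clusters.sort(key=len, reverse=True)
    return clusters


def cluster_volume(cluster, spacing=GRID_SPACING):
    """Volume in cubic angstroms represented by a cluster of grid points."""
    return len(cluster) * spacing ** 3
```
]]></preformat>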
</sec>
<sec id="Sec5">
<title>Docking &amp; virtual screening</title>
<p>The Dock6.0 (Dock) [
<xref ref-type="bibr" rid="CR48">48</xref>
] and AutoDock Vina (ADvina) [
<xref ref-type="bibr" rid="CR49">49</xref>
] programs were used for docking. For Dock, the clustering step during pruning in the anchor-and-grow incremental construction approach was disabled; otherwise, the default parameters were used. For Dock, receptor files were prepared with Chimera (www.cgl.ucsf.edu/chimera) [
<xref ref-type="bibr" rid="CR62">62</xref>
]. Hydrogens were removed, Gasteiger charges calculated and molecular surfaces generated. We used the spheres, docking box and mol2 ligand files provided by the DUD-E. For ADvina, the required PDBQT files for the receptor and the ligands were generated from the original mol2 files with the Open Babel converter (openbabel.org). A maximum of 20 lowest-energy poses were kept for each ligand.</p>
</sec>
<sec id="Sec6">
<title>SOM</title>
<p>To analyze the docked ligand poses, we used an in-house implementation of the Self-Organizing Map (SOM) algorithm first introduced by Kohonen [
<xref ref-type="bibr" rid="CR29">29</xref>
]. We trained a 3D non-periodic map,
<italic>Ω</italic>
<sub>
<italic>ijk</italic>
</sub>
, with the
<italic>n</italic>
3D coordinates of all atoms of all retained docked ligand poses. To set up the SOM, the whole set of
<italic>n</italic>
atomic coordinates was analyzed by Principal Component Analysis (PCA). This yielded a set of three normalized principal components,
<italic>V</italic>
<sub>
<italic>i</italic>
=1,2,3</sub>
, with associated lengths
<italic>S</italic>
<sub>
<italic>i</italic>
=1,2,3</sub>
, the square roots of the eigenvalues. The dimensions of the SOM, I, J and K, were set to integer values approximately proportional to
<italic>S</italic>
<sub>1</sub>
,
<italic>S</italic>
<sub>2</sub>
,
<italic>S</italic>
<sub>3</sub>
, with a product I ×
<italic>J</italic>
×K close to 15
<sup>3</sup>
. These map dimensions are given in the legends of Figures
<xref rid="Fig2" ref-type="fig">2</xref>
,
<xref rid="Fig3" ref-type="fig">3</xref>
,
<xref rid="Fig4" ref-type="fig">4</xref>
and
<xref rid="Fig5" ref-type="fig">5</xref>
displaying the SOM results. The maximum and minimum projection values over the
<italic>n</italic>
input vectors on
<italic>V</italic>
<sub>
<italic>i</italic>
=1,2,3</sub>
were calculated as
<italic>V</italic>
<sub>
<italic>i</italic>
</sub>
<sup>+</sup>
and
<italic>V</italic>
<sub>
<italic>i</italic>
</sub>
<sup>−</sup>
. The SOM was initialized with triplets of real numbers regularly spaced along the three eigenvectors:
<inline-formula id="IEq1">
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $\Omega _{\textit {ijk}} =\left (V_{1}^-+i. \left (V_{1}^+-V_{1}^-\right)/I\right)$ \end{document}</tex-math>
<mml:math id="M2">
<mml:msub>
<mml:mrow>
<mml:mi>Ω</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">ijk</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>i.</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>+</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>/</mml:mo>
<mml:mi>I</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
.
<inline-formula id="IEq2">
<alternatives>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $V_1+\left (V_{2}^-+j.\left (V_{2}^+-V_{2}^-\right)/J\right).V_2+\left (V_{3}^-+k.\left (V_{3}^+- V_{3}^-\right)/K\right).V_{3}$ \end{document}</tex-math>
<mml:math id="M4">
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>j.</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>+</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>/</mml:mo>
<mml:mi>J</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi>.</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>+</mml:mo>
<mml:mi>k.</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>+</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>−</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>/</mml:mo>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mi>.</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq2.gif"></inline-graphic>
</alternatives>
</inline-formula>
.
<fig id="Fig2">
<label>Figure 2</label>
<caption>
<p>
<bold>SOM analysis of docking results obtained with ADvina on HIV-RT.</bold>
Top line,
<bold>(a)</bold>
,
<bold>(b)</bold>
and
<bold>(c)</bold>
: U-matrices; bottom line,
<bold>(d)</bold>
,
<bold>(e)</bold>
and
<bold>(f)</bold>
: docking score projections. Left column,
<bold>(a)</bold>
and
<bold>(d)</bold>
: DUD-E decoys set (map dimensions (I,J,K) = (24,13,11)). Middle column,
<bold>(b)</bold>
and
<bold>(e)</bold>
: DUD-E active molecules ((I,J,K) = (21,14,11)). Right column,
<bold>(c)</bold>
and
<bold>(f)</bold>
: EGF collection ((I,J,K) = (21,15,11)). Labels (2) and (3) correspond to cavity numbers used in Figure
<xref rid="Fig1" ref-type="fig">1</xref>
. They designate the AS and BS2, respectively.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig2_HTML" id="MO2"></graphic>
</fig>
<fig id="Fig3">
<label>Figure 3</label>
<caption>
<p>
<bold>SOM analysis of docking results obtained with Dock on HIV-RT.</bold>
Top line,
<bold>(a)</bold>
,
<bold>(b)</bold>
and
<bold>(c)</bold>
: U-matrices; bottom line,
<bold>(d)</bold>
,
<bold>(e)</bold>
and
<bold>(f)</bold>
: docking score projections. Left column,
<bold>(a)</bold>
and
<bold>(d)</bold>
: DUD-E decoys set (map dimensions (I,J,K) are (23,17,8)). Middle column,
<bold>(b)</bold>
and
<bold>(e)</bold>
: DUD-E active molecules ((I,J,K) = (23,18,8)). Right column,
<bold>(c)</bold>
and
<bold>(f)</bold>
: EGF collection ((I,J,K) = (21,17,9)). Labels (2) and (3) correspond to cavity numbers used in Figure
<xref rid="Fig1" ref-type="fig">1</xref>
. They designate the AS and BS2, respectively. Regions labeled with red stars correspond to SOM regions appearing on the HIV-RT surface and considered as docking artifacts.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig3_HTML" id="MO3"></graphic>
</fig>
<fig id="Fig4">
<label>Figure 4</label>
<caption>
<p>
<bold>SOM analysis of docking results obtained with ADvina on ABL1.</bold>
Top line,
<bold>(a)</bold>
,
<bold>(b)</bold>
and
<bold>(c)</bold>
: U-matrices; bottom line,
<bold>(d)</bold>
,
<bold>(e)</bold>
and
<bold>(f)</bold>
: docking score projections. Left column,
<bold>(a)</bold>
and
<bold>(d)</bold>
: DUD-E decoys set (map dimensions (I,J,K) are equal to (28,15,8)). Middle column,
<bold>(b)</bold>
and
<bold>(e)</bold>
: DUD-E active molecules ((I,J,K) = (31,14,8)). Right column,
<bold>(c)</bold>
and
<bold>(f)</bold>
: EGF collection ((I,J,K) = (34,12,8)). Labels (1’) and (6’) correspond to cavity numbers used in Figure
<xref rid="Fig1" ref-type="fig">1</xref>
. They designate the AS and BS2, respectively.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig4_HTML" id="MO4"></graphic>
</fig>
<fig id="Fig5">
<label>Figure 5</label>
<caption>
<p>
<bold>SOM analysis of docking results obtained with Dock on ABL1.</bold>
Top line,
<bold>(a)</bold>
,
<bold>(b)</bold>
and
<bold>(c)</bold>
: U-matrices; bottom line,
<bold>(d)</bold>
,
<bold>(e)</bold>
and
<bold>(f)</bold>
: docking score projections. Left column,
<bold>(a)</bold>
and
<bold>(d)</bold>
: DUD-E decoys set (map dimensions (I,J,K) are (18,17,11)). Middle column,
<bold>(b)</bold>
and
<bold>(e)</bold>
: DUD-E active molecules ((I,J,K) = (19,18,10)). Right column,
<bold>(c)</bold>
and
<bold>(f)</bold>
: EGF collection ((I,J,K) = (28,13,9)). The label (1’) corresponds to the cavity number used in Figure
<xref rid="Fig1" ref-type="fig">1</xref>
. It designates the AS.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig5_HTML" id="MO5"></graphic>
</fig>
</p>
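<p>The choice of map dimensions and the initialization formula given above can be sketched as follows. This is a minimal illustration and not the in-house implementation: the function names <italic>som_dimensions</italic> and <italic>init_node</italic>, the rounding to the nearest integer and the argument layout are assumptions introduced here.</p>
<preformat>
```python
def som_dimensions(lengths, target_nodes=15 ** 3):
    """Return integer map dimensions (I, J, K) proportional to the PCA lengths.

    lengths: the three lengths S1, S2, S3 (square roots of the PCA eigenvalues).
    target_nodes: desired value of the product I * J * K (15^3 in the text).
    Rounding to the nearest integer is an assumption of this sketch.
    """
    s1, s2, s3 = lengths
    # Scale factor c such that (c*S1) * (c*S2) * (c*S3) equals target_nodes.
    scale = (target_nodes / (s1 * s2 * s3)) ** (1.0 / 3.0)
    return tuple(max(1, round(scale * s)) for s in lengths)


def init_node(i, j, k, dims, v_min, v_max, axes):
    """Initial 3D position of neuron (i, j, k), following the formula in the text.

    dims: map dimensions (I, J, K).
    v_min, v_max: minimum and maximum projections along each principal axis.
    axes: the three normalized principal components, each a 3-tuple.
    """
    position = [0.0, 0.0, 0.0]
    for index, size, lo, hi, axis in zip((i, j, k), dims, v_min, v_max, axes):
        # Coefficient regularly spaced between the extreme projections.
        coeff = lo + index * (hi - lo) / size
        for d in range(3):
            position[d] += coeff * axis[d]
    return tuple(position)
```
</preformat>
<p>With lengths in the ratio 2:1:1, for instance, this sketch gives a 24 × 12 × 12 map, which is of the same order as the dimensions reported in the figure legends.</p>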
<p>A training cycle consisted of the presentation of each of the
<italic>n</italic>
input vectors in random order with an update of the SOM after each presentation (step). Two phases,
<italic>ϕ</italic>
=1,2, similar to those previously used [
<xref ref-type="bibr" rid="CR38">38</xref>
] were carried out. In each phase, the radius
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,
<italic>t</italic>
</sub>
and the learning rate
<italic>α</italic>
<sub>
<italic>ϕ</italic>
,
<italic>t</italic>
</sub>
at step
<italic>t</italic>
decreased exponentially between initial (0) and final (
<italic>f</italic>
) values,
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,0</sub>
and
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,
<italic>f</italic>
</sub>
respectively (
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,
<italic>t</italic>
</sub>
=(
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,0</sub> −
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,
<italic>f</italic>
</sub>
)· exp(−
<italic>t</italic>
/
<italic>λ</italic>
<sub>
<italic>ϕ</italic>
</sub>
)+
<italic>r</italic>
<sub>
<italic>ϕ</italic>
,
<italic>f</italic>
</sub>
). The decay constant,
<italic>λ</italic>
<sub>
<italic>ϕ</italic>
</sub>
, was set to the total number of steps of the phase divided by 10. In the first phase, one training cycle of
<italic>n</italic>
steps was performed with ((
<italic>r</italic>
<sub>1,0</sub>
,
<italic>r</italic>
<sub>1,
<italic>f</italic>
</sub>
),(
<italic>α</italic>
<sub>1,0</sub>
,
<italic>α</italic>
<sub>1,
<italic>f</italic>
</sub>
))=((7.5,3.75),(1,0.5)). In the second phase, ten training cycles were performed with the parameters set to ((3.75,1),(0.5,0.1)).</p>
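<p>The decay schedule described above can be written compactly. The following is a minimal sketch under the stated formula; the function names <italic>decayed_value</italic> and <italic>phase_schedule</italic> are hypothetical and the neuron update itself is not shown.</p>
<preformat>
```python
import math


def decayed_value(step, total_steps, initial, final):
    """Exponential decay from `initial` towards `final`.

    Implements x_t = (x_0 - x_f) * exp(-t / lam) + x_f with
    lam = total_steps / 10, the schedule described in the text for both
    the radius and the learning rate.
    """
    lam = total_steps / 10.0
    return (initial - final) * math.exp(-step / lam) + final


def phase_schedule(n_inputs, n_cycles, radius_range, rate_range):
    """Yield (radius, learning_rate) for every step of one training phase."""
    total_steps = n_inputs * n_cycles
    for step in range(total_steps):
        yield (decayed_value(step, total_steps, *radius_range),
               decayed_value(step, total_steps, *rate_range))
```
</preformat>
<p>Under these assumptions, the first phase would correspond to phase_schedule(n, 1, (7.5, 3.75), (1.0, 0.5)) and the second to phase_schedule(n, 10, (3.75, 1.0), (0.5, 0.1)).</p>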
<p>As the SOMs were set up with 3D Cartesian coordinates, their spatial representation on the protein structures was straightforward. The average docking score of the atoms mapped to a given neuron, for example, could simply be displayed with a color code at the position specified by the neuron value.</p>
<p>The Unified Distance Matrix, or U-matrix, is formed by I ×
<italic>J</italic>
×K elements
<italic>U</italic>
<sub>
<italic>i</italic>
,
<italic>j</italic>
,
<italic>k</italic>
</sub>
, called “U-values”. Each U-value is calculated from the SOM as the mean Euclidean distance of the corresponding neuron to its 26 direct neighbors. Areas with low U-values are referred to as “high neuron consensus” areas.</p>
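<p>A U-matrix of this kind can be computed as in the following minimal sketch. The nested-list representation of the map and the treatment of border neurons (averaging over the neighbors that exist) are assumptions made here for illustration.</p>
<preformat>
```python
import itertools
import math


def u_matrix(som):
    """Mean Euclidean distance of each neuron to its direct neighbors.

    som: nested lists, som[i][j][k] = (x, y, z), for a non-periodic 3D map.
    Interior neurons have 26 neighbors; border neurons have fewer, and only
    the existing neighbors are averaged (an assumption of this sketch).
    """
    size_i, size_j, size_k = len(som), len(som[0]), len(som[0][0])
    # All 26 index offsets around a neuron, excluding (0, 0, 0).
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=3) if any(o)]
    u = [[[0.0] * size_k for _ in range(size_j)] for _ in range(size_i)]
    for i, j, k in itertools.product(range(size_i), range(size_j), range(size_k)):
        distances = []
        for di, dj, dk in offsets:
            a, b, c = i + di, j + dj, k + dk
            if a in range(size_i) and b in range(size_j) and c in range(size_k):
                distances.append(math.dist(som[i][j][k], som[a][b][c]))
        u[i][j][k] = sum(distances) / len(distances) if distances else 0.0
    return u
```
</preformat>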
</sec>
<sec id="Sec7">
<title>SOM analysis and BS identification</title>
<p>The SOM algorithm reveals docking hotspots by the presence of areas with low U-values and generally high neuron densities. Areas between docking hotspots appear as low density regions on the SOM, and are associated with high U-values.</p>
<p>We defined a cutoff (
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
) on the U-values to distinguish potential BSs (consensual binding regions with U-values ≤
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
) from barriers between BSs (regions with U-values >
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
). To automate the definition of
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
, we fitted a Gaussian mixture model (GMM) to the distribution of the U-values using the implementation in the scikit-learn Python package [
<xref ref-type="bibr" rid="CR63">63</xref>
]. The number of Gaussian components was selected with the Bayesian information criterion (BIC) [
<xref ref-type="bibr" rid="CR64">64</xref>
]. The component with the largest weight, hereafter the dominant Gaussian, was then selected. The threshold on the U-values (
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
) was then defined as:
<disp-formula id="Equ1">
<label>(1)</label>
<alternatives>
<tex-math id="M5">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ t_{U} = \mu_{U} + \sqrt{\sigma_{U}^{2}} $$ \end{document}</tex-math>
<mml:math id="M6">
<mml:msub>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msqrt>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ1.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<italic>μ</italic>
<sub>
<italic>U</italic>
</sub>
and
<inline-formula id="IEq3">
<alternatives>
<tex-math id="M7">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $\sigma _{U}^{2}$ \end{document}</tex-math>
<mml:math id="M8">
<mml:msubsup>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq3.gif"></inline-graphic>
</alternatives>
</inline-formula>
are the mean and the variance of the dominant Gaussian. Neurons with U-values ≤
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
were aggregated by connectivity, which defined
<italic>n</italic>
<sub>
<italic>cc</italic>
</sub>
consensual clusters (CCs).</p>
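<p>The thresholding and aggregation step can be illustrated as follows. This is a minimal sketch: the mixture fit is assumed to have been done beforehand, the function name <italic>consensual_clusters</italic> is hypothetical, and face-sharing (6-neighbor) connectivity is an assumption, since the connectivity rule is not specified in the text.</p>
<preformat>
```python
from collections import deque


def consensual_clusters(u_values, mean, variance):
    """Group neurons whose U-value does not exceed t_U = mean + sqrt(variance).

    u_values: dict mapping (i, j, k) neuron indices to U-values.
    mean, variance: parameters of the dominant Gaussian of a fitted mixture
    (the fit itself is not shown here).
    Face-sharing (6-neighbor) connectivity is an assumption of this sketch.
    Returns a list of clusters, each a sorted list of neuron indices.
    """
    t_u = mean + variance ** 0.5
    selected = {idx for idx, u in u_values.items() if t_u >= u}
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    clusters, seen = [], set()
    for start in sorted(selected):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        # Breadth-first search over selected neighbors.
        while queue:
            i, j, k = queue.popleft()
            members.append((i, j, k))
            for di, dj, dk in steps:
                nxt = (i + di, j + dj, k + dk)
                if nxt in selected and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(members))
    return clusters
```
</preformat>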
<p>A radius was then defined to assess whether a ligand atom overlaps a given CC. This radius was set automatically with the same strategy as for
<italic>t</italic>
<sub>
<italic>U</italic>
</sub>
. The distribution of distances to the nearest neighbors within the CC (4 neighbors per neuron, except at the borders of the SOM) was fitted with a two-component GMM. The cutoff distance, called the radius,
<italic>r</italic>
<sub>
<italic>CC</italic>
</sub>
, is then:
<disp-formula id="Equ2">
<label>(2)</label>
<alternatives>
<tex-math id="M9">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ r_{CC} = \mu_{d} + \sqrt{\sigma_{d}^{2}} $$ \end{document}</tex-math>
<mml:math id="M10">
<mml:msub>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">CC</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>μ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:msqrt>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ2.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<italic>μ</italic>
<sub>
<italic>d</italic>
</sub>
and
<inline-formula id="IEq4">
<alternatives>
<tex-math id="M11">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $\sigma _{d}^{2}$ \end{document}</tex-math>
<mml:math id="M12">
<mml:msubsup>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq4.gif"></inline-graphic>
</alternatives>
</inline-formula>
are the mean and the variance of the dominant Gaussian. The precision is defined as the fraction of ligand atoms that are within
<italic>r</italic>
<sub>
<italic>CC</italic>
</sub>
distance from any of the CC neurons.</p>
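<p>The precision defined above amounts to a simple distance test, sketched below. The function name <italic>precision</italic> and the coordinate representation are assumptions made for illustration; the radius is taken as given.</p>
<preformat>
```python
import math


def precision(ligand_atoms, cc_neurons, r_cc):
    """Fraction of ligand atoms lying within r_cc of at least one CC neuron.

    ligand_atoms, cc_neurons: sequences of (x, y, z) coordinates.
    r_cc: cutoff radius, e.g. the mean plus one standard deviation of the
    dominant Gaussian, as in Equation 2.
    """
    if not ligand_atoms:
        return 0.0
    inside = sum(
        1 for atom in ligand_atoms
        if any(r_cc >= math.dist(atom, neuron) for neuron in cc_neurons)
    )
    return inside / len(ligand_atoms)
```
</preformat>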
<p>A SOM neuron was considered to be inside a given cavity, defined by
<italic>mkgrid</italic>
(see
<xref rid="Sec4" ref-type="sec">Cavity identification</xref>
paragraph), if at least one corner of the grid cube encompassing its position value had the label of that cavity.</p>
</sec>
<sec id="Sec8">
<title>Chemical descriptors</title>
<p>Compounds were decomposed into chemical substructures with the circular Morgan fingerprints algorithm [
<xref ref-type="bibr" rid="CR50">50</xref>
,
<xref ref-type="bibr" rid="CR51">51</xref>
] as implemented in RDKit [
<xref ref-type="bibr" rid="CR65">65</xref>
], as these fingerprints have proved efficient in virtual screening [
<xref ref-type="bibr" rid="CR52">52</xref>
-
<xref ref-type="bibr" rid="CR54">54</xref>
]. Fingerprints are calculated by decomposition of the compound into substructures with a user-defined maximal diameter (number of connected bonds). A unique integer identifier is then assigned to these substructures according to atom types and their neighbors.</p>
<p>Substructures are called “chemical features” here. We calculated features with diameters up to 7 non-hydrogen atoms and then retained only features with 3 to 7 non-hydrogen atoms. The docking analysis of the features was performed with SOMs, using the geometric center coordinates of each chemical feature as input vectors.</p>
</sec>
<sec id="Sec9">
<title>Analysis of the results obtained with the “chemical features” decomposition</title>
<p>We defined the following sets: (i)
<italic>F</italic>
<sub>
<italic>EGFd</italic>
</sub>
is the set of chemical features present in the EGF compounds which were successfully docked at the protein surfaces, (ii)
<italic>F</italic>
<sub>
<italic>AS</italic>
</sub>
is the set of chemical features of the EGF compounds docked at the AS, (iii)
<inline-formula id="IEq5">
<alternatives>
<tex-math id="M13">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $\overline {F_{\textit {AS}}}$ \end{document}</tex-math>
<mml:math id="M14">
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq5.gif"></inline-graphic>
</alternatives>
</inline-formula>
is the set of chemical features of the EGF compounds which could never dock at the AS, (iv)
<italic>F</italic>
<sub>
<italic>A</italic>
</sub>
is the set of chemical features present in both
<italic>F</italic>
<sub>
<italic>EGFd</italic>
</sub>
and the set of DUD-E active ligands of the considered target. This represented the set of “active features”. It was used as a validation set
<italic>a posteriori</italic>
.</p>
<p>Using |
<italic>F</italic>
| to denote the cardinality of the set
<italic>F</italic>
, i.e. the number of features belonging to that set, we calculated the enrichment in active features for
<italic>F</italic>
<sub>
<italic>EGFd</italic>
</sub>
,
<italic>F</italic>
<sub>
<italic>AS</italic>
</sub>
and
<inline-formula id="IEq6">
<alternatives>
<tex-math id="M15">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $\overline {F_{\textit {AS}}}$ \end{document}</tex-math>
<mml:math id="M16">
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq6.gif"></inline-graphic>
</alternatives>
</inline-formula>
as follows:
<disp-formula id="Equ3">
<label>(3)</label>
<alternatives>
<tex-math id="M17">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ E(EGFd) = \frac{| F_{A} | }{ | F_{EGFd} | } $$ \end{document}</tex-math>
<mml:math id="M18">
<mml:mi>E</mml:mi>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">EGFd</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">EGFd</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ3.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>
<disp-formula id="Equ4">
<label>(4)</label>
<alternatives>
<tex-math id="M19">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ E(AS) = \frac{|F_{A} \cap F_{AS}| }{ |F_{AS}| } $$ \end{document}</tex-math>
<mml:math id="M20">
<mml:mi>E</mml:mi>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">AS</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ4.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>
<disp-formula id="Equ5">
<label>(5)</label>
<alternatives>
<tex-math id="M21">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ E(\overline{AS}) = \frac{|F_{A} \cap \overline{F_{AS}}| }{ |\overline{F_{AS}}|} $$ \end{document}</tex-math>
<mml:math id="M22">
<mml:mi>E</mml:mi>
<mml:mo>(</mml:mo>
<mml:mover accent="false">
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ5.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>Then, the sensitivity
<italic>Se</italic>
was calculated as the number of “active features” that docked in the AS divided by the total number of “active features”:
<disp-formula id="Equ6">
<label>(6)</label>
<alternatives>
<tex-math id="M23">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ Se = \frac{|F_{A} \cap F_{AS}| }{ |F_A| } $$ \end{document}</tex-math>
<mml:math id="M24">
<mml:mtext mathvariant="italic">Se</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ6.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>Similarly, the specificity
<italic>Sp</italic>
was calculated as the number of “inactive features” that never docked in the AS divided by the total number of “inactive features”:
<disp-formula id="Equ7">
<label>(7)</label>
<alternatives>
<tex-math id="M25">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ Sp = \frac{| (F_{EGFd} \backslash F_A) \cap \overline{F_{AS}} | }{ | F_{EGFd} \backslash F_{A} | } $$ \end{document}</tex-math>
<mml:math id="M26">
<mml:mtext mathvariant="italic">Sp</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">EGFd</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>∖</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>∩</mml:mo>
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">EGFd</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>∖</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2015_518_Equ7.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<italic>F</italic>
<sub>
<italic>EGFd</italic>
</sub>
∖
<italic>F</italic>
<sub>
<italic>A</italic>
</sub>
is the set of chemical features present in
<italic>F</italic>
<sub>
<italic>EGFd</italic>
</sub>
and not in
<italic>F</italic>
<sub>
<italic>A</italic>
</sub>
, which constitutes the set of “inactive features”.</p>
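<p>Equations 3 to 7 translate directly into set operations. The sketch below is illustrative only: the function name <italic>feature_statistics</italic> is hypothetical, features are represented by arbitrary identifiers, and the set of features never docked at the AS is taken as the complement of the AS set within the docked EGF features, which is an assumption.</p>
<preformat>
```python
def feature_statistics(f_egfd, f_as, f_active_ligands):
    """Enrichments, sensitivity and specificity from sets of feature identifiers.

    f_egfd: features of the docked EGF compounds.
    f_as: features docked at the active site (a subset of f_egfd).
    f_active_ligands: features found in the known active ligands.
    The features never docked at the AS are taken here as f_egfd minus f_as,
    which is an assumption of this sketch. All sets are assumed non-empty.
    """
    f_a = f_egfd.intersection(f_active_ligands)   # "active features"
    f_not_as = f_egfd.difference(f_as)            # never docked at the AS
    f_inactive = f_egfd.difference(f_a)           # "inactive features"
    return {
        "E(EGFd)": len(f_a) / len(f_egfd),
        "E(AS)": len(f_a.intersection(f_as)) / len(f_as),
        "E(notAS)": len(f_a.intersection(f_not_as)) / len(f_not_as),
        "Se": len(f_a.intersection(f_as)) / len(f_a),
        "Sp": len(f_inactive.intersection(f_not_as)) / len(f_inactive),
    }
```
</preformat>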
<p>To assess the significance of these quantities, we built a null distribution by randomizing the features that dock in the AS (
<italic>F</italic>
<sub>
<italic>AS</italic>
</sub>
) one million times. In a perfect scenario, all the active features (
<italic>F</italic>
<sub>
<italic>A</italic>
</sub>
) would dock in the AS, giving a sensitivity equal to 1. In the worst scenario, none of the active features would dock in the AS and
<italic>Se</italic>
= 0. The randomized values are expected to follow a normal distribution
<italic>N</italic>
(
<italic>μ</italic>
,
<italic>σ</italic>
). The Z-score is the distance in terms of
<italic>σ</italic>
between the sensitivity obtained and the mean
<italic>μ</italic>
of the normal distribution of sensitivities corresponding to a random distribution of the features. We performed the same analysis for the specificity by randomizing the features that would never dock in the AS (
<inline-formula id="IEq7">
<alternatives>
<tex-math id="M27">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $\overline {F_{\textit {AS}}}$ \end{document}</tex-math>
<mml:math id="M28">
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq7.gif"></inline-graphic>
</alternatives>
</inline-formula>
). We consider that any Z-score higher than 4 for
<italic>Se</italic>
and
<italic>Sp</italic>
indicates strong significance, as such values are very unlikely to arise by chance.</p>
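<p>The randomization test for the sensitivity can be sketched as follows. The exact randomization procedure is not detailed in the text; here each draw replaces the AS feature set by a random subset of the docked features of the same size, which is an assumption, as are the function name <italic>sensitivity_z_score</italic> and the reduced default number of draws.</p>
<preformat>
```python
import random
import statistics


def sensitivity_z_score(f_egfd, f_as, f_a, n_draws=10000, seed=0):
    """Z-score of the observed sensitivity against randomized AS feature sets.

    Each draw replaces f_as by a random subset of f_egfd of the same size;
    this reading of "randomization" is an assumption of this sketch. The text
    uses one million draws; a smaller default keeps the example fast.
    """
    rng = random.Random(seed)
    pool = sorted(f_egfd)
    observed = len(f_a.intersection(f_as)) / len(f_a)
    null = []
    for _ in range(n_draws):
        drawn = set(rng.sample(pool, len(f_as)))
        null.append(len(f_a.intersection(drawn)) / len(f_a))
    mu = statistics.fmean(null)
    sigma = statistics.pstdev(null)
    return (observed - mu) / sigma
```
</preformat>
<p>The same function could be applied to the specificity by passing the inactive features and the set of features never docked at the AS instead.</p>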
</sec>
</sec>
<sec id="Sec10" sec-type="results">
<title>Results</title>
<sec id="Sec11">
<title>Binding site identification</title>
<p>We used
<italic>mkgrid</italic>
to calculate cavities for all 102 targets (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1), and classified them by the number of cavities with a volume greater than 100 Å
<sup>3</sup>
(Table
<xref rid="Tab1" ref-type="table">1</xref>
). We chose representative targets from the two largest categories of targets, which had two or three cavities larger than 100 Å
<sup>3</sup>
: ABL1 and HIV-RT, respectively. As a first step, we calibrated our method on these two targets.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>
<bold>DUD-E targets clustered into categories, according to the number of cavities (detected with</bold>
<bold>
<italic>mkgrid</italic>
</bold>
<bold>) with a volume greater than 100 Å</bold>
<sup>
<bold>3</bold>
</sup>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">
<bold>No. of cavities</bold>
</th>
<th align="left">
<bold>Targets</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">8</td>
<td align="justify">pa2ga</td>
</tr>
<tr>
<td align="left">6</td>
<td align="justify">hmdh</td>
</tr>
<tr>
<td align="left">5</td>
<td align="justify">braf</td>
</tr>
<tr>
<td align="left">4</td>
<td align="justify">reni prgr pgh2 glcm esr2 dpp4 cxcr4 cp3a4</td>
</tr>
<tr>
<td align="left">3</td>
<td align="justify">mk01 kpcb kith
<bold>hivrt</bold>
esr1 drd3 cp2c9 aofb adrb1 aces pgh1 parp1</td>
</tr>
<tr>
<td align="left">2</td>
<td align="justify">vgfr2 thrb thb tgfr1 src sahh pyrd pygm pparg ppard ppara nram mcr lck jak2 inha gria2 gcr fgfr1 dhi1 bace1 andr ampc adrb2 ace
<bold>abl1</bold>
</td>
</tr>
<tr>
<td align="left">1</td>
<td align="justify">aa2ar ada17 ada akt1 akt2 aldr cah2 casp3 cdk2 comt csf1r def dyr egfr fa10 fa7 fabp4 fak1 fkb1a fnta fpps grik1 hdac2 hdac8 hivint hivpr hs90a hxk4 igf1r ital kif11 kit lkha4 mapk2 met mk10 mk14 mmp13 mp2k1 nos1 pde5a plk1 pnph ptn1 pur2 rock1 rxra try1 tryb1 tysy urok wee1 xiap</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The targets HIV-RT and ABL1 used for the calibration step are shown in bold. They belong to categories 3 and 2, respectively.</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>We analyzed the successful docking poses with SOMs applied to individual atom Cartesian coordinates. For HIV-RT, 337 out of 338 DUD-E active molecules could be docked with ADvina. Similarly, 18873 out of 18880 DUD-E decoys and 1421 out of 1500 EGF fragments were docked. With Dock, the numbers of successfully docked molecules were 194, 12273, and 1152, respectively. For ABL1, ADvina allowed the docking of all DUD-E active molecules (182), all decoys (10745) and 1421 out of 1500 EGF fragments. With Dock on ABL1, 180, 10674 and 1422 molecules docked, respectively. The maps are shown in Figures
<xref rid="Fig2" ref-type="fig">2</xref>
,
<xref rid="Fig3" ref-type="fig">3</xref>
,
<xref rid="Fig4" ref-type="fig">4</xref>
and
<xref rid="Fig5" ref-type="fig">5</xref>
.</p>
<p>The distances of the input vectors to their representative neurons were calculated to evaluate the accuracy of the SOMs (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S2). They were normally distributed. For HIV-RT, they were 0.33 ± 0.40 Å with ADvina and 0.50 ± 0.35 Å with Dock. For ABL1, distance distributions were similar for ADvina and Dock (0.33 ± 0.40 Å). Overall, the SOMs appeared fairly accurate.</p>
<p>U-values and docking scores were projected on the SOMs and displayed with a color gradient. The co-crystallized ligand was also represented in licorice to show the AS position (Figures
<xref rid="Fig2" ref-type="fig">2</xref>
,
<xref rid="Fig3" ref-type="fig">3</xref>
,
<xref rid="Fig4" ref-type="fig">4</xref>
and
<xref rid="Fig5" ref-type="fig">5</xref>
). Lower U-values (dark blue) indicate regions with a large number of docked molecules. These high neuron consensus regions are plausible BS candidates.</p>
<sec id="Sec12">
<title>HIV-RT</title>
<p>The SOM analyses for HIV-RT are shown in Figure
<xref rid="Fig2" ref-type="fig">2</xref>
for ADvina, and Figure
<xref rid="Fig3" ref-type="fig">3</xref>
for Dock. U-matrices are shown in the top row (a-c) and docking score projections in the bottom row (d-f).</p>
<p>For ADvina, the U-values for the three libraries, DUD-E decoys, DUD-E active molecules and EGF fragments, shown in Figure
<xref rid="Fig2" ref-type="fig">2</xref>
(a) to (c), are quite similar, with the same areas showing low U-values. One, labeled (2), contains the co-crystallized ligand, and thus fits the AS as described in the literature (PDB entry: 3LAN). The second area, labeled (3), corresponds to an allosteric site of HIV-RT [
<xref ref-type="bibr" rid="CR66">66</xref>
] and will be referred to as the second binding site (BS2) of HIV-RT. For the EGF library, a higher neuron consensus (lower U-values) is observed at the AS than with the DUD-E active molecules. Conversely, the fragments gave a lower neuron consensus than the DUD-E active compounds at the BS2. The docking scores are more favorable at the AS than at the BS2 for all three libraries (Figure
<xref rid="Fig2" ref-type="fig">2</xref>
(d) to (f)).</p>
<p>The maps obtained with Dock have a different shape (Figure
<xref rid="Fig3" ref-type="fig">3</xref>
). Five clusters could be identified. Two of them correspond to the AS and the BS2 described above. The remaining clusters appear at the surface of the protein (marked by red stars on Figure
<xref rid="Fig3" ref-type="fig">3</xref>
), and did not match any detected cavities (Figure
<xref rid="Fig1" ref-type="fig">1</xref>
). They will not be considered as relevant BSs here.</p>
<p>The neuron consensus obtained at the AS with the DUD-E active molecules and the EGF fragments (Figure
<xref rid="Fig3" ref-type="fig">3</xref>
(b) and (c)) was higher than with the DUD-E decoys (Figure
<xref rid="Fig3" ref-type="fig">3</xref>
(a)). The docking scores alone could not provide any discrimination with any of the three libraries (Figure
<xref rid="Fig3" ref-type="fig">3</xref>
(d) to (f)).</p>
</sec>
<sec id="Sec13">
<title>ABL1</title>
<p>The SOMs obtained on ABL1 with ADvina are shown in Figure
<xref rid="Fig4" ref-type="fig">4</xref>
. U-matrices revealed two high neuron consensus areas. The first one is the AS of ABL1, containing the co-crystallized ligand (PDB entry: 2HZI; Figure
<xref rid="Fig4" ref-type="fig">4</xref>
; label (1’)). The second area matches a big pocket labeled (6’). It is close to the AS and involves the activation loop of ABL1 [
<xref ref-type="bibr" rid="CR67">67</xref>
] and will be referred to as the BS2 of ABL1.</p>
<p>Fragments from the EGF library yielded a more compact map than the DUD-E molecules (Figure
<xref rid="Fig4" ref-type="fig">4</xref>
(c)). The highest neuron consensus appeared at the AS.</p>
<p>The docking scores at the AS were lower (more favorable) than at the BS2 with the three libraries (Figure
<xref rid="Fig4" ref-type="fig">4</xref>
(a) to (c)). DUD-E active molecules and EGF fragments had better scores at the AS than the decoys.</p>
<p>The SOM analyses of the Dock outputs are reported in Figure
<xref rid="Fig5" ref-type="fig">5</xref>
. Although the spheres defining the docking area cover the AS, BS2 and other pockets, molecules only docked in the AS.</p>
<p>DUD-E active molecules (Figure
<xref rid="Fig5" ref-type="fig">5</xref>
(b)) mapped the AS better than the decoys (Figure
<xref rid="Fig5" ref-type="fig">5</xref>
(a)), as denoted by lower U-values. The EGF fragments yielded an even more compact map (Figure
<xref rid="Fig5" ref-type="fig">5</xref>
(c)), tightly fitting the AS.</p>
</sec>
</sec>
<sec id="Sec14">
<title>Binding site characterization</title>
<p>For both targets,
<italic>mkgrid</italic>
detected cavities corresponding to AS and BS2, as well as other cavities (Figure
<xref rid="Fig1" ref-type="fig">1</xref>
). We detected 9 cavities in the HIV-RT target subdomain, 3 of which had a volume larger than 100 Å
<sup>3</sup>
(see Table
<xref rid="Tab1" ref-type="table">1</xref>
and Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S2 for cavity labels). For the ABL1 subdomain, these figures were 12 and 2, respectively. We calculated the neuron density as the number of neurons inside the cavity divided by the cavity volume (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S2). We used the SOMs trained on the EGF outputs for that calculation.</p>
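<p>The neuron density defined above can be illustrated as follows. This sketch assumes, for illustration only, that a cavity is described by a set of cubic grid cells of known spacing; the actual output format of <italic>mkgrid</italic> is not described in this section, and all names and values below are hypothetical.</p>

```python
import math

def voxel_of(point, spacing):
    """Map a 3D point to the integer index of the grid cell that contains it."""
    return tuple(math.floor(c / spacing) for c in point)

def neuron_density(neurons, cavity_voxels, spacing):
    """Number of neurons inside the cavity divided by the cavity volume."""
    inside = sum(1 for n in neurons if voxel_of(n, spacing) in cavity_voxels)
    volume = len(cavity_voxels) * spacing ** 3
    return inside / volume

# Toy cavity made of two 1 x 1 x 1 cells (assumed values, not study data).
cavity = {(0, 0, 0), (1, 0, 0)}
som_neurons = [(0.5, 0.5, 0.5), (1.2, 0.1, 0.9), (5.0, 5.0, 5.0)]

density = neuron_density(som_neurons, cavity, spacing=1.0)
print(density)
```

<p>With this definition, a small cavity that attracts many neurons scores higher than a large cavity with the same neuron count, which is why densities rather than raw counts are compared between the AS and the BS2.</p>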
<sec id="Sec15">
<title>HIV-RT</title>
<p>For HIV-RT, cavities (2) and (3), corresponding to the AS and BS2, have volumes of 338.5 Å
<sup>3</sup>
and 957.4 Å
<sup>3</sup>
, respectively. The EGF fragments yielded the highest neuron densities in the AS (3.070 and 2.065 neuron/Å
<sup>3</sup>
with ADvina and Dock, respectively; Table
<xref rid="Tab2" ref-type="table">2</xref>
). DUD-E active molecules showed lower neuron densities (1.563 and 1.731 neuron/Å
<sup>3</sup>
with ADvina and Dock, respectively). The lowest values were obtained with the DUD-E decoys (1.158 and 0.718 neuron/Å
<sup>3</sup>
with ADvina and Dock, respectively).
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>
<bold>Neuron density of the active site (AS) and the second binding site (BS2) of HIV-RT and ABL1, for all study combinations</bold>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left"></th>
<th align="left"></th>
<th align="left">
<bold>Decoys</bold>
</th>
<th align="left">
<bold>Actives</bold>
</th>
<th align="left">
<bold>EGF</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">HIV-RT</td>
<td align="center">AS (338.5Å
<sup>3</sup>
)</td>
<td align="center">ADvina</td>
<td align="center">1.158</td>
<td align="center">1.563</td>
<td align="center">3.070</td>
</tr>
<tr>
<td align="center"></td>
<td align="center"></td>
<td align="center">Dock</td>
<td align="center">0.718</td>
<td align="center">1.731</td>
<td align="center">2.065</td>
</tr>
<tr>
<td align="center"></td>
<td align="center">BS2 (957.4Å
<sup>3</sup>
)</td>
<td align="center">ADvina</td>
<td align="center">1.692</td>
<td align="center">1.349</td>
<td align="center">0.976</td>
</tr>
<tr>
<td align="center"></td>
<td align="center"></td>
<td align="center">Dock</td>
<td align="center">0.298</td>
<td align="center">0.315</td>
<td align="center">0.251</td>
</tr>
<tr>
<td align="center">ABL1</td>
<td align="center">AS (257.1Å
<sup>3</sup>
)</td>
<td align="center">ADvina</td>
<td align="center">1.081</td>
<td align="center">1.851</td>
<td align="center">3.940</td>
</tr>
<tr>
<td align="center"></td>
<td align="center"></td>
<td align="center">Dock</td>
<td align="center">1.412</td>
<td align="center">3.003</td>
<td align="center">6.410</td>
</tr>
<tr>
<td align="center"></td>
<td align="center">BS2 (615.5Å
<sup>3</sup>
)</td>
<td align="center">ADvina</td>
<td align="center">2.587</td>
<td align="center">2.244</td>
<td align="center">2.600</td>
</tr>
<tr>
<td align="center"></td>
<td align="center"></td>
<td align="center">Dock</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>Conversely, in the BS2 with ADvina, the highest neuron density was observed with the decoys (1.692 neuron/Å
<sup>3</sup>
), followed by the DUD-E active molecules (1.349 neuron/Å
<sup>3</sup>
), then by the EGF fragments (0.976 neuron/Å
<sup>3</sup>
). Dock outputs yielded low densities at the BS2 (Table
<xref rid="Tab2" ref-type="table">2</xref>
).</p>
</sec>
<sec id="Sec16">
<title>ABL1</title>
<p>Cavities corresponding to ABL1’s AS (1’) and BS2 (6’) have volumes of 257.1 Å
<sup>3</sup>
and 615.5 Å
<sup>3</sup>
, respectively. Neuron densities at the AS presented the same trend as that observed for HIV-RT. The EGF fragments had the highest densities (3.940 and 2.677 neuron/Å
<sup>3</sup>
with ADvina and Dock, respectively), followed by the DUD-E active molecules (1.851 and 1.254 neuron/Å
<sup>3</sup>
with ADvina and Dock, respectively), and finally the DUD-E decoys (1.081 and 0.590 neuron/Å
<sup>3</sup>
with ADvina and Dock, respectively; Table
<xref rid="Tab2" ref-type="table">2</xref>
). For the BS2, the EGF fragments yielded the highest densities (2.600 neuron/Å
<sup>3</sup>
with ADvina). The DUD-E decoys showed the second highest neuron density at the BS2, followed closely by the active molecules (2.587 and 2.244 neuron/Å
<sup>3</sup>
, respectively; Table
<xref rid="Tab2" ref-type="table">2</xref>
).</p>
<p>Overall, the EGF fragments yielded the highest neuron densities at the active sites regardless of the target and the docking software. Nevertheless, ADvina fitted the identified BSs better for both targets. Moreover, it is much faster than Dock. In the next step, we assessed the performance of our method on all targets in the DUD-E database, using the EGF collection as probe library and ADvina as docking algorithm.</p>
</sec>
</sec>
<sec id="Sec17">
<title>Automatic BS identification</title>
<p>We automated the protocol calibrated on HIV-RT and ABL1 (docking of the EGF collection with ADvina) and called it “SOM-BSfinder”. We applied it to the 102 targets of DUD-E. Regions of the SOMs presenting high U-values (see
<xref rid="Sec2" ref-type="sec">Methods</xref>
section) were removed, and the contiguous regions remaining on the SOM were defined as the consensual clusters (CCs). The CCs were sorted by their number of neurons: label 1 was attributed to the CC with the highest number of neurons, and so on.</p>
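<p>The extraction and ranking of consensual clusters can be sketched as a connected-component search on the SOM grid. The sketch below makes several assumptions that are not stated in this section: the U-value cutoff is passed in as a fixed number, neurons are neighbors when they share a face on a non-periodic 3D grid, and all names are hypothetical.</p>

```python
from collections import deque

def consensual_clusters(u_values, threshold):
    """Group low-U-value neurons into contiguous clusters, largest first.

    u_values maps a neuron position (i, j, k) on the 3D SOM to its U-value.
    """
    # Discard neurons whose U-value is above the cutoff.
    kept = {pos for pos, u in u_values.items() if threshold >= u}
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    clusters = []
    while kept:
        seed = kept.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in steps:
                neighbor = (i + di, j + dj, k + dk)
                if neighbor in kept:
                    kept.remove(neighbor)
                    cluster.append(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(cluster))
    # Label 1 goes to the cluster with the most neurons, label 2 to the next, etc.
    clusters.sort(key=len, reverse=True)
    return {label: cluster for label, cluster in enumerate(clusters, start=1)}

# Toy U-matrix (assumed values): one high-U neuron separates two regions.
toy_u = {
    (0, 0, 0): 0.10, (1, 0, 0): 0.20, (0, 1, 0): 0.15,
    (2, 0, 0): 0.90, (3, 0, 0): 0.10,
}
ccs = consensual_clusters(toy_u, threshold=0.5)
print(ccs)
```

<p>In the toy example, the neuron with the high U-value acts as a boundary, so the three neurons on one side form the first CC and the isolated neuron on the other side forms the second.</p>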
<p>The most populated CC, with label 1, was assumed to predict the AS, while the co-crystallized ligand position was used to define the AS position. Hence, we calculated the fraction of ligand atoms contained in each SOM CC for each target. A ligand atom is considered “inside” a CC if it is located within a distance less than or equal to the radius of that CC (see
<xref rid="Sec2" ref-type="sec">Methods</xref>
and Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S3). The average fraction of overlapping atoms was equal to 40% and 44% with the first CC and the first three CCs, respectively. The maximal fraction, equal to 84%, was observed at the first CC for the target FKB1A (Figure
<xref rid="Fig6" ref-type="fig">6</xref>
). When no precision criterion was applied to these fractions, SOM-BSfinder detected atoms of the ligand within the most populated CC in 90% of the cases, and within one of the three most populated CCs (Top3) in 99% of the cases (101 targets, see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S3). SOM-BSfinder failed to detect the AS for only one target (XIAP). The distribution of these fractions is represented in Figure
<xref rid="Fig7" ref-type="fig">7</xref>
according to the number of identified CCs.
<fig id="Fig6">
<label>Figure 6</label>
<caption>
<p>
<bold>Results obtained for the FKB1A target.</bold>
A test case where the AS is detected in the first CC with the maximal precision (fraction of overlapping atoms equal to 0.84).
<bold>(a)</bold>
The SOM obtained for FKB1A with the co-crystal ligand shown in pink licorice.
<bold>(b)</bold>
The CCs obtained are ranked with regard to their neuron densities and represented with a color gradient going from blue (most populated) to red (least populated). The first CC (dark blue) contains 84% of the active ligand atoms (shown in pink licorice).</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig6_HTML" id="MO6"></graphic>
</fig>
<fig id="Fig7">
<label>Figure 7</label>
<caption>
<p>
<bold>Fraction of overlapping ligand atoms with the most populated consensual cluster (first CC).</bold>
Data are displayed according to the number of identified CCs.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig7_HTML" id="MO7"></graphic>
</fig>
</p>
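<p>The overlap between the co-crystallized ligand and each CC can be computed as sketched below. The sketch assumes that a CC is given as a list of 3D points and that a single radius is supplied by the caller; in the protocol the radius is derived per CC from Equation (2), which is not reproduced here. Function names and coordinates are hypothetical.</p>

```python
import math

def overlap_fraction(ligand_atoms, cc_points, radius):
    """Fraction of ligand atoms lying within `radius` of at least one CC point."""
    inside = sum(
        1 for atom in ligand_atoms
        if any(radius >= math.dist(atom, p) for p in cc_points)
    )
    return inside / len(ligand_atoms)

def best_cc(ligand_atoms, ccs, radius):
    """Return (label, fraction) for the CC holding the largest share of ligand atoms."""
    fractions = {
        label: overlap_fraction(ligand_atoms, points, radius)
        for label, points in ccs.items()
    }
    label = max(fractions, key=fractions.get)
    return label, fractions[label]

# Toy ligand and clusters (assumed coordinates in angstrom).
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_ccs = {1: [(0.2, 0.0, 0.0), (1.1, 0.0, 0.0)], 2: [(9.8, 0.0, 0.0)]}

label, fraction = best_cc(ligand, toy_ccs, radius=0.5)
print(label, fraction)
```

<p>Reporting the fraction per CC, rather than a yes/no answer, makes it possible to apply different precision thresholds afterwards without recomputing the distances.</p>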
<p>To better evaluate the accuracy of SOM-BSfinder in detecting the AS, we calculated at which frequency the ligand atoms were found in SOM CC number
<italic>n</italic>
(Figure
<xref rid="Fig8" ref-type="fig">8</xref>
). To remain stringent, detection was considered as failed if the ligand overlapped two or more SOM CCs (3 targets: ACE, DRD3 and PLK1) or no SOM CC (XIAP); hence, the frequencies sum to ≈ 96%. The AS was identified within the most populated CC in 87% of the cases, and within the second or the third most populated CC in less than 9% of the cases. Beyond the third most populated CC, no overlap was observed with the ligand atoms (Figure
<xref rid="Fig8" ref-type="fig">8</xref>
).
<fig id="Fig8">
<label>Figure 8</label>
<caption>
<p>
<bold>Occurrence of the AS at the different CCs identified.</bold>
Cases where the AS overlaps only one CC (98 cases, 96% of the targets) are considered for this plot. The first CC accounts for 87% of the cases, the second CC accounts for 6% and the third CC accounts for 3%. No occurrence of the AS was detected beyond the third CC.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig8_HTML" id="MO8"></graphic>
</fig>
</p>
<p>We calculated the success rate (SR) of SOM-BSfinder. For that, we considered that the AS was successfully identified if the fraction of ligand atoms within the radius
<italic>r</italic>
<sub>
<italic>CC</italic>
</sub>
(see Equation (
<xref rid="Equ2" ref-type="">2</xref>
), Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S3) from SOM CC points exceeded a precision threshold of 0.25. SOM-BSfinder showed SR values of 90% when the Top3 CCs were considered, and 82% for the first CC alone (Table
<xref rid="Tab3" ref-type="table">3</xref>
, first line). We then compared the performance of SOM-BSfinder to that of other energy-based/probe-mapping methods (FTSite [
<xref ref-type="bibr" rid="CR21">21</xref>
,
<xref ref-type="bibr" rid="CR22">22</xref>
], Q-SiteFinder [
<xref ref-type="bibr" rid="CR19">19</xref>
] and SiteHound [
<xref ref-type="bibr" rid="CR27">27</xref>
]), based on their success rates (SRs). To do so, we had to adapt the precision and radius cutoffs to match those used by the authors of these programs. FTSite [
<xref ref-type="bibr" rid="CR21">21</xref>
,
<xref ref-type="bibr" rid="CR22">22</xref>
] and Q-SiteFinder [
<xref ref-type="bibr" rid="CR19">19</xref>
] consider that the AS was successfully identified if the fraction of ligand atoms within 1.6 Å of the predicted site points (here, the SOM CC points) was superior to 0.25. In contrast, SiteHound uses a cutoff distance of 2.0 Å and a fraction superior to 0.15, and only considers heavy atoms. The results for the different methods and for SOM-BSfinder with the respective parameters are shown in Table
<xref rid="Tab3" ref-type="table">3</xref>
. In this specific context, and bearing in mind that the compared methods were evaluated on different datasets, SOM-BSfinder performed as well as or better than all three methods (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S4).
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>
<bold>Success rate values for SOM-BSfinder and other probe-mapping methods</bold>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">
<bold>Precision</bold>
</th>
<th align="left">
<bold>Method</bold>
</th>
<th align="left">
<bold>Atom type</bold>
</th>
<th align="left">
<bold>Radius (Å)</bold>
</th>
<th align="left">
<bold>Top3 SR</bold>
</th>
<th align="left">
<bold>Top1 SR</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">25%</td>
<td align="left">SOM-BSfinder</td>
<td align="left">all</td>
<td align="left">[0.5-0.8]</td>
<td align="left">90%</td>
<td align="left">82%</td>
</tr>
<tr>
<td align="left">25%</td>
<td align="left">SOM-BSfinder</td>
<td align="left">all</td>
<td align="left">1.6</td>
<td align="left">97%</td>
<td align="left">88%</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">FTSite
<sup>(∗)</sup>
</td>
<td align="left">all</td>
<td align="left">1.6</td>
<td align="left">97%</td>
<td align="left">80%</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">Q-SiteFinder</td>
<td align="left">all</td>
<td align="left">1.6</td>
<td align="left">90%</td>
<td align="left">71%</td>
</tr>
<tr>
<td align="left">15%</td>
<td align="left">SOM-BSfinder</td>
<td align="left">heavy</td>
<td align="left">2.0</td>
<td align="left">98%</td>
<td align="left">89%</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">SiteHound</td>
<td align="left">heavy</td>
<td align="left">2.0</td>
<td align="left">80-84%</td>
<td align="left"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>In the first section, SOM-BSfinder performances with its default parameters. In the second section, default parameters of FTSite [
<xref ref-type="bibr" rid="CR21">21</xref>
,
<xref ref-type="bibr" rid="CR22">22</xref>
] and Q-SiteFinder [
<xref ref-type="bibr" rid="CR19">19</xref>
] were used; (*) Values for FTSite were calculated on a set of 35 targets [
<xref ref-type="bibr" rid="CR22">22</xref>
]. In the last section, default parameters of SiteHound [
<xref ref-type="bibr" rid="CR27">27</xref>
] were used.</p>
</table-wrap-foot>
</table-wrap>
</p>
</sec>
<sec id="Sec18">
<title>Chemical descriptors</title>
<p>We tested whether information on the chemical composition of the ligands could reveal relevant chemical groups, in addition to refining the identification of the active site.</p>
<p>We performed this analysis on HIV-RT and ABL1, the benchmarks used to set up the method.</p>
<p>We used the Morgan fingerprints as chemical descriptors (see
<xref rid="Sec2" ref-type="sec">Methods</xref>
). They describe the molecule chemistry through an inventory of each atom environment, which can be viewed as local chemical groups or moieties. Their application gave higher neuron density and consensus for both HIV-RT and ABL1 (Figure
<xref rid="Fig9" ref-type="fig">9</xref>
), thus refining the BS geometrical definition.
<fig id="Fig9">
<label>Figure 9</label>
<caption>
<p>
<bold>SOM analysis of docking results obtained with ADvina with atomic coordinates as input vectors for HIV-RT (a) and ABL1 (c); and with the coordinates of the geometric centers of the chemical features as input vectors for HIV-RT (b) and ABL1 (d).</bold>
Labels (2), (3), (1’) and (6’) correspond to cavity numbers used in Figure
<xref rid="Fig1" ref-type="fig">1</xref>
. They designate the AS and BS2 of HIV-RT and ABL1, respectively.</p>
</caption>
<graphic xlink:href="12859_2015_518_Fig9_HTML" id="MO9"></graphic>
</fig>
</p>
<p>To evaluate the insight provided with this approach, we calculated the enrichment of known “active features”. The latter term denotes chemical substructures observed in the active ligands provided by the DUD-E database. The enrichment in “active features” in the EGF collection
<italic>E</italic>
(
<italic>E</italic>
<italic>G</italic>
<italic>F</italic>
<italic>d</italic>
), (Equation (
<xref rid="Equ3" ref-type="">3</xref>
)), their presence in the AS,
<italic>E</italic>
(
<italic>A</italic>
<italic>S</italic>
), (Equation (
<xref rid="Equ4" ref-type="">4</xref>
)), and conversely their absence in the AS,
<inline-formula id="IEq8">
<alternatives>
<tex-math id="M29">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $E(\overline {AS})$ \end{document}</tex-math>
<mml:math id="M30">
<mml:mi>E</mml:mi>
<mml:mo>(</mml:mo>
<mml:mover accent="false">
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq8.gif"></inline-graphic>
</alternatives>
</inline-formula>
, (Equation (
<xref rid="Equ5" ref-type="">5</xref>
)), are reported in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S5. For both targets, the proportion of active features docked in AS was larger than the proportion of active features that never docked in AS (
<inline-formula id="IEq9">
<alternatives>
<tex-math id="M31">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $E(AS)/E(\overline {AS})$ \end{document}</tex-math>
<mml:math id="M32">
<mml:mi>E</mml:mi>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">AS</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>/</mml:mo>
<mml:mi>E</mml:mi>
<mml:mo>(</mml:mo>
<mml:mover accent="false">
<mml:mrow>
<mml:mtext mathvariant="italic">AS</mml:mtext>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2015_518_IEq9.gif"></inline-graphic>
</alternatives>
</inline-formula>
is 4.64 and 4.44 for HIV-RT and ABL1, respectively). Similarly, the proportion of active features docked in AS was also higher than the proportion of active features in the docked fragments (
<italic>E</italic>
(
<italic>A</italic>
<italic>S</italic>
)/
<italic>E</italic>
(
<italic>E</italic>
<italic>G</italic>
<italic>F</italic>
<italic>d</italic>
) is 2.85 and 2.87 for HIV-RT and ABL1, respectively, see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S5).</p>
<p>We calculated the sensitivity and the specificity of the method as described in Equations (
<xref rid="Equ6" ref-type="">6</xref>
) and (
<xref rid="Equ7" ref-type="">7</xref>
). For both targets, the sensitivity was moderate, whereas the specificity was high (0.49 and 0.85, respectively, for HIV-RT and 0.46 and 0.86, respectively, for ABL1 (Table
<xref rid="Tab4" ref-type="table">4</xref>
)). The ratios
<italic>S</italic>
<italic>e</italic>
/(1−
<italic>S</italic>
<italic>p</italic>
) were higher than 1 in both cases (3.67 for HIV-RT and 3.28 for ABL1). We also calculated the Z-score of the sensitivity and the specificity values by comparison to randomized data (see
<xref rid="Sec2" ref-type="sec">Methods</xref>
). The very high Z-score values obtained (Z-score ≥20, see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S6) show that these results differ significantly from a random distribution of the features over the identified CCs. Interestingly, the ratios
<italic>S</italic>
<italic>e</italic>
/(1−
<italic>S</italic>
<italic>p</italic>
) are close to 1 for both randomized tests, hence confirming lack of information content.
<italic>Se</italic>
seems to be more affected than
<italic>Sp</italic>
by the randomized test, suggesting that the sensitivity is, for those targets, the factor yielding a higher
<italic>S</italic>
<italic>e</italic>
/(1−
<italic>S</italic>
<italic>p</italic>
) ratio and, thus, the discriminating power of the method.
<table-wrap id="Tab4">
<label>Table 4</label>
<caption>
<p>
<bold>Sensitivity (Se) and specificity (Sp) values obtained for test targets HIV-RT and ABL1</bold>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">
<bold>Se</bold>
</th>
<th align="left">
<bold>Sp</bold>
</th>
<th align="left">
<bold>Se/(1-Sp)</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">HIV-RT</td>
<td align="left">0.49</td>
<td align="left">0.85</td>
<td align="left">3.67</td>
</tr>
<tr>
<td align="left">ABL1</td>
<td align="left">0.46</td>
<td align="left">0.86</td>
<td align="left">3.28</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
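<p>The quantities discussed above can be sketched as follows. Equations (6) and (7) are not reproduced in this section, so the sketch assumes the standard confusion-matrix definitions of sensitivity and specificity, and a Z-score computed against the mean and standard deviation of randomized trials. All counts and values below are hypothetical and chosen only for illustration.</p>

```python
import statistics

def sensitivity(tp, fn):
    """Standard definition: true positives / (true positives + false negatives)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Standard definition: true negatives / (true negatives + false positives)."""
    return tn / (tn + fp)

def z_score(observed, randomized):
    """Distance of the observed value from the randomized mean, in standard deviations."""
    return (observed - statistics.mean(randomized)) / statistics.stdev(randomized)

# Hypothetical counts, not taken from the study.
se = sensitivity(tp=50, fn=50)
sp = specificity(tn=90, fp=10)
ratio = se / (1 - sp)  # a value above 1 indicates discriminating power

# Hypothetical sensitivities obtained after shuffling the feature assignment.
z = z_score(se, [0.1, 0.2, 0.3])
print(se, sp, ratio, z)
```

<p>A ratio Se/(1−Sp) close to 1 means that a feature is as likely to be flagged when it is inactive as when it is active, which is the behavior expected from randomized data.</p>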
</sec>
</sec>
<sec id="Sec19" sec-type="discussion">
<title>Discussion</title>
<p>In this work, we presented and evaluated a method for the identification of binding sites (BSs) based on docking and Self-Organizing Maps (SOMs). Binding site identification is essential in the process of structure-based drug discovery, but remains a highly complex task and an active area of research.</p>
<p>Our method bears similarities to probe-mapping approaches, but we took advantage of existing docking algorithms [
<xref ref-type="bibr" rid="CR44">44</xref>
] to directly screen small molecule or fragment libraries. This allowed us to use entire molecules for the analysis. In contrast, classical probe mapping approaches use atoms or small chemical groups [
<xref ref-type="bibr" rid="CR17">17</xref>
,
<xref ref-type="bibr" rid="CR18">18</xref>
,
<xref ref-type="bibr" rid="CR20">20</xref>
,
<xref ref-type="bibr" rid="CR68">68</xref>
,
<xref ref-type="bibr" rid="CR69">69</xref>
] to map a protein surface. The diversity of the probe library made it possible to take into account simultaneously shape, volume and the chemical composition of the protein surface in a more detailed way. Importantly, our method does not require any prior knowledge on active ligands, but actually identifies promising moieties.</p>
<p>To calibrate the method, we used 12 different “combinations” of protein targets (HIV-RT and ABL1), docking programs (ADvina and Dock) and ligand libraries (DUD-E active molecules, DUD-E decoys and EGF collection). The method readily identified the experimentally known ASs regardless of the target, the docking algorithm and the chemical library. For both targets, we could also identify a relevant second BS, a known allosteric site of HIV-RT [
<xref ref-type="bibr" rid="CR66">66</xref>
], and an activation site controlling ABL1 catalytic activity [
<xref ref-type="bibr" rid="CR67">67</xref>
,
<xref ref-type="bibr" rid="CR70">70</xref>
]. The identified BSs appeared as dense and homogeneous regions of the SOMs.</p>
<p>The consistency of the results obtained under various conditions suggests that the method is robust and applicable to different types of binding surfaces, ligands, and docking programs. Nevertheless, some combinations (docking algorithm, chemical library) appeared to perform better than others. The EGF fragments turned out to be the best probe library for our method, surpassing the dedicated DUD-E active molecules. This indicates that prior knowledge on the ligand is less important than the relevance of the probe library, possibly its chemical diversity and the moderate size of its components. ADvina gave a finer density and homogeneity compared to Dock. The energy grid calculation is a key step before the actual docking. ADvina uses AutoGrid [
<xref ref-type="bibr" rid="CR71">71</xref>
] while Dock uses
<italic>grid</italic>
[
<xref ref-type="bibr" rid="CR72">72</xref>
]. As far as we know, there is no study in binding site identification based on
<italic>grid</italic>
[
<xref ref-type="bibr" rid="CR72">72</xref>
]. By contrast, AutoGrid is used by two successful probe-mapping/energy-based algorithms: AutoLigand [
<xref ref-type="bibr" rid="CR18">18</xref>
] and SiteHound [
<xref ref-type="bibr" rid="CR27">27</xref>
]. ADvina docking scores were the most favorable at the AS, and allowed better discrimination of the AS from the BS2 for both targets. Conversely, Dock scores could not differentiate the AS, the BS2 and regions on the protein surface for HIV-RT. Finally, the U-values and the neuron density proved more reliable in identifying binding sites in general.</p>
<p>To assess the accuracy of our BS identification method, we automated it using EGF fragments as probe library and ADvina as docking algorithm. We called this automated approach SOM-BSfinder. Evaluating the density and homogeneity of neurons on the 3D SOM allows the direct identification of consensual clusters (CCs), which are ranked according to their densities. No limit to the number of BSs is required. The user may consider all identified CCs with respect to prior knowledge of the target, if available. Nevertheless, SOM-BSfinder was able to detect the AS exclusively among the Top3 CCs in 96% of the cases, and in the first CC in 87% of the cases (Figure
<xref rid="Fig8" ref-type="fig">8</xref>
). The average precision of the BS identification is 44% and 40% for the Top3 CCs and the first CC, respectively.</p>
<p>With a precision threshold of 25% to define success in identifying the AS, the success rate (SR) of SOM-BSfinder was equal to 90% and 82% for the Top3 CCs and first CC, respectively. It compared favorably with other probe-mapping/energy-based methods. We compared it with SiteHound [
<xref ref-type="bibr" rid="CR27">27</xref>
], which also used AutoGrid to compute a grid for a carbon probe. This grid is used to identify three favorable BSs at the protein surface. These sites are then targeted for the docking of one ligand molecule with AutoDock 4 [
<xref ref-type="bibr" rid="CR28">28</xref>
]. SiteHound achieved a success rate between 80 and 84% for the first three BSs. Using the same success criteria as SiteHound (radius = 2.0 Å and precision = 15%), SOM-BSfinder achieved a success rate of 98%.</p>
<p>We also compared SOM-BSfinder to Q-SiteFinder [
<xref ref-type="bibr" rid="CR19">19</xref>
] and FTSite [
<xref ref-type="bibr" rid="CR21">21</xref>
,
<xref ref-type="bibr" rid="CR22">22</xref>
]. Q-SiteFinder is very similar to SiteHound, but used the GRID [
<xref ref-type="bibr" rid="CR16">16</xref>
] algorithm for grid calculation. On a set of 35 targets, with a radius fixed to 1.6 Å and a precision threshold of 25%, it achieved success rates of 90% and 71% for the Top3 BSs and first BS, respectively. FTSite was tested on the same target set using the same parameter values [
<xref ref-type="bibr" rid="CR22">22</xref>
]. It achieved success rates of 97% and 80%, respectively. We tested SOM-BSfinder with these values and obtained success rates of 97% and 88% for the Top3 CCs and Top1 CC, respectively. Thus, SOM-BSfinder is as good as or better than the three methods used for comparison. One should note that these results were obtained on different datasets, except for FTSite and Q-SiteFinder.</p>
<p>A major difference between SOM-BSfinder and the other methods is the probe library: SiteHound used a carbon atom probe, Q-SiteFinder used a methyl probe and FTSite used 16 organic probes with an average size of 4.3 heavy atoms. FTSite achieved the best SR among these three methods. In contrast, SOM-BSfinder used 1500 fragment molecules as probes, and achieved an SR equal or superior to that of FTSite. This may be a direct result of the diversity of the probe library used. Moreover, SOM-BSfinder takes into account the size and shape of the fragments during docking, which is less meaningful when the probe comprises fewer than 8 heavy atoms (FTSite).</p>
<p>The ABL1 target was among the 9 cases out of 102 where the AS was identified at the second CC (labeled (1’) in Figure
<xref rid="Fig4" ref-type="fig">4</xref>
) by SOM-BSfinder. Notably, the first CC corresponded to the BS2 previously defined (labeled (6’) in Figure
<xref rid="Fig4" ref-type="fig">4</xref>
). Interestingly, when the SOMs are visualized with the docking score projections (Figure
<xref rid="Fig4" ref-type="fig">4</xref>
), it becomes more intuitive to select the AS, thus inverting the ranking of the CCs. This shows that the ranking criterion by decreasing densities, a common way of identifying the AS [
<xref ref-type="bibr" rid="CR13">13</xref>
,
<xref ref-type="bibr" rid="CR73">73</xref>
] that performs remarkably well, can still be further refined. For example, some energy-based methods [
<xref ref-type="bibr" rid="CR19">19</xref>
,
<xref ref-type="bibr" rid="CR27">27</xref>
] rank the BSs by the cumulative energy of their probes, and incorporate the quality of the docking and the size of the cluster.</p>
<p>Using chemical feature positions as input for the SOMs improved the characterization as well as the discrimination of the AS and the BS2. For both test targets, HIV-RT and ABL1, the predicted AS fits the experimentally known one and is depicted by low U-values that reflect the homogeneity of the docking poses. The BS2 is characterized by a less distinct area in the SOM than the AS, with less favorable docking scores. Moreover, a higher enrichment in “active features” is found in the ASs, and the specificity is over 84%. This characteristic of the method could prove useful in predicting relevant substructures, and favoring hit discovery and optimization. It also readily provides a set of potentially active fragments for testing in a drug design project.</p>
</sec>
<sec id="Sec20" sec-type="conclusion">
<title>Conclusions</title>
<p>This work presents a new method for binding site identification called SOM-BSfinder. It is a probe-mapping method that uses docking of a compound library to map the protein target surface. Atomic coordinates of the docked molecules are clustered using a Self-Organizing Map algorithm to generate a 3D map that reflects preferential binding positions on the protein surface. These positions constitute consensual clusters that define the favored binding sites of the probes. The method was calibrated on two test targets to identify the conditions for optimal performance. In a second phase, a benchmark was performed on 102 proteins using AutoDock Vina to dock the Enamine Golden Fragments collection. SOM-BSfinder achieved a 90% success rate when the first three consensual clusters were retained, and 82% when only the first cluster was considered. Compared to existing methods, our method achieved either equal or superior success rates. The last part of this work replaced the atomic decomposition of the probe molecules with a chemical decomposition based on circular Morgan fingerprints. This led to a better fit and discrimination of the active sites. Moreover, these results could also be used to predict chemical moieties relevant to bioactivity.</p>
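<p>To make the clustering step more concrete, the following Python sketch trains a deliberately simplified self-organizing map on synthetic 3D coordinates. It is an assumption-laden toy, not the SOM-BSfinder implementation: it uses a one-dimensional chain of four neurons rather than a 3D map, the two point clouds stand in for atoms docked at two distinct sites, and all names and parameters (<italic>train_som</italic>, learning rate, neighborhood radius) are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(neurons, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(neurons)), key=lambda j: sq_dist(neurons[j], x))

def train_som(points, n_neurons=4, n_iter=2000, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1D chain of neurons on 3D points with online SOM updates."""
    rng = random.Random(seed)
    # Initialise each neuron on a randomly chosen input point.
    neurons = [list(rng.choice(points)) for _ in range(n_neurons)]
    for t in range(n_iter):
        decay = 1.0 - t / n_iter            # linear decay of both schedules
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.1)  # keep the neighborhood width positive
        x = rng.choice(points)
        bmu = best_matching_unit(neurons, x)
        for j, w in enumerate(neurons):
            # Gaussian neighborhood measured along the neuron chain.
            h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
            for k in range(3):
                w[k] += lr * h * (x[k] - w[k])
    return neurons

def make_blob(center, n, rng, sigma=0.5):
    """Synthetic stand-in for atom coordinates docked around one site."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in center) for _ in range(n)]

data_rng = random.Random(1)
site_a = make_blob((0.0, 0.0, 0.0), 50, data_rng)
site_b = make_blob((10.0, 10.0, 10.0), 50, data_rng)
neurons = train_som(site_a + site_b)

# Neurons that win points from each synthetic site.
units_a = {best_matching_unit(neurons, p) for p in site_a}
units_b = {best_matching_unit(neurons, p) for p in site_b}
print("neurons mapping site A:", sorted(units_a))
print("neurons mapping site B:", sorted(units_b))
```
]]></preformat>
<p>With two well-separated point clouds, the neurons that win the points of one cloud are expected to differ from those that win the points of the other, which is the property that allows groups of neurons to be read as distinct candidate sites.</p>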
<p>A further advantage of our method is its high flexibility. In our hands, the combination of AutoDock Vina and the Enamine Golden Fragments collection gave the best predictions. Nonetheless, similar pipelines could be implemented with other docking programs, fragment libraries and/or clustering algorithms to better exploit the user’s knowledge and expertise on the targeted protein. Similarly, features other than the Morgan fingerprints can be employed to describe the ligand chemistry.</p>
</sec>
</body>
<back>
<app-group>
<app id="App1">
<sec id="Sec21">
<title>Additional file</title>
<p>
<media position="anchor" xlink:href="12859_2015_518_MOESM1_ESM.pdf" id="MOESM1">
<label>Additional file 1</label>
<caption>
<p>
<bold>Additional file (SupplementaryInformations.pdf) contains figures and tables with their respective captions and descriptions.</bold>
</p>
</caption>
</media>
</p>
</sec>
</app>
</app-group>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>3D</term>
<def>
<p>Three dimensional</p>
</def>
</def-item>
<def-item>
<term>AA</term>
<def>
<p>Amino acid</p>
</def>
</def-item>
<def-item>
<term>ADvina</term>
<def>
<p>AutoDock Vina</p>
</def>
</def-item>
<def-item>
<term>AS</term>
<def>
<p>Active site</p>
</def>
</def-item>
<def-item>
<term>BS</term>
<def>
<p>Binding site</p>
</def>
</def-item>
<def-item>
<term>BS2</term>
<def>
<p>Secondary binding site</p>
</def>
</def-item>
<def-item>
<term>CC</term>
<def>
<p>Consensual cluster</p>
</def>
</def-item>
<def-item>
<term>DUD-E</term>
<def>
<p>Database of useful decoys enhanced</p>
</def>
</def-item>
<def-item>
<term>EGF</term>
<def>
<p>Enamine golden fragments</p>
</def>
</def-item>
<def-item>
<term>Hbond</term>
<def>
<p>Hydrogen bonds</p>
</def>
</def-item>
<def-item>
<term>PCA</term>
<def>
<p>Principal component analysis</p>
</def>
</def-item>
<def-item>
<term>PDB</term>
<def>
<p>Protein data bank</p>
</def>
</def-item>
<def-item>
<term>Se</term>
<def>
<p>Sensitivity</p>
</def>
</def-item>
<def-item>
<term>SOM</term>
<def>
<p>Self-organizing map</p>
</def>
</def-item>
<def-item>
<term>Sp</term>
<def>
<p>Specificity</p>
</def>
</def-item>
<def-item>
<term>SR</term>
<def>
<p>Success rate</p>
</def>
</def-item>
</def-list>
</glossary>
<fn-group>
<fn>
<p>
<bold>Competing interests</bold>
</p>
<p>The authors declare that they have no competing interests.</p>
</fn>
<fn>
<p>
<bold>Authors’ contributions</bold>
</p>
<p>MN, EHS, AB and GB conceived the research. ND, EHS and GB conceived and designed the 3D-SOM method. EHS prepared and docked the molecule libraries. ICC, EHS, AB and GB conceived the fingerprint-based chemical descriptors. AB, EHS and ND performed the cavity identification and analysis. All authors analyzed and interpreted the data. EHS, TEM, IG, MN, AB and GB drafted the manuscript. All authors read and approved the final manuscript.</p>
</fn>
</fn-group>
<ack>
<title>Acknowledgements</title>
<p>MN acknowledges funding from the Investissement d’avenir bioinformatics programme (Grant Bip:Bip) and the European Research Council (Advanced Grant ERC-2011-StG 294809 BayCellS). IG and MN acknowledge funding from the Institut Pasteur PTR programme (grant PTR426) and IG from the Ministry of Higher Education and Research in Tunisia (LR00SP04 &amp; LR11IPT04). EHS is a recipient of a UNESCO-L’Oréal international fellowship, and received support from the Institut Pasteur International Network. ICC is a fellow of the Paris-Pasteur International PhD Programme. ND is a recipient of an AXA Research Fund PhD fellowship.</p>
<p>The SOM software is available on GitHub (
<ext-link ext-link-type="uri" xlink:href="https://github.com/bougui505/SOM/tree/SOM3D_dev">https://github.com/bougui505/SOM/tree/SOM3D_dev</ext-link>
).</p>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Woodward</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Edelsbrunner</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design</article-title>
<source>Protein Sci.</source>
<year>1998</year>
<volume>7</volume>
<issue>9</issue>
<fpage>1884</fpage>
<lpage>97</lpage>
<pub-id pub-id-type="doi">10.1002/pro.5560070905</pub-id>
<pub-id pub-id-type="pmid">9761470</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>An</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Totrov</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Abagyan</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Pocketome via comprehensive identification and classification of ligand binding envelopes</article-title>
<source>Mol Cell Proteomics</source>
<year>2005</year>
<volume>4</volume>
<issue>6</issue>
<fpage>752</fpage>
<lpage>61</lpage>
<pub-id pub-id-type="doi">10.1074/mcp.M400159-MCP200</pub-id>
<pub-id pub-id-type="pmid">15757999</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soga</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shirai</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kobori</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hirayama</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Use of amino acid composition to predict ligand-binding sites</article-title>
<source>J Chem Inf Model.</source>
<year>2007</year>
<volume>47</volume>
<issue>2</issue>
<fpage>400</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1021/ci6002202</pub-id>
<pub-id pub-id-type="pmid">17243757</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cheng</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Coleman</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Smyth</surname>
<given-names>KT</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Soulard</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Caffrey</surname>
<given-names>DR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Structure-based maximal affinity model predicts small-molecule druggability</article-title>
<source>Nat Biotechnol.</source>
<year>2007</year>
<volume>25</volume>
<issue>1</issue>
<fpage>71</fpage>
<lpage>5</lpage>
</element-citation>
</ref>
<ref id="CR5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halgren</surname>
<given-names>TA</given-names>
</name>
</person-group>
<article-title>Identifying and characterizing binding sites and assessing druggability</article-title>
<source>J Chem Inf Model.</source>
<year>2009</year>
<volume>49</volume>
<issue>2</issue>
<fpage>377</fpage>
<lpage>89</lpage>
<pub-id pub-id-type="doi">10.1021/ci800324m</pub-id>
<pub-id pub-id-type="pmid">19434839</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>López</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tress</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>firestar-prediction of functionally important residues using structural templates and alignment reliability</article-title>
<source>Nucleic Acids Res.</source>
<year>2007</year>
<volume>35</volume>
<issue>suppl 2</issue>
<fpage>573</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkm297</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Capra</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Predicting functionally important residues from sequence conservation</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<issue>15</issue>
<fpage>1875</fpage>
<lpage>82</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btm270</pub-id>
<pub-id pub-id-type="pmid">17519246</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Capra</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Laskowski</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Funkhouser</surname>
<given-names>TA</given-names>
</name>
</person-group>
<article-title>Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure</article-title>
<source>PLoS Comput Biol.</source>
<year>2009</year>
<volume>5</volume>
<issue>12</issue>
<fpage>1000585</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000585</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mayrose</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Graur</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Ben-Tal</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pupko</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior</article-title>
<source>Mol Biol Evol.</source>
<year>2004</year>
<volume>21</volume>
<issue>9</issue>
<fpage>1781</fpage>
<lpage>91</lpage>
<pub-id pub-id-type="doi">10.1093/molbev/msh194</pub-id>
<pub-id pub-id-type="pmid">15201400</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghersi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sanchez</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures</article-title>
<source>J Struct Funct Genomics</source>
<year>2011</year>
<volume>12</volume>
<issue>2</issue>
<fpage>109</fpage>
<lpage>17</lpage>
<pub-id pub-id-type="doi">10.1007/s10969-011-9110-6</pub-id>
<pub-id pub-id-type="pmid">21537951</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Levitt</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Banaszak</surname>
<given-names>LJ</given-names>
</name>
</person-group>
<article-title>Pocket: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids</article-title>
<source>J Mol Graphics</source>
<year>1992</year>
<volume>10</volume>
<issue>4</issue>
<fpage>229</fpage>
<lpage>34</lpage>
<pub-id pub-id-type="doi">10.1016/0263-7855(92)80074-N</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laskowski</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions</article-title>
<source>J Mol Graphics</source>
<year>1995</year>
<volume>13</volume>
<issue>5</issue>
<fpage>323</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1016/0263-7855(95)00073-9</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hendlich</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rippmann</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Barnickel</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins</article-title>
<source>J Mol Graphics Modell.</source>
<year>1997</year>
<volume>15</volume>
<issue>6</issue>
<fpage>359</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="doi">10.1016/S1093-3263(98)00002-3</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dundas</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ouyang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Tseng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Binkowski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Turpaz</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<issue>suppl 2</issue>
<fpage>116</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkl282</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kawabata</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Detection of multiscale pockets on protein surfaces using mathematical morphology</article-title>
<source>Proteins</source>
<year>2010</year>
<volume>78</volume>
<issue>5</issue>
<fpage>1195</fpage>
<lpage>211</lpage>
<pub-id pub-id-type="doi">10.1002/prot.22639</pub-id>
<pub-id pub-id-type="pmid">19938154</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goodford</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>A computational procedure for determining energetically favorable binding sites on biologically important macromolecules</article-title>
<source>J Med Chem.</source>
<year>1985</year>
<volume>28</volume>
<issue>7</issue>
<fpage>849</fpage>
<lpage>57</lpage>
<pub-id pub-id-type="doi">10.1021/jm00145a002</pub-id>
<pub-id pub-id-type="pmid">3892003</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruppert</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Welch</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Jain</surname>
<given-names>AN</given-names>
</name>
</person-group>
<article-title>Automatic identification and representation of protein binding sites for molecular docking</article-title>
<source>Protein Sci.</source>
<year>1997</year>
<volume>6</volume>
<issue>3</issue>
<fpage>524</fpage>
<lpage>33</lpage>
<pub-id pub-id-type="doi">10.1002/pro.5560060302</pub-id>
<pub-id pub-id-type="pmid">9070435</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harris</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Goodsell</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>Automated prediction of ligand-binding sites in proteins</article-title>
<source>Proteins: Struct Funct Bioinf.</source>
<year>2008</year>
<volume>70</volume>
<issue>4</issue>
<fpage>1506</fpage>
<lpage>17</lpage>
<pub-id pub-id-type="doi">10.1002/prot.21645</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laurie</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>RM</given-names>
</name>
</person-group>
<article-title>Q-SiteFinder: an energy-based method for the prediction of protein–ligand binding sites</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<issue>9</issue>
<fpage>1908</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti315</pub-id>
<pub-id pub-id-type="pmid">15701681</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lakkaraju</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Raman</surname>
<given-names>EP</given-names>
</name>
<name>
<surname>MacKerell</surname>
<given-names>AD</given-names>
<suffix>Jr</suffix>
</name>
</person-group>
<article-title>Site-identification by ligand competitive saturation (SILCS) assisted pharmacophore modeling</article-title>
<source>J Comput-Aided Mol Des.</source>
<year>2014</year>
<volume>28</volume>
<issue>5</issue>
<fpage>491</fpage>
<lpage>507</lpage>
<pub-id pub-id-type="doi">10.1007/s10822-014-9728-0</pub-id>
<pub-id pub-id-type="pmid">24610239</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brenke</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kozakov</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Chuang</surname>
<given-names>G-Y</given-names>
</name>
<name>
<surname>Beglov</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Landon</surname>
<given-names>MR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Fragment-based identification of druggable ‘hot spots’ of proteins using Fourier domain correlation techniques</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<issue>5</issue>
<fpage>621</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp036</pub-id>
<pub-id pub-id-type="pmid">19176554</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ngan</surname>
<given-names>C-H</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Zerbe</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Grove</surname>
<given-names>LE</given-names>
</name>
<name>
<surname>Kozakov</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Vajda</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>FTSite: high accuracy detection of ligand binding sites on unbound protein structures</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<issue>2</issue>
<fpage>286</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr651</pub-id>
<pub-id pub-id-type="pmid">22113084</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>MetaPocket: a meta approach to improve protein ligand binding site prediction</article-title>
<source>OMICS J Integr Biol.</source>
<year>2009</year>
<volume>13</volume>
<issue>4</issue>
<fpage>325</fpage>
<lpage>30</lpage>
<pub-id pub-id-type="doi">10.1089/omi.2009.0045</pub-id>
</element-citation>
</ref>
<ref id="CR24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowman</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Lerner</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Carlson</surname>
<given-names>HA</given-names>
</name>
</person-group>
<article-title>Protein flexibility and species specificity in structure-based drug discovery: dihydrofolate reductase as a test system</article-title>
<source>J Am Chem Soc.</source>
<year>2007</year>
<volume>129</volume>
<issue>12</issue>
<fpage>3634</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="doi">10.1021/ja068256d</pub-id>
<pub-id pub-id-type="pmid">17335207</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meagher</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Lerner</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Carlson</surname>
<given-names>HA</given-names>
</name>
</person-group>
<article-title>Refining the multiple protein structure pharmacophore method: consistency across three independent HIV-1 protease models</article-title>
<source>J Med Chem.</source>
<year>2006</year>
<volume>49</volume>
<issue>12</issue>
<fpage>3478</fpage>
<lpage>84</lpage>
<pub-id pub-id-type="doi">10.1021/jm050755m</pub-id>
<pub-id pub-id-type="pmid">16759090</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glinca</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Klebe</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Cavities tell more than sequences: Exploring functional relationships of proteases via binding pockets</article-title>
<source>J Chem Inf Model.</source>
<year>2013</year>
<volume>53</volume>
<issue>8</issue>
<fpage>2082</fpage>
<lpage>92</lpage>
<pub-id pub-id-type="doi">10.1021/ci300550a</pub-id>
<pub-id pub-id-type="pmid">23834203</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghersi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sanchez</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Improving accuracy and efficiency of blind protein-ligand docking by focusing on predicted binding sites</article-title>
<source>Proteins: Struct Funct Bioinf.</source>
<year>2009</year>
<volume>74</volume>
<issue>2</issue>
<fpage>417</fpage>
<lpage>24</lpage>
<pub-id pub-id-type="doi">10.1002/prot.22154</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Morris</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Huey</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lindstrom</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sanner</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Belew</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Goodsell</surname>
<given-names>DS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility</article-title>
<source>J Comput Chem.</source>
<year>2009</year>
<volume>30</volume>
<issue>16</issue>
<fpage>2785</fpage>
<lpage>91</lpage>
<pub-id pub-id-type="doi">10.1002/jcc.21256</pub-id>
<pub-id pub-id-type="pmid">19399780</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kohonen</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Self-organized formation of topologically correct feature maps</article-title>
<source>Biol Cybernet.</source>
<year>1982</year>
<volume>43</volume>
<issue>1</issue>
<fpage>59</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="doi">10.1007/BF00337288</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahony</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hendrix</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Golden</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Rokhsar</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>Transcription factor binding site identification using the self-organizing map</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<issue>9</issue>
<fpage>1807</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti256</pub-id>
<pub-id pub-id-type="pmid">15647296</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahony</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Benos</surname>
<given-names>PV</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Golden</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Self-organizing neural networks to support the discovery of DNA-binding motifs</article-title>
<source>Neural Networks</source>
<year>2006</year>
<volume>19</volume>
<issue>6</issue>
<fpage>950</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="doi">10.1016/j.neunet.2006.05.023</pub-id>
<pub-id pub-id-type="pmid">16839740</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hasegawa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Funatsu</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>New description of protein-ligand interactions using a spherical self-organizing map</article-title>
<source>Bioorg Med Chem.</source>
<year>2012</year>
<volume>20</volume>
<issue>18</issue>
<fpage>5410</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1016/j.bmc.2012.03.041</pub-id>
<pub-id pub-id-type="pmid">22503362</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33</label>
<mixed-citation publication-type="other">Zupan J, Gasteiger J. Neural networks in chemistry and drug design: John Wiley &amp; Sons, Inc.; 1999.</mixed-citation>
</ref>
<ref id="CR34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roche</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Trube</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zuegge</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pflimlin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Alanine</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>A virtual screening method for prediction of the hERG potassium channel liability of compound libraries</article-title>
<source>ChemBioChem</source>
<year>2002</year>
<volume>3</volume>
<issue>5</issue>
<fpage>455</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1002/1439-7633(20020503)3:5&lt;455::AID-CBIC455&gt;3.0.CO;2-L</pub-id>
<pub-id pub-id-type="pmid">12007180</pub-id>
</element-citation>
</ref>
<ref id="CR35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bouvier</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Evrard-Todeschi</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Girault</surname>
<given-names>J-P</given-names>
</name>
<name>
<surname>Bertho</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Automatic clustering of docking poses in virtual screening process using self-organizing map</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>1</issue>
<fpage>53</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp623</pub-id>
<pub-id pub-id-type="pmid">19910307</pub-id>
</element-citation>
</ref>
<ref id="CR36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reker</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rodrigues</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus</article-title>
<source>Proc Nat Acad Sci.</source>
<year>2014</year>
<volume>111</volume>
<issue>11</issue>
<fpage>4067</fpage>
<lpage>72</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1320001111</pub-id>
<pub-id pub-id-type="pmid">24591595</pub-id>
</element-citation>
</ref>
<ref id="CR37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Digles</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Ecker</surname>
<given-names>GF</given-names>
</name>
</person-group>
<article-title>Self-organizing maps for in silico screening and data visualization</article-title>
<source>Mol Inf.</source>
<year>2011</year>
<volume>30</volume>
<issue>10</issue>
<fpage>838</fpage>
<lpage>46</lpage>
<pub-id pub-id-type="doi">10.1002/minf.201100082</pub-id>
</element-citation>
</ref>
<ref id="CR38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bouvier</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Duclert-Savatier</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Desdouits</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Meziane-Cherif</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Blondel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Courvalin</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Functional motions modulating VanA ligand binding unraveled by self-organizing maps</article-title>
<source>J Chem Inf Model.</source>
<year>2014</year>
<volume>54</volume>
<issue>1</issue>
<fpage>289</fpage>
<lpage>301</lpage>
<pub-id pub-id-type="doi">10.1021/ci400354b</pub-id>
<pub-id pub-id-type="pmid">24397493</pub-id>
</element-citation>
</ref>
<ref id="CR39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miri</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Bouvier</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kettani</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mikou</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wakrim</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Nilges</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Stabilization of the integrase-DNA complex by Mg2+ ions and prediction of key residues for binding HIV-1 integrase inhibitors</article-title>
<source>Proteins: Struct Funct Bioinf.</source>
<year>2014</year>
<volume>82</volume>
<issue>3</issue>
<fpage>466</fpage>
<lpage>78</lpage>
<pub-id pub-id-type="doi">10.1002/prot.24412</pub-id>
</element-citation>
</ref>
<ref id="CR40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nivaskumar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bouvier</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Campos</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nadeau</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Egelman</surname>
<given-names>EH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Distinct docking and stabilization steps of the pseudopilus conformational transition path suggest rotational assembly of type IV pilus-like fibers</article-title>
<source>Structure</source>
<year>2014</year>
<volume>22</volume>
<issue>5</issue>
<fpage>685</fpage>
<lpage>96</lpage>
<pub-id pub-id-type="doi">10.1016/j.str.2014.03.001</pub-id>
<pub-id pub-id-type="pmid">24685147</pub-id>
</element-citation>
</ref>
<ref id="CR41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spill</surname>
<given-names>YG</given-names>
</name>
<name>
<surname>Bouvier</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Nilges</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>A convective replica-exchange method for sampling new energy basins</article-title>
<source>J Comput Chem.</source>
<year>2013</year>
<volume>34</volume>
<issue>2</issue>
<fpage>132</fpage>
<lpage>40</lpage>
<pub-id pub-id-type="doi">10.1002/jcc.23113</pub-id>
<pub-id pub-id-type="pmid">22961200</pub-id>
</element-citation>
</ref>
<ref id="CR42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mysinger</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Carchia</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Irwin</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Shoichet</surname>
<given-names>BK</given-names>
</name>
</person-group>
<article-title>Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking</article-title>
<source>J Med Chem.</source>
<year>2012</year>
<volume>55</volume>
<issue>14</issue>
<fpage>6582</fpage>
<lpage>94</lpage>
<pub-id pub-id-type="doi">10.1021/jm300687e</pub-id>
<pub-id pub-id-type="pmid">22716043</pub-id>
</element-citation>
</ref>
<ref id="CR43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bursulaya</surname>
<given-names>BD</given-names>
</name>
<name>
<surname>Totrov</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Abagyan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Brooks III</surname>
<given-names>CL</given-names>
</name>
</person-group>
<article-title>Comparative study of several algorithms for flexible ligand docking</article-title>
<source>J Comput-Aided Mol Des.</source>
<year>2003</year>
<volume>17</volume>
<issue>11</issue>
<fpage>755</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="doi">10.1023/B:JCAM.0000017496.76572.6f</pub-id>
<pub-id pub-id-type="pmid">15072435</pub-id>
</element-citation>
</ref>
<ref id="CR44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sousa</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Fernandes</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Ramos</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<article-title>Protein–ligand docking: current status and future challenges</article-title>
<source>Proteins: Struct Funct Bioinf.</source>
<year>2006</year>
<volume>65</volume>
<issue>1</issue>
<fpage>15</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="doi">10.1002/prot.21082</pub-id>
</element-citation>
</ref>
<ref id="CR45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Warren</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Andrews</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Capelli</surname>
<given-names>A-M</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>B</given-names>
</name>
<name>
<surname>LaLonde</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lambert</surname>
<given-names>MH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A critical assessment of docking programs and scoring functions</article-title>
<source>J Med Chem.</source>
<year>2006</year>
<volume>49</volume>
<issue>20</issue>
<fpage>5912</fpage>
<lpage>31</lpage>
<pub-id pub-id-type="doi">10.1021/jm050362n</pub-id>
<pub-id pub-id-type="pmid">17004707</pub-id>
</element-citation>
</ref>
<ref id="CR46">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moitessier</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Englebienne</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lawandi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Corbeil</surname>
<given-names>CR</given-names>
</name>
</person-group>
<article-title>Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go</article-title>
<source>Br J Pharmacol.</source>
<year>2008</year>
<volume>153</volume>
<issue>S1</issue>
<fpage>7</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="doi">10.1038/sj.bjp.0707515</pub-id>
</element-citation>
</ref>
<ref id="CR47">
<label>47</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Plewczynski</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Łaźniewski</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Augustyniak</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ginalski</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Can we trust docking results? Evaluation of seven commonly used programs on PDBbind database</article-title>
<source>J Comput Chem.</source>
<year>2011</year>
<volume>32</volume>
<issue>4</issue>
<fpage>742</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="doi">10.1002/jcc.21643</pub-id>
<pub-id pub-id-type="pmid">20812323</pub-id>
</element-citation>
</ref>
<ref id="CR48">
<label>48</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ewing</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Makino</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Skillman</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Kuntz</surname>
<given-names>ID</given-names>
</name>
</person-group>
<article-title>DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases</article-title>
<source>J Comput-Aided Mol Des.</source>
<year>2001</year>
<volume>15</volume>
<issue>5</issue>
<fpage>411</fpage>
<lpage>28</lpage>
<pub-id pub-id-type="doi">10.1023/A:1011115820450</pub-id>
<pub-id pub-id-type="pmid">11394736</pub-id>
</element-citation>
</ref>
<ref id="CR49">
<label>49</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Trott</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading</article-title>
<source>J Comput Chem.</source>
<year>2010</year>
<volume>31</volume>
<issue>2</issue>
<fpage>455</fpage>
<lpage>61</lpage>
<pub-id pub-id-type="pmid">19499576</pub-id>
</element-citation>
</ref>
<ref id="CR50">
<label>50</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Glem</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Bender</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Arnby</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Carlsson</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Boyer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME</article-title>
<source>IDrugs: Investigational Drugs J.</source>
<year>2006</year>
<volume>9</volume>
<issue>3</issue>
<fpage>199</fpage>
<lpage>204</lpage>
</element-citation>
</ref>
<ref id="CR51">
<label>51</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rogers</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Extended-connectivity fingerprints</article-title>
<source>J Chem Inf Model.</source>
<year>2010</year>
<volume>50</volume>
<issue>5</issue>
<fpage>742</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="doi">10.1021/ci100050t</pub-id>
<pub-id pub-id-type="pmid">20426451</pub-id>
</element-citation>
</ref>
<ref id="CR52">
<label>52</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bender</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Jenkins</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Scheiber</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sukuru</surname>
<given-names>SCK</given-names>
</name>
<name>
<surname>Glick</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Davies</surname>
<given-names>JW</given-names>
</name>
</person-group>
<article-title>How similar are similarity searching methods? A principal component analysis of molecular descriptor space</article-title>
<source>J Chem Inf Model.</source>
<year>2009</year>
<volume>49</volume>
<issue>1</issue>
<fpage>108</fpage>
<lpage>19</lpage>
<pub-id pub-id-type="doi">10.1021/ci800249s</pub-id>
<pub-id pub-id-type="pmid">19123924</pub-id>
</element-citation>
</ref>
<ref id="CR53">
<label>53</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Westen</surname>
<given-names>GJP</given-names>
</name>
<name>
<surname>van den Hoven</surname>
<given-names>OO</given-names>
</name>
<name>
<surname>van der Pijl</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mulder-Krieger</surname>
<given-names>T</given-names>
</name>
<name>
<surname>de Vries</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wegner</surname>
<given-names>JK</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Identifying novel adenosine receptor ligands by simultaneous proteochemometric modeling of rat and human bioactivity data</article-title>
<source>J Med Chem.</source>
<year>2012</year>
<volume>55</volume>
<issue>16</issue>
<fpage>7010</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.1021/jm3003069</pub-id>
<pub-id pub-id-type="pmid">22827545</pub-id>
</element-citation>
</ref>
<ref id="CR54">
<label>54</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cortes-Ciriano</surname>
<given-names>I</given-names>
</name>
<name>
<surname>van Westen</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Lenselink</surname>
<given-names>EB</given-names>
</name>
<name>
<surname>Murrell</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Bender</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Malliavin</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Proteochemometric modeling in a Bayesian framework</article-title>
<source>J Cheminformatics</source>
<year>2014</year>
<volume>6</volume>
<issue>1</issue>
<fpage>35</fpage>
<pub-id pub-id-type="doi">10.1186/1758-2946-6-35</pub-id>
</element-citation>
</ref>
<ref id="CR55">
<label>55</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Shoichet</surname>
<given-names>BK</given-names>
</name>
<name>
<surname>Irwin</surname>
<given-names>JJ</given-names>
</name>
</person-group>
<article-title>Benchmarking sets for molecular docking</article-title>
<source>J Med Chem.</source>
<year>2006</year>
<volume>49</volume>
<issue>23</issue>
<fpage>6789</fpage>
<lpage>801</lpage>
<pub-id pub-id-type="doi">10.1021/jm0608356</pub-id>
<pub-id pub-id-type="pmid">17154509</pub-id>
</element-citation>
</ref>
<ref id="CR56">
<label>56</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sarafianos</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Marchand</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Himmel</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Parniak</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>SH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Structure and function of HIV-1 reverse transcriptase: molecular mechanisms of polymerization and inhibition</article-title>
<source>J Mol Biol.</source>
<year>2009</year>
<volume>385</volume>
<issue>3</issue>
<fpage>693</fpage>
<lpage>713</lpage>
<pub-id pub-id-type="doi">10.1016/j.jmb.2008.10.071</pub-id>
<pub-id pub-id-type="pmid">19022262</pub-id>
</element-citation>
</ref>
<ref id="CR57">
<label>57</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mitchell</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Son</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>IY</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C-K</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<article-title>N1-heterocyclic pyrimidinediones as non-nucleoside inhibitors of HIV-1 reverse transcriptase</article-title>
<source>Bioorg Med Chem Lett.</source>
<year>2010</year>
<volume>20</volume>
<issue>5</issue>
<fpage>1585</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="doi">10.1016/j.bmcl.2010.01.086</pub-id>
<pub-id pub-id-type="pmid">20137928</pub-id>
</element-citation>
</ref>
<ref id="CR58">
<label>58</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cowan-Jacob</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Fendrich</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Floersheimer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Furet</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Liebetanz</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rummel</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Structural biology contributions to the discovery of drugs to treat chronic myelogenous leukaemia</article-title>
<source>Acta Crystallogr Sect D: Biol Crystallogr.</source>
<year>2006</year>
<volume>63</volume>
<issue>1</issue>
<fpage>80</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="doi">10.1107/S0907444906047287</pub-id>
<pub-id pub-id-type="pmid">17164530</pub-id>
</element-citation>
</ref>
<ref id="CR59">
<label>59</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Congreve</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Carr</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Murray</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jhoti</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>A ‘rule of three’ for fragment-based lead discovery?</article-title>
<source>Drug Discov Today</source>
<year>2003</year>
<volume>8</volume>
<issue>19</issue>
<fpage>876</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1016/S1359-6446(03)02831-9</pub-id>
</element-citation>
</ref>
<ref id="CR60">
<label>60</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Richards</surname>
<given-names>FM</given-names>
</name>
</person-group>
<article-title>The interpretation of protein structures: estimation of static accessibility</article-title>
<source>J Mol Biol.</source>
<year>1971</year>
<volume>55</volume>
<issue>3</issue>
<fpage>379</fpage>
<lpage>400</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(71)90324-X</pub-id>
<pub-id pub-id-type="pmid">5551392</pub-id>
</element-citation>
</ref>
<ref id="CR61">
<label>61</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Desdouits</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Nilges</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Blondel</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Principal component analysis reveals correlation of cavities evolution and functional motions in proteins</article-title>
<source>J Mol Graphics Modell.</source>
<year>2015</year>
<volume>55</volume>
<fpage>13</fpage>
<lpage>24</lpage>
<pub-id pub-id-type="doi">10.1016/j.jmgm.2014.10.011</pub-id>
</element-citation>
</ref>
<ref id="CR62">
<label>62</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pettersen</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Goddard</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Couch</surname>
<given-names>GS</given-names>
</name>
<name>
<surname>Greenblatt</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Meng</surname>
<given-names>EC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>UCSF Chimera–a visualization system for exploratory research and analysis</article-title>
<source>J Comput Chem.</source>
<year>2004</year>
<volume>25</volume>
<issue>13</issue>
<fpage>1605</fpage>
<lpage>12</lpage>
<pub-id pub-id-type="doi">10.1002/jcc.20084</pub-id>
<pub-id pub-id-type="pmid">15264254</pub-id>
</element-citation>
</ref>
<ref id="CR63">
<label>63</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedregosa</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Varoquaux</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gramfort</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Thirion</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Grisel</surname>
<given-names>O</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Scikit-learn: Machine learning in Python</article-title>
<source>J Mach Learn Res.</source>
<year>2011</year>
<volume>12</volume>
<fpage>2825</fpage>
<lpage>30</lpage>
</element-citation>
</ref>
<ref id="CR64">
<label>64</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schwarz</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Estimating the dimension of a model</article-title>
<source>Ann Stat.</source>
<year>1978</year>
<volume>6</volume>
<issue>2</issue>
<fpage>461</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="doi">10.1214/aos/1176344136</pub-id>
</element-citation>
</ref>
<ref id="CR65">
<label>65</label>
<mixed-citation publication-type="other">Landrum G. RDKit: Open-source Cheminformatics. http://www.rdkit.org.</mixed-citation>
</ref>
<ref id="CR66">
<label>66</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bauman</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Dharia</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Fromer</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Ahmed</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Frenkel</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Detecting allosteric sites of HIV-1 reverse transcriptase by X-ray crystallographic fragment screening</article-title>
<source>J Med Chem.</source>
<year>2013</year>
<volume>56</volume>
<issue>7</issue>
<fpage>2738</fpage>
<lpage>46</lpage>
<pub-id pub-id-type="doi">10.1021/jm301271j</pub-id>
<pub-id pub-id-type="pmid">23342998</pub-id>
</element-citation>
</ref>
<ref id="CR67">
<label>67</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schindler</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bornmann</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Pellicena</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>WT</given-names>
</name>
<name>
<surname>Clarkson</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kuriyan</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Structural mechanism for STI-571 inhibition of Abelson tyrosine kinase</article-title>
<source>Science</source>
<year>2000</year>
<volume>289</volume>
<issue>5486</issue>
<fpage>1938</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="doi">10.1126/science.289.5486.1938</pub-id>
<pub-id pub-id-type="pmid">10988075</pub-id>
</element-citation>
</ref>
<ref id="CR68">
<label>68</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dennis</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kortvelyesi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Vajda</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Computational mapping identifies the binding sites of organic solvents on proteins</article-title>
<source>Proc Natl Acad Sci.</source>
<year>2002</year>
<volume>99</volume>
<issue>7</issue>
<fpage>4290</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.062398499</pub-id>
<pub-id pub-id-type="pmid">11904374</pub-id>
</element-citation>
</ref>
<ref id="CR69">
<label>69</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kortvelyesi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Silberstein</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dennis</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Vajda</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Improved mapping of protein binding sites</article-title>
<source>J Comput-Aided Mol Des.</source>
<year>2003</year>
<volume>17</volume>
<issue>2-4</issue>
<fpage>173</fpage>
<lpage>86</lpage>
<pub-id pub-id-type="doi">10.1023/A:1025369923311</pub-id>
<pub-id pub-id-type="pmid">13677484</pub-id>
</element-citation>
</ref>
<ref id="CR70">
<label>70</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>LN</given-names>
</name>
<name>
<surname>Noble</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Owen</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Active and inactive protein kinases: structural basis for regulation</article-title>
<source>Cell</source>
<year>1996</year>
<volume>85</volume>
<issue>2</issue>
<fpage>149</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="doi">10.1016/S0092-8674(00)81092-2</pub-id>
<pub-id pub-id-type="pmid">8612268</pub-id>
</element-citation>
</ref>
<ref id="CR71">
<label>71</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Morris</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Goodsell</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Halliday</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Huey</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hart</surname>
<given-names>WE</given-names>
</name>
<name>
<surname>Belew</surname>
<given-names>RK</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function</article-title>
<source>J Comput Chem.</source>
<year>1998</year>
<volume>19</volume>
<issue>14</issue>
<fpage>1639</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="doi">10.1002/(SICI)1096-987X(19981115)19:14&lt;1639::AID-JCC10&gt;3.0.CO;2-B</pub-id>
</element-citation>
</ref>
<ref id="CR72">
<label>72</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuntz</surname>
<given-names>ID</given-names>
</name>
<name>
<surname>Blaney</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Oatley</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Langridge</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ferrin</surname>
<given-names>TE</given-names>
</name>
</person-group>
<article-title>A geometric approach to macromolecule-ligand interactions</article-title>
<source>J Mol Biol.</source>
<year>1982</year>
<volume>161</volume>
<issue>2</issue>
<fpage>269</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(82)90153-X</pub-id>
<pub-id pub-id-type="pmid">7154081</pub-id>
</element-citation>
</ref>
<ref id="CR73">
<label>73</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laskowski</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Luscombe</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Swindells</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Thornton</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>Protein clefts in molecular recognition and function</article-title>
<source>Protein Sci: Publ Protein Soc.</source>
<year>1996</year>
<volume>5</volume>
<issue>12</issue>
<fpage>2438</fpage>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/MaghrebDataLibMedV2/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000029 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000029 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    MaghrebDataLibMedV2
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4381396
   |texte=   Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:25888251" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MaghrebDataLibMedV2 

Wicri

This area was generated with Dilib version V0.6.38.
Data generation: Wed Jun 30 18:27:05 2021. Site generation: Wed Jun 30 18:34:21 2021