OCR Exploration Server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Current Challenges in Development of a Database of Three-Dimensional Chemical Structures

Internal identifier: 000026 (Pmc/Checkpoint); previous: 000025; next: 000027

Authors: Miki H. Maeda [Japan]

RBID : PMC:4443773

Abstract

We are developing 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We tested several 2D–3D converters on 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To assess the quality of the conversions, we compared IUPAC Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. In the present work, a conversion was regarded as correct when the notations before and after conversion matched. We found that chiral inversion was the most serious cause of improper conversion. At the current stage of database construction, published books and articles are the sources of additions to our database, and chemicals in them are usually drawn as pictures on paper. To save human resources, an optical structure reader was introduced. The program was quite useful, but some characteristic errors were observed during operation. We hope that our attempts to produce correct 3D structures will help other developers of chemical programs and curators of chemical databases.


DOI: 10.3389/fbioe.2015.00066
PubMed: 26075200
PubMed Central: 4443773


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Current Challenges in Development of a Database of Three-Dimensional Chemical Structures</title>
<author>
<name sortKey="Maeda, Miki H" sort="Maeda, Miki H" uniqKey="Maeda M" first="Miki H." last="Maeda">Miki H. Maeda</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution>Biomolecular Research Unit, National Institute of Agrobiological Sciences</institution>
,
<addr-line>Tsukuba</addr-line>
,
<country>Japan</country>
</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26075200</idno>
<idno type="pmc">4443773</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4443773</idno>
<idno type="RBID">PMC:4443773</idno>
<idno type="doi">10.3389/fbioe.2015.00066</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000006</idno>
<idno type="wicri:Area/Pmc/Curation">000006</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000026</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Current Challenges in Development of a Database of Three-Dimensional Chemical Structures</title>
<author>
<name sortKey="Maeda, Miki H" sort="Maeda, Miki H" uniqKey="Maeda M" first="Miki H." last="Maeda">Miki H. Maeda</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1">
<institution>Biomolecular Research Unit, National Institute of Agrobiological Sciences</institution>
,
<addr-line>Tsukuba</addr-line>
,
<country>Japan</country>
</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea># see nlm:aff country strict</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Frontiers in Bioengineering and Biotechnology</title>
<idno type="eISSN">2296-4185</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>We are developing 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We tested several 2D–3D converters on 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To assess the quality of the conversions, we compared IUPAC Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. In the present work, a conversion was regarded as correct when the notations before and after conversion matched. We found that chiral inversion was the most serious cause of improper conversion. At the current stage of database construction, published books and articles are the sources of additions to our database, and chemicals in them are usually drawn as pictures on paper. To save human resources, an optical structure reader was introduced. The program was quite useful, but some characteristic errors were observed during operation. We hope that our attempts to produce correct 3D structures will help other developers of chemical programs and curators of chemical databases.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Bostrom, J" uniqKey="Bostrom J">J. Bostrom</name>
</author>
<author>
<name sortKey="Greenwood, J R" uniqKey="Greenwood J">J. R. Greenwood</name>
</author>
<author>
<name sortKey="Gottfries, J" uniqKey="Gottfries J">J. Gottfries</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Coles, S J" uniqKey="Coles S">S. J. Coles</name>
</author>
<author>
<name sortKey="Day, N E" uniqKey="Day N">N. E. Day</name>
</author>
<author>
<name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P. Murray-Rust</name>
</author>
<author>
<name sortKey="Rzepa, H S" uniqKey="Rzepa H">H. S. Rzepa</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y. Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dewar, M J S" uniqKey="Dewar M">M. J. S. Dewar</name>
</author>
<author>
<name sortKey="Thiel, W" uniqKey="Thiel W">W. Thiel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dewar, M J S" uniqKey="Dewar M">M. J. S. Dewar</name>
</author>
<author>
<name sortKey="Zoebisch, E G" uniqKey="Zoebisch E">E. G. Zoebisch</name>
</author>
<author>
<name sortKey="Healy, E F" uniqKey="Healy E">E. F. Healy</name>
</author>
<author>
<name sortKey="Stewart, J P" uniqKey="Stewart J">J. P. Stewart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Filippov, I V" uniqKey="Filippov I">I. V. Filippov</name>
</author>
<author>
<name sortKey="Nicklaus, M C" uniqKey="Nicklaus M">M. C. Nicklaus</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gasteiger, J" uniqKey="Gasteiger J">J. Gasteiger</name>
</author>
<author>
<name sortKey="Rudolph, C" uniqKey="Rudolph C">C. Rudolph</name>
</author>
<author>
<name sortKey="Sadowski, J" uniqKey="Sadowski J">J. Sadowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goto, S" uniqKey="Goto S">S. Goto</name>
</author>
<author>
<name sortKey="Okuno, Y" uniqKey="Okuno Y">Y. Okuno</name>
</author>
<author>
<name sortKey="Hattori, M" uniqKey="Hattori M">M. Hattori</name>
</author>
<author>
<name sortKey="Nishioka, T" uniqKey="Nishioka T">T. Nishioka</name>
</author>
<author>
<name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M. Kanehisa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Halgren, T A" uniqKey="Halgren T">T. A. Halgren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ibison, P" uniqKey="Ibison P">P. Ibison</name>
</author>
<author>
<name sortKey="Jacquot, M" uniqKey="Jacquot M">M. Jacquot</name>
</author>
<author>
<name sortKey="Kam, F" uniqKey="Kam F">F. Kam</name>
</author>
<author>
<name sortKey="Neville, A G" uniqKey="Neville A">A. G. Neville</name>
</author>
<author>
<name sortKey="Simpson, R W" uniqKey="Simpson R">R. W. Simpson</name>
</author>
<author>
<name sortKey="Tonnelier, C" uniqKey="Tonnelier C">C. Tonnelier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Irwin, J J" uniqKey="Irwin J">J. J. Irwin</name>
</author>
<author>
<name sortKey="Sterling, T" uniqKey="Sterling T">T. Sterling</name>
</author>
<author>
<name sortKey="Mysinger, M M" uniqKey="Mysinger M">M. M. Mysinger</name>
</author>
<author>
<name sortKey="Bolstad, E S" uniqKey="Bolstad E">E. S. Bolstad</name>
</author>
<author>
<name sortKey="Coleman, R G" uniqKey="Coleman R">R. G. Coleman</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maeda, M H" uniqKey="Maeda M">M. H. Maeda</name>
</author>
<author>
<name sortKey="Kondo, K" uniqKey="Kondo K">K. Kondo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcdaniel, J R" uniqKey="Mcdaniel J">J. R. McDaniel</name>
</author>
<author>
<name sortKey="Balmuth, J R" uniqKey="Balmuth J">J. R. Balmuth</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Park, J" uniqKey="Park J">J. Park</name>
</author>
<author>
<name sortKey="Rosania, G R" uniqKey="Rosania G">G. R. Rosania</name>
</author>
<author>
<name sortKey="Shedden, K A" uniqKey="Shedden K">K. A. Shedden</name>
</author>
<author>
<name sortKey="Nguyen, M" uniqKey="Nguyen M">M. Nguyen</name>
</author>
<author>
<name sortKey="Lyu, N" uniqKey="Lyu N">N. Lyu</name>
</author>
<author>
<name sortKey="Saitou, K" uniqKey="Saitou K">K. Saitou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pearlman, R S" uniqKey="Pearlman R">R. S. Pearlman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rarey, M" uniqKey="Rarey M">M. Rarey</name>
</author>
<author>
<name sortKey="Kramer, B" uniqKey="Kramer B">B. Kramer</name>
</author>
<author>
<name sortKey="Lengauer, T" uniqKey="Lengauer T">T. Lengauer</name>
</author>
<author>
<name sortKey="Klebe, G" uniqKey="Klebe G">G. Klebe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sadowski, J" uniqKey="Sadowski J">J. Sadowski</name>
</author>
<author>
<name sortKey="Gasteiger, J" uniqKey="Gasteiger J">J. Gasteiger</name>
</author>
<author>
<name sortKey="Klebe, G" uniqKey="Klebe G">G. Klebe</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stewart, J J P" uniqKey="Stewart J">J. J. P. Stewart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stewart, J J P" uniqKey="Stewart J">J. J. P. Stewart</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y. Wang</name>
</author>
<author>
<name sortKey="Xiao, J" uniqKey="Xiao J">J. Xiao</name>
</author>
<author>
<name sortKey="Suzek, T O" uniqKey="Suzek T">T. O. Suzek</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J. Zhang</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J. Wang</name>
</author>
<author>
<name sortKey="Bryant, S H" uniqKey="Bryant S">S. H. Bryant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weininger, D" uniqKey="Weininger D">D. Weininger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weininger, D" uniqKey="Weininger D">D. Weininger</name>
</author>
<author>
<name sortKey="Weininger, A" uniqKey="Weininger A">A. Weininger</name>
</author>
<author>
<name sortKey="Weininger, J L" uniqKey="Weininger J">J. L. Weininger</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Front Bioeng Biotechnol</journal-id>
<journal-id journal-id-type="iso-abbrev">Front Bioeng Biotechnol</journal-id>
<journal-id journal-id-type="publisher-id">Front. Bioeng. Biotechnol.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Bioengineering and Biotechnology</journal-title>
</journal-title-group>
<issn pub-type="epub">2296-4185</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26075200</article-id>
<article-id pub-id-type="pmc">4443773</article-id>
<article-id pub-id-type="doi">10.3389/fbioe.2015.00066</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Bioengineering and Biotechnology</subject>
<subj-group>
<subject>Technology Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Current Challenges in Development of a Database of Three-Dimensional Chemical Structures</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Maeda</surname>
<given-names>Miki H.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
<uri xlink:type="simple" xlink:href="http://frontiersin.org/people/u/202775"></uri>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Biomolecular Research Unit, National Institute of Agrobiological Sciences</institution>
,
<addr-line>Tsukuba</addr-line>
,
<country>Japan</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Masanori Arita, National Institute of Genetics, Japan</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Gaurav Sablok, Istituto Agrario San Michele, Italy; Kensuke Nakamura, Maebashi Institute of Technology, Japan</p>
</fn>
<corresp content-type="corresp" id="cor1">*Correspondence: Miki H. Maeda, Biomolecular Research Unit, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan,
<email>mmaeda@nias.affrc.go.jp</email>
</corresp>
<fn fn-type="other" id="fn001">
<p>Specialty section: This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Bioengineering and Biotechnology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>5</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>3</volume>
<elocation-id>66</elocation-id>
<history>
<date date-type="received">
<day>07</day>
<month>1</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>4</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2015 Maeda.</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<p>We are developing 3DMET, a three-dimensional structure database of natural metabolites. There are two major impediments to the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We tested several 2D–3D converters on 2D structure files from external databases. These automatic conversion processes yielded an excessive number of improper conversions. To assess the quality of the conversions, we compared IUPAC Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. In the present work, a conversion was regarded as correct when the notations before and after conversion matched. We found that chiral inversion was the most serious cause of improper conversion. At the current stage of database construction, published books and articles are the sources of additions to our database, and chemicals in them are usually drawn as pictures on paper. To save human resources, an optical structure reader was introduced. The program was quite useful, but some characteristic errors were observed during operation. We hope that our attempts to produce correct 3D structures will help other developers of chemical programs and curators of chemical databases.</p>
</abstract>
<kwd-group>
<kwd>2D–3D conversion</kwd>
<kwd>3DMET</kwd>
<kwd>CLiDE</kwd>
<kwd>InChI</kwd>
<kwd>canonical SMILES</kwd>
<kwd>chemical database</kwd>
<kwd>natural products</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source id="cn001">JSPS KAKENHI (Grant-in-Aid for Publication of Scientific Research Results)</funding-source>
<award-id rid="cn001">218065</award-id>
<award-id rid="cn001">238062</award-id>
<award-id rid="cn001">248057</award-id>
<award-id rid="cn001">2578008</award-id>
</award-group>
</funding-group>
<counts>
<fig-count count="4"></fig-count>
<table-count count="4"></table-count>
<equation-count count="0"></equation-count>
<ref-count count="29"></ref-count>
<page-count count="9"></page-count>
<word-count count="6133"></word-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="introduction" id="S1">
<title>Introduction</title>
<p>A database called 3DMET, a three-dimensional structure database of natural metabolites, is being developed in our laboratory. In the earliest days, 3D structures were converted from two-dimensional (2D) structures in relevant external databases. Currently, new entries to 3DMET are collected from chemical structures in print. The bottleneck in our project is curation. Because of a shortage of skilled curators, we evaluated several approaches to automation. All trials with existing programs indicated that their accuracy was insufficient.</p>
<p>Figure
<xref ref-type="fig" rid="F1">1</xref>
shows our current workflow of 3D-structure construction. During automatic processing, the following three steps are involved: conversion from picture to Cartesian atomic coordinates, 2D–3D conversion, and energy minimization. For conversion from picture to Cartesian coordinates, several programs such as Kekule (McDaniel and Balmuth,
<xref rid="B16" ref-type="bibr">1992</xref>
), CLiDE (Ibison et al.,
<xref rid="B12" ref-type="bibr">1993</xref>
; CLiDE,
<xref rid="B3" ref-type="bibr">2014</xref>
), OSRA (Filippov and Nicklaus,
<xref rid="B8" ref-type="bibr">2009</xref>
), and ChemReader (Park et al.,
<xref rid="B19" ref-type="bibr">2009</xref>
) have been reported. Kekule is one of the earliest programs for optical chemical structure recognition, but it no longer seems to be available. We employed CLiDE, supplied by Keymodule Ltd., because it was a recent program that was easy to obtain and easy to handle when we started extracting 2D chemical structures from the paper literature. For 2D–3D conversion, 3D-generators such as CONCORD (Pearlman,
<xref rid="B20" ref-type="bibr">1987</xref>
), CORINA (Gasteiger et al.,
<xref rid="B9" ref-type="bibr">1990</xref>
; CORINA,
<xref rid="B5" ref-type="bibr">2015</xref>
), and OMEGA (Bostrom et al.,
<xref rid="B1" ref-type="bibr">2003</xref>
; OMEGA,
<xref rid="B18" ref-type="bibr">2015</xref>
) have been developed. They are widely used in drug discovery. We employed CONCORD in the earliest days of our database development. However, we no longer use any 3D-generators because our sources changed from 2D files to paper literature. Energy minimization programs were necessary in both cases. We have usually employed MOE (Molecular Operating Environment,
<xref rid="B17" ref-type="bibr">2015</xref>
), supplied by Chemical Computing Group, to develop 3D structures using the MMFF94 force field (Halgren,
<xref rid="B11" ref-type="bibr">1996</xref>
).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Operation flow to develop 3D structures</bold>
. Arrows
<bold>(A–C)</bold>
indicate the possible starting points, depending on the data source.</p>
</caption>
<graphic xlink:href="fbioe-03-00066-g001"></graphic>
</fig>
<p>For the 3D-structure construction, we initially employed 3D-generators. During the automatic conversion process by 3D-generator and energy minimization, we found an excessive number of improper conversions (Maeda and Kondo,
<xref rid="B15" ref-type="bibr">2013</xref>
). To detect improper conversion, we tested several approaches. Finally, we decided to compare IUPAC Chemical Identifier (InChI; Coles et al.,
<xref rid="B4" ref-type="bibr">2005</xref>
; IUPAC Chemical Identifier,
<xref rid="B14" ref-type="bibr">2015</xref>
) and canonical simplified molecular input line entry specification (SMILES; Weininger,
<xref rid="B28" ref-type="bibr">1988</xref>
; Weininger et al.,
<xref rid="B29" ref-type="bibr">1989</xref>
; SMILES,
<xref rid="B23" ref-type="bibr">2008</xref>
) before and after conversion because this approach is convenient and accurate. The KEGG COMPOUND dataset (Goto et al.,
<xref rid="B10" ref-type="bibr">2002</xref>
) was used to estimate accuracy. A conversion was defined as correct when the two structures had the same InChI or SMILES. All the programs we examined made errors, and most of the errors were caused by inversion of atom chirality or cis/trans configuration. As a result, manual curation was performed to fix the 3D structures.</p>
<p>In our current workflow (Figure
<xref ref-type="fig" rid="F1">1</xref>
), three arrows marked A, B, and C indicate starting points that depend on the data source. Two kinds of sources are possible: publicly available databases and printed structures such as those in books and journal articles. When the data come from an external database, they are often provided as 2D structures (all z atomic coordinates are zero in the MDL MOL/SD files). Therefore, to obtain 3D structures, processing of the source data starts from Figure
<xref ref-type="fig" rid="F1">1</xref>
B. If paper materials are data sources, we follow the path in Figure
<xref ref-type="fig" rid="F1">1</xref>
A or Figure
<xref ref-type="fig" rid="F1">1</xref>
C. The choice depends on the number of human curators available to work on our project. Based on our experience, we now think that starting from manual editing (Figure
<xref ref-type="fig" rid="F1">1</xref>
C) is better if enough curators are available. However, when curators are in short supply, the only option is to start from the print material (Figure
<xref ref-type="fig" rid="F1">1</xref>
A).</p>
<p>Although the accuracies of programs varied in our earlier work (Maeda and Kondo,
<xref rid="B15" ref-type="bibr">2013</xref>
), nowadays there is no significant difference among the available software. We currently employ MOE™ 2011 for molecular building because it offers an explicit chiral constraint as an option in energy minimization. Some chirality errors observed with the former version of MOE no longer occur in the current version. In this article, we present what we learned from constructing 3D structures of metabolites. We hope that our experience will benefit others working on database curation and will aid future development of relevant software.</p>
</sec>
<sec sec-type="materials|methods" id="S2">
<title>Materials and Methods</title>
<sec id="S2-1">
<title>Data sources</title>
<p>KEGG COMPOUND (release 29) was used as the source of the 2D structure dataset. This dataset contains almost 12,000 compounds. A picture of octadehydro-beta-carotene was taken from “Carotenoids Handbook” edited by Britton et al. (published by Birkhauser). Sample pictures for chemical literature data extraction were taken from “Insecticides of Natural Origin” edited by Dev and Koul (published by Harwood Academic Publishers). Pictures were scanned with a Canoscan 9000F (Canon Inc., Tokyo, Japan) and converted to PDF files.</p>
</sec>
<sec id="S2-2">
<title>Conversion from picture to 2D coordinates</title>
<p>Structures were extracted from the scanned compound pictures with CLiDE (Ibison et al.,
<xref rid="B12" ref-type="bibr">1993</xref>
) standard 5.2 with default parameters. When large numbers of structures are to be processed, CLiDE professional and CLiDE batch would be more convenient. However, CLiDE standard was used on a trial basis in this study, and pictures were converted one by one. The resolution of the input files was set at 400 dpi, which gave the best balance between accuracy and operation time in our pre-tests.</p>
</sec>
<sec id="S2-3">
<title>Conversion from 2D structure to 3D structure</title>
<p>2D structure files were transformed to 3D structures by using several versions of MOE. Where relevant, the specific version is indicated together with the result. Hydrogens were added to the 2D structures, which were then minimized by MOE using the MMFF94x (Halgren,
<xref rid="B11" ref-type="bibr">1996</xref>
) force field.</p>
</sec>
<sec id="S2-4">
<title>Detection of identity between two structures</title>
<p>IUPAC Chemical Identifier and canonical SMILES were employed to detect identity of 2D and 3D structures. The structures were converted to InChI and canonical SMILES with chiral options. The notations were compared at three stages: the initial 2D structure, the structure after addition of hydrogens, and the structure after energy minimization. Chiral tags on phosphate in SMILES were ignored because all oxygen atoms bonded to phosphorus are chemically equivalent. Two InChI strings were judged to correspond when they had the same sub-layer strings for c (connectivity), h (hydrogen), b (double bond), and t (sp3 stereo).</p>
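<p>The sub-layer comparison described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used in this work: the function names inchi_layers and same_compared_layers are hypothetical, layers are assumed to be separated by "/" with a one-letter lowercase prefix, and only the first occurrence of each prefix is kept. The alanine strings serve only as an example input.</p>
<preformat>
```python
COMPARED_LAYERS = ("c", "h", "b", "t")  # the four sub-layers named in the text


def inchi_layers(inchi: str) -> dict:
    """Map each one-letter layer prefix to the string that follows it."""
    layers = {}
    for part in inchi.split("/")[1:]:
        # Skip the formula layer (starts with an uppercase element symbol).
        if part and part[0].islower():
            # Keep only the first occurrence of each prefix.
            layers.setdefault(part[0], part[1:])
    return layers


def same_compared_layers(inchi_a: str, inchi_b: str) -> bool:
    """True if the c, h, b and t sub-layers of both strings are identical.

    Other layers (for example m or s) are deliberately not inspected here.
    """
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return all(a.get(key) == b.get(key) for key in COMPARED_LAYERS)


l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
flat_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
print(same_compared_layers(l_alanine, l_alanine))     # True
print(same_compared_layers(l_alanine, flat_alanine))  # False: t sub-layer lost
```
</preformat>
<p>In this sketch, a missing sub-layer on one side counts as a mismatch, so a structure that loses its stereo description during conversion would be flagged.</p>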
</sec>
<sec id="S2-5">
<title>Counting entries with chiral information</title>
<p>Structures with chiral atoms or bonds were counted based on their InChI or canonical SMILES notation strings. All entries were checked for the existence of the t or b sub-layer of InChI and of the “@,” “\” or “/” characters in canonical SMILES. Undefined chiral atoms were detected as the “?” character in the t sub-layer of InChI.</p>
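<p>As a rough sketch of these string checks (an assumption-laden illustration, not the script used in this work), the following Python functions flag stereo information in an InChI or a canonical SMILES string. The names inchi_stereo_flags, smiles_has_stereo, and count_stereo_entries are hypothetical, and the code relies only on plain string tests, not on a chemistry toolkit.</p>
<preformat>
```python
def inchi_stereo_flags(inchi: str) -> dict:
    """Report presence of the t and b sub-layers and of '?' in the t sub-layer."""
    flags = {"sp3": False, "double_bond": False, "undefined": False}
    for part in inchi.split("/")[1:]:
        if part.startswith("t"):
            flags["sp3"] = True
            if "?" in part:
                flags["undefined"] = True
        elif part.startswith("b"):
            flags["double_bond"] = True
    return flags


def smiles_has_stereo(smiles: str) -> bool:
    """True if the SMILES string contains '@', '/' or a backslash."""
    return any(mark in smiles for mark in ("@", "/", "\\"))


def count_stereo_entries(inchis) -> int:
    """Count entries whose InChI has a t or a b sub-layer."""
    total = 0
    for inchi in inchis:
        flags = inchi_stereo_flags(inchi)
        if flags["sp3"] or flags["double_bond"]:
            total += 1
    return total


print(smiles_has_stereo("C[C@H](N)C(=O)O"))  # True
print(smiles_has_stereo("CC(N)C(=O)O"))      # False
```
</preformat>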
</sec>
</sec>
<sec id="S3">
<title>Results</title>
<p>To collect chemical structures in a database, the most basic and effective approach is for skilled and careful chemists to handle the data manually (Figure
<xref ref-type="fig" rid="F1">1</xref>
C). However, in our team, a shortage of chemical curators has been a serious problem since the beginning of this project. In such cases, some steps in the workflow need to be performed automatically by computer or performed by non-chemists, although knowledgeable chemical curators must confirm the structures in the final verification step. Automation involves two steps: picture to 2D coordinates and 2D–3D conversion processes (refer to Figure
<xref ref-type="fig" rid="F1">1</xref>
). All automatic processes must be evaluated for accuracy. Even though energy minimization is not an automated process, it too should be evaluated for accuracy.</p>
<sec id="S3-6">
<title>Translation from picture to 2D coordinates</title>
<p>An optical character reader (OCR) is often employed to convert a document from paper to computer-readable text files. Similarly, a printed image of a chemical structure can be translated into 2D coordinates as text files. If the converted structures were reliable, 2D structure files could be produced without the intervention of chemists. As a result of our investigation, we selected CLiDE, distributed by Keymodule Ltd.</p>
<p>Using CLiDE, chemical structures described on paper were translated to MDL mol format files of 2D atomic coordinates. Several types of natural compounds described in books were tested for proper conversion. Figure
<xref ref-type="fig" rid="F2">2</xref>
shows an example of octadehydro-beta-carotene. A printed image of chemical structure (Figure
<xref ref-type="fig" rid="F2">2</xref>
A) was converted to a 2D-mol file shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
B. Many of the resulting files were correct, but some types of compounds could not be converted well.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>A schematic example of image-to-3D-structure conversion for octadehydro-beta-carotene</bold>
. The panels show
<bold>(A)</bold>
the initial 2D drawing,
<bold>(B)</bold>
the 2D structure after CLiDE,
<bold>(C)</bold>
the minimized 3D structure with chiral constraint option, and
<bold>(D)</bold>
the structure directly edited and minimized in the MOE window. Green numbers show the dihedral angles of the bonds.</p>
</caption>
<graphic xlink:href="fbioe-03-00066-g002"></graphic>
</fig>
<p>Table
<xref ref-type="table" rid="T1">1</xref>
shows some examples with critical errors: missing characters indicating elements (in examples 3 and 4); missing parts of the molecule (in examples 1 and 6); incorrectly recognized chiralities (in example 2); missing cis/trans information (in example 3); and improper recognition of resonance structures (in examples 5 and 6). In some cases, such as example 7, no data were produced.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>
<bold>Examples of typical errors made by the translator</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">Query</th>
<th align="center" rowspan="1" colspan="1">Result</th>
<th align="left" rowspan="1" colspan="1">Errors</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">1. Canavanine</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i001.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i002.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">Some elemental descriptions drawn as characters on the paper were not translated</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2. Ryanodine</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i003.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i004.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">Bond direction and connectivity were not recognized in complex structures</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">3. Sparteine</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i005.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i006.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">“N” was recognized as a part of bonds. Some kinds of chiral information were lost</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">4. Terfairine</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i007.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i008.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">“Br” was not translated as a halogen atom</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">5. 3(R)-Millonol-B</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i009.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i010.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">Description of “resonance structure” cannot be converted</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">6. Veratridine</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i011.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i012.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">Some parts of the molecule were missing among relatively larger molecules</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">7. 1,7,9,15-Heptadecatetraene-11,13-diyne</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i013.jpg"></inline-graphic>
</td>
<td align="center" rowspan="1" colspan="1">
<inline-graphic xlink:href="fbioe-03-00066-i014.jpg"></inline-graphic>
</td>
<td align="left" rowspan="1" colspan="1">No structures were developed when all atoms were clearly described by elements</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>The structures in the “query” column show 2D structures of the queries drawn with ChemDraw Pro (
<xref rid="B2" ref-type="bibr">2013</xref>
) version 13, and those in the “result” column are figures displayed in the CLiDE window. Parts highlighted in orange were automatically flagged as possibly mistranslated</italic>
.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S3-7">
<title>Converting from 2D coordinates to 3D coordinates</title>
<p>Regardless of the data source, whether structures scanned from documents on paper or molecule files provided by another database project, the 2D structure must be converted to a final 3D structure. Two important problems were encountered in this process: chiral inversion and the quality of the 3D structures. These problems are described in the following two subsections, respectively.</p>
<sec id="S3-7-1">
<title>Detection of Topological Errors</title>
<p>The conversion accuracy by MOE 2011 is shown in Table
<xref ref-type="table" rid="T2">2</xref>
. To confirm the retention of chirality in the converted structures, we compared canonical SMILES strings generated from the 2D and 3D structures, which makes the structural evaluation a simple comparison of text strings. Minimizations were performed with default parameters and the MMFF94x force field, and all operations were performed with the chiral constraint option enabled. Most of the chiral information was conserved (because of the chiral constraint). These results were much better than those from MOE 2004 or 2005 in our previous report (Maeda and Kondo,
<xref rid="B15" ref-type="bibr">2013</xref>
). Approximately 4,000 structures were conserved with MOE 2004 or 2005. The main reason for the chirality inversion is missing chiral tags in the initial 2D files. However, inversions of chirality and of cis/trans stereochemistry were still observed. An example of chiral inversion is shown in Figure
<xref ref-type="fig" rid="F3">3</xref>
for KEGG COMPOUND entry C09519. Over the whole process, the initial 2D structure and the final 3D structure correspond, so the conversion is counted as correct. However, chiral inversion occurred twice at the same carbon (red characters in canonical SMILES).</p>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>
<bold>Frequency of errors regarding chirality and cis/trans stereochemistry during the conversion processes by MOE 2011</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">cpd – wash</th>
<th align="center" rowspan="1" colspan="1">wash – mm</th>
<th align="center" rowspan="1" colspan="1">cpd – mm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Same</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">8,964</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,296</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">8,735</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Chiral inversion</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">268</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">119</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">201</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Chiral missing</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">2,152</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">521</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">2,417</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Cis/trans inversion</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">1</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">29</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">30</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Cis/trans missing</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">0</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">6</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Else unmatched</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">589</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">3</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">621</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>All transferred structures were compared, categorized, and counted by each step of database minimization as shown in Table
<xref ref-type="table" rid="T3">3</xref>
. Structures at three steps were compared: the initial COMPOUND 2D-mol files (cpd), structures after wash (wash), and structures after minimization (mm). The categories are defined as follows: same, the two SMILES strings are completely identical; chiral inversion, atomic chiral tags are located at the same positions but differ; chiral missing, at least one of the two strings lacks atomic chiral tags; cis/trans inversion, cis/trans tags are located at the same positions but differ; cis/trans missing, at least one of the two strings lacks cis/trans tags; else unmatched, mismatched for other reasons</italic>
.</p>
</table-wrap-foot>
</table-wrap>
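<p>The categories in the footnote of Table 2 can be sketched as a purely text-based comparison of two canonical SMILES strings. The following is a minimal illustration, not the procedure actually used for Table 2: the function names, the order in which the categories are tested, and the reading of “missing” as “exactly one string has no tag of that kind” are all assumptions.</p>

```python
import re

# Stereo tags in a SMILES string: "@"/"@@" on atoms, "/" and "\" on bonds.
STEREO_TAG = re.compile(r"@@|@|/|\\")


def split_tags(smiles):
    """Return (atom chiral tags, bond direction tags) in string order."""
    found = STEREO_TAG.findall(smiles)
    chiral = [t for t in found if t.startswith("@")]
    bonds = [t for t in found if not t.startswith("@")]
    return chiral, bonds


def classify(before, after):
    """Assign a pair of canonical SMILES strings to one Table 2 category."""
    if before == after:
        return "same"
    chiral_a, bonds_a = split_tags(before)
    chiral_b, bonds_b = split_tags(after)
    # Assumption: "missing" means exactly one string has no tag of that kind.
    if bool(chiral_a) != bool(chiral_b):
        return "chiral missing"
    if bool(bonds_a) != bool(bonds_b):
        return "cis/trans missing"
    # With tags stripped, equal skeletons mean only the tag values differ.
    if STEREO_TAG.sub("", before) == STEREO_TAG.sub("", after):
        if chiral_a != chiral_b:
            return "chiral inversion"
        return "cis/trans inversion"
    return "else unmatched"


print(classify("C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"))
```

<p>Because the comparison works on text only, it relies on both strings coming from the same canonicalization program, so that corresponding atoms appear at the same positions.</p>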
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>An example of 2D–3D conversion by MOE</bold>
. The C09519 entry of KEGG COMPOUND was converted by MOE. Each description shows
<bold>(A)</bold>
the molecular drawing provided in KEGG COMPOUND,
<bold>(B)</bold>
the 2D-mol file viewed in MOE,
<bold>(C)</bold>
the structure after adding hydrogens, and
<bold>(D)</bold>
the structure after minimization. The structures
<bold>(B–D)</bold>
are shown with the corresponding SMILES notation strings. Red characters indicate a position of chiral inversion.</p>
</caption>
<graphic xlink:href="fbioe-03-00066-g003"></graphic>
</fig>
<p>The IUPAC International Chemical Identifier (InChI) can also be employed to detect correspondence between two structures. The accuracy of the two notation strings (canonical SMILES and InChI) was analyzed (Table
<xref ref-type="table" rid="T3">3</xref>
). For the correspondence of whole strings, both notations led to similar results under each of the two procedures: command-based generation (6,725 and 6,866 for InChI and SMILES, respectively) and generation by the molecular database function (8,522 and 8,735 for InChI and SMILES, respectively). The two notations thus detect correspondence with similar accuracy. However, the results clearly differ between the structure generation procedures (by SMILES detection, 8,735 with the database function and 6,866 with command-based operation). To determine why results were “different,” the stereochemical information in SMILES and in InChI was evaluated. In Table
<xref ref-type="table" rid="T3">3</xref>
, “undefined chiral atoms” means that “?” characters (the undefined chiral tag) were detected in at least one string when comparing InChI, or that “@/@@” characters were lost in one string when comparing canonical SMILES. More than 1,000 structures in the original dataset lacked chiral information. Errors of cis/trans stereochemistry were also observed (about 30 errors were detected by comparison of SMILES). Comparing the numbers for the InChI formula layer in Table
<xref ref-type="table" rid="T3">3</xref>
, the correspondence obtained with command-line operation is better than that obtained with the database function. Most of this difference between the methods is caused by the salt removal process in the database operation. In our modified KEGG COMPOUND dataset, 374 entries consist of two or more molecules.</p>
<table-wrap id="T3" position="float">
<label>Table 3</label>
<caption>
<p>
<bold>Accuracy of 3D-structure construction by MOE 2011</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" colspan="2" rowspan="1">Command
<hr></hr>
</th>
<th align="center" colspan="2" rowspan="1">Database
<hr></hr>
</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">InChI</th>
<th align="center" rowspan="1" colspan="1">SMILES</th>
<th align="center" rowspan="1" colspan="1">InChI</th>
<th align="center" rowspan="1" colspan="1">SMILES</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Whole notation strings</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,974</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,974</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,974</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,974</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Completely same</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">6,725</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">6,866</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">8,522</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">8,735</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Different</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">5,249</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">5,108</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">3,452</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">3,239</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Chiral errors</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Undefined chiral atoms</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">1,337</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">2,513</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">1,863</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">2,417</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Mismatch about chirality</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">3,586</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">2,454</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">147</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">201</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Mismatch of cis/trans (including undefined bond stereochemistry)</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">109</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">33</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">474</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">30</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Correspondence of InChI layer</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> Formula</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,925</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">9,550</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> c (Connection)</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,925</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">9,569</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> h (Hydrogen)</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,900</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">9,539</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> b (Cis/trans stereochemistry)</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">11,860</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">9,217</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"> t (Chirality)</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">6,973</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">7,140</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>Entries of KEGG COMPOUND without “R,” “n,” and “X” formed the initial 2D structure dataset (11,974 compounds). The wash and add-hydrogen operations were performed either as commands (command) or with the molecular database function (database) of MOE. In comparing the initial COMPOUND 2D-mol file and the structures after minimization, the descriptions are defined as follows: same formula, correspondence of the formula of the main layer in InChI; same string, two completely identical InChI or SMILES strings; chiral difference, unmatched strings with the same connectivity and hydrogens; undefined chiral atom, at least one of the two strings lacks atomic chiral information; and chiral mismatch, atomic chirality does not correspond</italic>
.</p>
</table-wrap-foot>
</table-wrap>
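<p>The layer-by-layer comparison reported in Table 3 can be illustrated with a short sketch that splits two InChI strings into their sublayers and compares them one at a time. This is an assumption-laden illustration rather than the script used in this work: the function names are hypothetical, only the layers listed in Table 3 are compared, and the alanine strings serve only as sample input.</p>

```python
# Layers compared in Table 3: formula, connection, hydrogen, bond and atom stereo.
COMPARED_LAYERS = ("formula", "c", "h", "b", "t")


def inchi_layers(inchi):
    """Split an InChI string into {layer name: content}."""
    parts = inchi.split("/")
    layers = {"formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            # Keep the first occurrence of each one-letter prefix.
            layers.setdefault(part[0], part[1:])
    return layers


def compare_layers(inchi_a, inchi_b):
    """Return {layer name: True if the layer is identical in both strings}."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return {name: a.get(name, "") == b.get(name, "") for name in COMPARED_LAYERS}


def has_undefined_chirality(inchi):
    """True when the t-sublayer carries a "?" (undefined chiral atom)."""
    return "?" in inchi_layers(inchi).get("t", "")


# Sample input: alanine without and with a stereo layer.
no_stereo = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
with_stereo = no_stereo + "/t2-/m0/s1"
print(compare_layers(no_stereo, with_stereo))
```

<p>In this sample pair, every compared layer matches except the t-sublayer, which corresponds to a structure that lacks atomic chiral information on one side.</p>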
<p>The number of structures with chiral atoms and bonds was also estimated (Table
<xref ref-type="table" rid="T4">4</xref>
). In our initial dataset of KEGG COMPOUND containing 11,974 structures, the entries with atomic chiral tags by InChI (detected as the existence of a t-sublayer) and canonical SMILES (detected as the existence of “@” tags) were 6,608 and 6,704, respectively. Approximately 8,000 of the resulting 3D structures had chiral atoms. In the same manner, the entries with cis/trans tags by InChI (b-sublayer) and canonical SMILES (“\” or “/” tags) were 1,895 and 1,564 for the initial 2D files, and 1,906 and 1,560 for the final 3D structures, respectively.</p>
<table-wrap id="T4" position="float">
<label>Table 4</label>
<caption>
<p>
<bold>Chiral detection by InChI and canonical SMILES</bold>
.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" colspan="2" rowspan="1">Chiral tags
<hr></hr>
</th>
<th align="center" colspan="2" rowspan="1">Cis/trans tags
<hr></hr>
</th>
<th align="center" rowspan="1" colspan="1">Undefined chiral atoms
<hr></hr>
</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">InChI</th>
<th align="center" rowspan="1" colspan="1">Canonical SMILES</th>
<th align="center" rowspan="1" colspan="1">InChI</th>
<th align="center" rowspan="1" colspan="1">Canonical SMILES</th>
<th align="center" rowspan="1" colspan="1">InChI</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Initial 2D structures</td>
<td align="center" rowspan="1" colspan="1">6,618</td>
<td align="center" rowspan="1" colspan="1">6,704</td>
<td align="center" rowspan="1" colspan="1">1,895</td>
<td align="center" rowspan="1" colspan="1">1,564</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">1,316</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">After wash</td>
<td align="center" rowspan="1" colspan="1">8,010</td>
<td align="center" rowspan="1" colspan="1">7,803</td>
<td align="center" rowspan="1" colspan="1">1,942</td>
<td align="center" rowspan="1" colspan="1">1,555</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">3,045</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">After minimization</td>
<td align="center" rowspan="1" colspan="1">7,268</td>
<td align="center" rowspan="1" colspan="1">8,034</td>
<td align="center" rowspan="1" colspan="1">1,906</td>
<td align="center" rowspan="1" colspan="1">1,560</td>
<td align="char" char="." charoff="50" rowspan="1" colspan="1">162</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>The number of structures carrying each kind of tag is shown. Calculation was based on 11,974 compounds from KEGG COMPOUND</italic>
.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="S3-7-2">
<title>Accuracy of 3D Structure</title>
<p>The next problem we address is whether the overall structure of a molecule is correct even when its partial structures are accurate. A structure obtained with correct chirality and cis/trans stereochemistry is shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
C. This structure may be chemically erroneous. The first concern is dihedral angles at double bonds. Dihedral angles of single bonds between two double bonds are also shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
C. The values range from −62.5° to 58.3°. Structures with alternating double and single bonds, such as 2,4,6-octatriene, are expected to have a planar conformation because such double bonds are conjugated. The second concern is the cis/trans arrangement around the double bonds. Even if all double-bonded atoms are located in a plane, the double bonds can be cis-located about the intervening single bonds, as shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
D. Usually, larger substituents are expected to lie far from each other because of steric hindrance. When the dihedral angles about the single bonds of Figure
<xref ref-type="fig" rid="F2">2</xref>
C were set to around 0°, the minimized structure as shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
D was obtained. The molecular mechanics energies of structures 2C and 2D were 117.929 and 103.632 kcal/mol, respectively, as calculated by MOE 2011 with the MMFF94x force field. Conformation 2D is of lower energy.</p>
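<p>A check of the dihedral angles discussed above can be sketched from four atomic coordinates. The code below is a generic geometric calculation, not a MOE function; the helper names and the 10° planarity tolerance are assumptions chosen for illustration.</p>

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees (0 = cis, 180 = trans)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_planar(angle, tolerance=10.0):
    """True when the angle is within `tolerance` degrees of 0 or 180."""
    deviation = min(abs(angle), 180.0 - abs(angle))
    return tolerance >= deviation


# The extreme values reported for Figure 2C both fail the planarity check.
print(is_planar(-62.5), is_planar(58.3))
```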
</sec>
</sec>
</sec>
<sec sec-type="discussion" id="S4">
<title>Discussion</title>
<sec id="S4-8">
<title>Hurdle of data exchange between databases</title>
<p>Some chemical databases such as KEGG COMPOUND and ZINC (Irwin et al.,
<xref rid="B13" ref-type="bibr">2012</xref>
) have 2D structures. Developing a database requires extensive human resources. Thus, it would be desirable to share reliable data among databases.</p>
<p>In our case, we adopted an SD file of KEGG COMPOUND because the database is recognized as a standard set of natural products. During development, we found that the dataset contains undefined atoms and/or residues. For example, COMPOUND entry C00045 is “amino acid,” containing one “R” residue (the formula is given as “C2H4NO2R”). Generally, the R of amino acids found in proteins stands for 20 kinds of side chain and does not define a single compound. Similarly, “X” and “n” denote any of several halogens and an
<italic>n</italic>
-fold repetition of a unit, respectively. This kind of description is needed for the purposes of KEGG COMPOUND but is not suitable for our objective. As a result, these entries had to be removed because our database requires only fully defined compound structures.</p>
<p>During the analysis of the chiral inversion problem, we found that many entries in the external data had no chiral tags on atoms. Because both the 2D–3D converters and “add hydrogens” randomly assign a chiral tag to any atom whose chiral information is not explicitly given, the resulting 3D structures can contain wrong chiral information. When the pre-conversion structure has no chiral definition, such random addition of chiral tags is not counted as an error in calculating accuracy. Most of the basic metabolites described in “the metabolic map” should be defined with all chiral information about atoms and bonds. However, not all books draw structures with complete chiral tags because the chirality may be common knowledge among some chemists. Therefore, we need to verify the new structural data of such compounds.</p>
<p>ZINC and PubChem (Wang et al.,
<xref rid="B27" ref-type="bibr">2009</xref>
) contain a much larger number of natural/artificial compounds than any of the other compound databases. Because of the large number of artificial compounds in such databases, it would be tedious to screen them out from the natural compounds. For this reason, the databases were not chosen as our database resources.</p>
</sec>
<sec id="S4-9">
<title>Limitation on three-dimensionalization from 2D structure</title>
<p>In the process of 3D-structure generation, two main problems occurred. The first problem arises from the nature of the software, and the other is due to the limitation of molecular mechanics calculations.</p>
<p>The accuracy of six 3D generators, including CONCORD and CORINA, was compared in a previous report by Sadowski et al. (
<xref rid="B22" ref-type="bibr">1994</xref>
). In that study, 639 X-ray structures from the Cambridge Structural Database were analyzed. The dataset contained 213 structures with chiral centers and 35 structures with cis/trans stereochemical double bonds. For all compounds of the dataset, CONCORD and CORINA could generate structures correctly. However, we found that CONCORD makes mistakes in conversion (Maeda and Kondo,
<xref rid="B15" ref-type="bibr">2013</xref>
). We did not examine the details of Sadowski’s dataset. More complex structures, such as carotenoids and sugars, would be encountered in the automatic generation of natural compounds.</p>
<p>We also previously reported that many chiral inversions occurred during three-dimensionalization by MOE 2004 or 2005 (Maeda and Kondo,
<xref rid="B15" ref-type="bibr">2013</xref>
). We (and probably many other users) have requested that the software developers fix the problem. In version 2011, conservation accuracy during minimization is greatly improved with the chiral constraint option. As seen in Table
<xref ref-type="table" rid="T2">2</xref>
, 94.3% (11,296 entries) of the structures with chiral information are correctly handled during the energy minimization process (wash – mm). Erroneous conversion of chirality and cis/trans stereochemistry now total only 1.2% (148 entries), while 4.4% (527 entries) of structures lack stereochemical information and cannot be assessed. Thus, we consider that the new software version is acceptable.</p>
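<p>The percentages quoted above follow directly from the “wash – mm” column of Table 2. The short calculation below reproduces them; the grouping of categories into “inverted” and “missing” is inferred from the entry counts given in the text (148 and 527), and the variable names are illustrative.</p>

```python
# Counts from the "wash - mm" column of Table 2.
wash_to_mm = {
    "same": 11296,
    "chiral inversion": 119,
    "chiral missing": 521,
    "cis/trans inversion": 29,
    "cis/trans missing": 6,
    "else unmatched": 3,
}

total = sum(wash_to_mm.values())


def percent(count):
    """Share of all compared structures, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


inverted = wash_to_mm["chiral inversion"] + wash_to_mm["cis/trans inversion"]
missing = wash_to_mm["chiral missing"] + wash_to_mm["cis/trans missing"]

print(total, percent(wash_to_mm["same"]), percent(inverted), percent(missing))
```

<p>The column sums to the 11,974 compounds of the dataset, and the three shares come out as 94.3%, 1.2%, and 4.4%.</p>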
<p>Though the example of C09519 shown in Figure
<xref ref-type="fig" rid="F3">3</xref>
is counted as a correct conversion, chiral inversion was still observed during both 3D generation and energy minimization. Minimization with the option to conserve chirality in MOE 2011 makes fewer atom-inversion errors, and hence we use this version of MOE. However, inversions still occurred. Most of them involved atomic chirality (around 200 structures for database operation). Inversion of cis/trans stereochemistry (E to Z or vice versa) was observed in a few cases (about 30 structures). We usually use command-based operation in developing 3D structures; however, chiral inversion occurs much less often with database operation.</p>
<p>The second problem pertains to conjugated double bonds. When a carotenoid structure drawn on paper was transferred to a 2D-mol file and converted to a 3D structure, a planar conformation of the conjugated double bonds could not be obtained. This problem appears to be caused by inappropriate bond lengths in the initial 2D structures. When we observed the three-dimensionalization process in MOE’s graphics window, the molecule stretched to an extended conformation. The bond lengths of 2D structures converted from pictures (0.90–0.98 Å) were shorter than ideal (the length of an optimized C–C bond in MOE is about 1.5 Å). This problem can be avoided by manual input. When 3D structures are constructed manually, the conjugated double bond moiety is planar from the start, so subsequent energy minimization begins from a planar conformation of the continuous double bonds.</p>
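<p>Short bonds of this kind can be detected from the 2D coordinates before conversion. The sketch below measures the mean bond length and, as one conceivable pre-processing step, rescales the coordinates toward the 1.5 Å mentioned above. Rescaling is an assumption and was not tested in this work, where the problem was avoided by manual input; the function names and the input layout are hypothetical.</p>

```python
import math


def mean_bond_length(coords, bonds):
    """coords: list of (x, y) positions; bonds: list of (i, j) atom index pairs."""
    lengths = [math.dist(coords[i], coords[j]) for i, j in bonds]
    return sum(lengths) / len(lengths)


def rescale(coords, bonds, target=1.5):
    """Scale all coordinates so that the mean bond length equals `target`."""
    factor = target / mean_bond_length(coords, bonds)
    return [(x * factor, y * factor) for x, y in coords]


# Three atoms joined by two bonds of 0.9 length units each.
coords = [(0.0, 0.0), (0.9, 0.0), (0.9, 0.9)]
bonds = [(0, 1), (1, 2)]
print(mean_bond_length(coords, bonds))
print(mean_bond_length(rescale(coords, bonds), bonds))
```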
<p>The other problem with double bonds concerns cumulated double bonds, as in allenes. The two double bonds of an allene are at 90° to each other because of the angle between the two pi orbitals. When allenes were edited in the MOE window, the two double bonds after MMFF94x minimization were often located in a plane (around 180°). However, some allenes made by the same procedure minimized at around 90°. The difference was not caused by the initial conformation but would be influenced by the other parts of the molecule, for example, through steric hindrance. If strictly correct structures must be obtained, we could carry out semi-empirical molecular orbital (MO) calculations. In MOE 2011 and later, the PM3 (Stewart,
<xref rid="B24" ref-type="bibr">1989</xref>
), AM1 (Dewar et al.,
<xref rid="B7" ref-type="bibr">1985</xref>
), or MNDO (Dewar and Thiel,
<xref rid="B6" ref-type="bibr">1977</xref>
) parameters in MOPAC (Stewart,
<xref rid="B25" ref-type="bibr">1990</xref>
) can be selected for energy minimization. Generally, the compounds we process are too large for routine use of
<italic>ab initio</italic>
methods.</p>
</sec>
<sec id="S4-10">
<title>Notes about manual operation</title>
<p>At present, new structures for our database 3DMET are often built manually by curators. During this process, some problems in developing 3D structures were revealed.</p>
<p>One problem is a limitation of molecular mechanics calculation. The MMFF94x force field is used for energy minimization in our current work. This is generally regarded as a relatively reliable and versatile force field. However, structures of allenes cannot be treated properly, as described above. Because molecular mechanics (MM) energy minimization does not adequately consider electronic structure, resulting 3D structures are sometimes not reproduced properly. To avoid this problem, molecular orbital calculations can be adopted if it is necessary to obtain accurate structures. We previously tried to run
<italic>ab initio</italic>
MO calculations on all structures. However, this required too much time to perform energy minimization and some structures were not optimized because of oscillation. Although strictly correct structures are desired for investigators of molecular docking, some compromises are necessary. For example, docking programs such as FlexX (Rarey et al.,
<xref rid="B21" ref-type="bibr">1996</xref>
) adopt an incremental construction algorithm. With this type of program, stereochemical information is important, but strictly correct conformational information may not be necessary. Therefore, we collect structures according to the following policy. All entries are generated by MM calculation. When curators consider molecular mechanics inadequate or misleading, they can use quantum chemistry to generate better 3D structures.</p>
<p>The second problem involves manual mistakes. Many people have served as chemical curators in our 3DMET development group. In our experience, to increase the reliability of the contents, it is important that curators have sufficient knowledge of stereochemistry. However, even with the most skillful chemists, manual operation can sometimes lead to mistakes. The frequency and types of mistakes depended on the person. Thus, in our current project, manually constructed chemical structures are verified by another curator to avoid wrong structures. The next release of 3DMET will be published with 3D structures curated by the above protocol.</p>
</sec>
<sec id="S4-11">
<title>Evaluation of two structures including stereochemical correspondence</title>
<p>In this study, two structures were compared by the correspondence of their InChI and canonical SMILES strings. Before deciding to employ InChI and canonical SMILES, we also evaluated other programs. Regarding SMILES, the programs provided with SYBYL (Isomeric SMILES; SYBYL,
<xref rid="B26" ref-type="bibr">2015</xref>
) and MOE (Unique SMILES) were tested. As mentioned before, a MOE function named “aRSChirality” was also evaluated because chiral inversions were the main mistakes. In this section, the results are summarized.</p>
<p>In principle, several SMILES notations can be output from one chemical structure depending on the first atom selected. For duplicate detection, one string should mean only one structure. Canonical SMILES, Isomeric SMILES, and Unique SMILES were all developed to give a “unique” SMILES notation for this purpose. However, the outputs of these unique notations differed, as in the example of C00125 of COMPOUND shown in Figure
<xref ref-type="fig" rid="F4">4</xref>
. The differences were caused by the choice of the first atom. Canonicalization of SMILES by Daylight was explained in the report of the algorithm (Weininger et al.,
<xref rid="B29" ref-type="bibr">1989</xref>
). The first atom is defined as the end of the longest chain in the molecule, whereas in Isomeric SMILES and Unique SMILES the atom with the largest mass is selected as the first atom. Therefore, isomeric SMILES and unique SMILES were the same in the example of Figure
<xref ref-type="fig" rid="F4">4</xref>
. At first, we employed isomeric SMILES to detect correspondence of two structures. However, current detection of non-correspondence between two structures is mainly carried out by canonical SMILES because more structures were translatable.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Several SMILES output</bold>
. The C00125 structure of COMPOUND
<bold>(A)</bold>
was converted to three SMILES notations:
<bold>(B)</bold>
isomeric SMILES of SYBYL,
<bold>(C)</bold>
unique SMILES of MOE, and
<bold>(D)</bold>
canonical SMILES by Daylight.</p>
</caption>
<graphic xlink:href="fbioe-03-00066-g004"></graphic>
</fig>
<p>Structures that could not be translated into SMILES were also analyzed. For canonical SMILES, such structures included a highly complex portion, especially in symmetric structures. The structure of C00125 (Figure
<xref ref-type="fig" rid="F4">4</xref>
) did provide canonical SMILES. However, some porphyrins with short substituted residues attached to the core structure were not translatable because canonicalization failed.</p>
<p>Conservation of stereochemistry during three-dimensionalization detected by InChI and canonical SMILES is shown in Table
<xref ref-type="table" rid="T3">3</xref>
. Differences in accuracy between InChI and canonical SMILES notation strings were negligible (6,725 and 6,866 for command operation and 8,522 and 8,735 for database operation, respectively). In our experience, the structures that could be translated differed depending on the notation. In such cases, we judged a processed structure to be conserved when its strings before and after processing were the same.</p>
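<p>The comparison of notation strings before and after conversion can be sketched as follows. The function name <monospace>classify_conversion</monospace>, the dictionary layout, and the rule for structures that only one notation can express are assumptions made for illustration; they are not the scripts used for 3DMET. The two SMILES strings in the example differ only in the chirality tag of one atom.</p>

```python
def classify_conversion(before, after):
    """Compare notation strings written before and after 2D-to-3D conversion.

    Each argument maps a notation name to its string, or to None when the
    structure could not be translated into that notation.
    """
    # Only notations that exist on both sides can be compared.
    shared = [name for name in before
              if before[name] is not None and after.get(name) is not None]
    if not shared:
        return "untranslatable"
    # Assumed rule: conserved when every comparable notation is unchanged.
    if all(before[name] == after[name] for name in shared):
        return "conserved"
    return "changed"


# Illustrative input: the InChI is missing, so only SMILES is compared.
source = {"inchi": None, "smiles": "N[C@@H](C)C(=O)O"}
kept = {"inchi": None, "smiles": "N[C@@H](C)C(=O)O"}
inverted = {"inchi": None, "smiles": "N[C@H](C)C(=O)O"}

print(classify_conversion(source, kept))
print(classify_conversion(source, inverted))
```

<p>Because the check is a plain string comparison, it is only meaningful when both strings come from the same program and the same canonicalization settings.</p>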
<p>For detection of correspondence, the aRSChirality function of MOE was also tested. It detects the chirality of each atom. The order of the chiral tags in the aRSChirality output depends on the order of the atoms in the input file, and the output contains tags for hydrogens. When two structures are compared by this method, the atoms must therefore be in the same order and the hydrogen tags must be removed. Several years ago, we judged it best to use canonical SMILES rather than aRSChirality. Recently, the MOE support staff of Ryoka Systems Inc. wrote an SVL script for MOE that maps the atoms of two structures onto each other. Thus, aRSChirality may now be applicable to checking chirality (we have not evaluated the results in detail).</p>
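<p>The two requirements for a per-atom comparison, removal of hydrogen tags and a common atom order, can be sketched in a few lines. The data layout (parallel lists of element symbols and tags, plus an explicit atom mapping) and the function names are hypothetical; the sketch does not reproduce the output format of aRSChirality or the SVL script mentioned above.</p>

```python
def heavy_atom_tags(elements, tags):
    """Drop hydrogens and keep one tag per heavy-atom index."""
    return {i: tag
            for i, (element, tag) in enumerate(zip(elements, tags))
            if element != "H"}


def same_chirality(elems_a, tags_a, elems_b, tags_b, a_to_b):
    """Compare per-atom tags of two structures through an atom mapping.

    a_to_b maps a heavy-atom index in structure A to the index of the
    corresponding atom in structure B.
    """
    heavy_a = heavy_atom_tags(elems_a, tags_a)
    heavy_b = heavy_atom_tags(elems_b, tags_b)
    if len(heavy_a) != len(heavy_b):
        return False
    return all(heavy_b.get(a_to_b.get(i)) == tag for i, tag in heavy_a.items())


# Structure A lists a hydrogen and starts with the stereocentre;
# structure B has no hydrogens and a different atom order.
elems_a, tags_a = ["C", "N", "H", "C"], ["S", "", "", ""]
elems_b, tags_b = ["N", "C", "C"], ["", "S", ""]
mapping = {0: 1, 1: 0, 3: 2}

print(tags_a == tags_b)  # naive comparison of the raw tag lists
print(same_chirality(elems_a, tags_a, elems_b, tags_b, mapping))
```

<p>The raw tag lists differ even though the stereocentre is unchanged, which is why a naive comparison is unreliable without the atom mapping.</p>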
</sec>
<sec id="S4-12">
<title>Messages from database developers to program developers</title>
<p>We have described the problems encountered during the generation of chemical structures for 3DMET. Our motivation in this paper is to make software developers aware of current problems in the construction of chemical databases. The tested programs generate adequate structures most of the time, but further improvement of the algorithms is desired for better results.</p>
<p>Most of the observed errors involve inversion of chirality. Wrong cis/trans stereochemistry was also observed, but much less often than chiral inversion. The errors may result from a lack of sufficient information on chirality and cis/trans stereochemistry. Bond and atom types are defined in molecular mechanics programs. Usually, molecular connectivity is described as a graph consisting of atoms (nodes) and bonds (edges). Fundamentally, the direction of edges is not defined in such graphs. If no constraints on bonds were programmed, chiral information would be ignored during calculation. Recent versions of MOE (2011 and later) can limit errors in chiral definition (Table
<xref ref-type="table" rid="T4">4</xref>
). Thus, we use MOE 2011 to make 3D structures in our manual process. Errors in chirality are still observed in the constructed chemical structures, and further improvement is desired for automatic treatment. As users, we should be careful when using 3D-structure datasets that were developed with automation.</p>
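<p>The point that a connectivity graph carries no handedness can be made concrete with a small sketch. One common way to record the handedness of a tetrahedral centre from 3D coordinates is the sign of a scalar triple product (a signed volume). The function names and coordinates below are illustrative assumptions, and nothing here describes how MOE or any other tested program treats chirality internally.</p>

```python
def signed_volume(center, a, b, c):
    """Scalar triple product (a - center) . ((b - center) x (c - center))."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = [v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0]]
    return sum(u[i] * cross[i] for i in range(3))


def mirror(point):
    """Reflect a point through the xy plane."""
    return (point[0], point[1], -point[2])


# A centre (atom 0) with three neighbours; the bond list is the whole graph.
bonds = [(0, 1), (0, 2), (0, 3)]
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
mirrored = [mirror(p) for p in coords]

# The same bond list describes both geometries, yet the sign flips.
print(signed_volume(*coords))
print(signed_volume(*mirrored))
```

<p>A procedure that only preserves bonds and bond lengths is free to move from one sign to the other, which is consistent with the chiral inversions reported here. Storing the sign for each stereocentre before conversion and checking it afterwards is one possible safeguard.</p>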
</sec>
</sec>
<sec id="S5">
<title>Conflict of Interest Statement</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<ack>
<p>This manuscript outlines the experience of the 3DMET development group since 2009. Many curators contributed. In particular, the following curators in our group found critical errors described in this manuscript: Mr. Makoto Saikawa, strange 3D structures around the conjugated double bonds of carotenoids; Mr. Kazuo Tsuzuki, incorrect structures for the connecting double bonds of allenes; and Mr. Teiichi Hattori, cis–trans inversion during energy minimization. We are grateful to the editor, the reviewer, and the editor of Chemistry Editing Service (USA) for improving this manuscript. This research was supported by JSPS KAKENHI (Grant-in-Aid for Publication of Scientific Research Results) Grant Numbers 218065, 238062, 248057, and 2578008.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bostrom</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Greenwood</surname>
<given-names>J. R.</given-names>
</name>
<name>
<surname>Gottfries</surname>
<given-names>J.</given-names>
</name>
</person-group>
(
<year>2003</year>
).
<article-title>Assessing the performance of OMEGA with respect to retrieving bioactive conformations</article-title>
.
<source>J. Mol. Graph. Model.</source>
<volume>21</volume>
,
<fpage>449</fpage>
<lpage>462</lpage>
.
<pub-id pub-id-type="doi">10.1016/S1093-3263(02)00204-8</pub-id>
<pub-id pub-id-type="pmid">12543140</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="webpage">ChemDraw. (2013).
<italic>Waltham, Massachusetts: PerkinElmer Informatics</italic>
Available at:
<uri xlink:type="simple" xlink:href="http://insideinformatics.cambridgesoft.com/">http://insideinformatics.cambridgesoft.com/</uri>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="book">
<collab>CLiDE</collab>
. (
<year>2014</year>
).
<publisher-loc>Leeds</publisher-loc>
:
<publisher-name>Keymodule Ltd</publisher-name>
Available at:
<uri xlink:type="simple" xlink:href="http://www.keymodule.co.uk/">http://www.keymodule.co.uk/</uri>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Coles</surname>
<given-names>S. J.</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>N. E.</given-names>
</name>
<name>
<surname>Murray-Rust</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Rzepa</surname>
<given-names>H. S.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>Enhancement of the chemical semantic web through the use of InChI identifiers</article-title>
.
<source>Org. Biomol. Chem.</source>
<volume>3</volume>
,
<fpage>1832</fpage>
<lpage>1834</lpage>
.
<pub-id pub-id-type="doi">10.1039/b502828k</pub-id>
<pub-id pub-id-type="pmid">15889163</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="book">
<collab>CORINA</collab>
. (
<year>2015</year>
).
<publisher-loc>Erlangen</publisher-loc>
:
<publisher-name>Molecular Networks GmbH</publisher-name>
Available at:
<uri xlink:type="simple" xlink:href="https://www.molecular-networks.com/">https://www.molecular-networks.com/</uri>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dewar</surname>
<given-names>M. J. S.</given-names>
</name>
<name>
<surname>Thiel</surname>
<given-names>W.</given-names>
</name>
</person-group>
(
<year>1977</year>
).
<article-title>Ground-states of molecules. 38. The MNDO method. Approximations and parameters</article-title>
.
<source>J. Am. Chem. Soc.</source>
<volume>99</volume>
,
<fpage>4899</fpage>
<lpage>4907</lpage>
.
<pub-id pub-id-type="doi">10.1021/ja00448a001</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dewar</surname>
<given-names>M. J. S.</given-names>
</name>
<name>
<surname>Zoebisch</surname>
<given-names>E. G.</given-names>
</name>
<name>
<surname>Healy</surname>
<given-names>E. F.</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>J. P.</given-names>
</name>
</person-group>
(
<year>1985</year>
).
<article-title>AM1: a new general purpose quantum mechanical molecular model</article-title>
.
<source>J. Am. Chem. Soc.</source>
<volume>107</volume>
,
<fpage>3902</fpage>
<lpage>3909</lpage>
.
<pub-id pub-id-type="doi">10.1021/ja00299a024</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Filippov</surname>
<given-names>I. V.</given-names>
</name>
<name>
<surname>Nicklaus</surname>
<given-names>M. C.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Optical structure recognition software to recover chemical information: OSRA, an open source solution</article-title>
.
<source>J. Chem. Inf. Model.</source>
<volume>49</volume>
,
<fpage>740</fpage>
<lpage>743</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci800067r</pub-id>
<pub-id pub-id-type="pmid">19434905</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gasteiger</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rudolph</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sadowski</surname>
<given-names>J.</given-names>
</name>
</person-group>
(
<year>1990</year>
).
<article-title>Automatic generation of 3D-atomic coordinates for organic molecules</article-title>
.
<source>Tetrahedron Comput. Methodol.</source>
<volume>3</volume>
,
<fpage>537</fpage>
<lpage>547</lpage>
.
<pub-id pub-id-type="doi">10.1016/0898-5529(90)90156-3</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goto</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Okuno</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Hattori</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Nishioka</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
</person-group>
(
<year>2002</year>
).
<article-title>LIGAND: database of chemical compounds and reactions in biological pathways</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>30</volume>
,
<fpage>402</fpage>
<lpage>404</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/30.1.402</pub-id>
<pub-id pub-id-type="pmid">11752349</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Halgren</surname>
<given-names>T. A.</given-names>
</name>
</person-group>
(
<year>1996</year>
).
<article-title>Merck molecular force field. 1. Basis, form, scope, parameterization, and performance of MMFF94</article-title>
.
<source>J. Comput. Chem.</source>
<volume>17</volume>
,
<fpage>490</fpage>
<lpage>519</lpage>
.
<pub-id pub-id-type="doi">10.1002/(SICI)1096-987X(199604)17:6&lt;587::AID-JCC4&gt;3.0.CO;2-P</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ibison</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Jacquot</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kam</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Neville</surname>
<given-names>A. G.</given-names>
</name>
<name>
<surname>Simpson</surname>
<given-names>R. W.</given-names>
</name>
<name>
<surname>Tonnelier</surname>
<given-names>C.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>1993</year>
).
<article-title>Chemical literature data extraction: the CLiDE Project</article-title>
.
<source>J. Chem. Inf. Comput. Sci.</source>
<volume>33</volume>
,
<fpage>338</fpage>
<lpage>344</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci00013a010</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Irwin</surname>
<given-names>J. J.</given-names>
</name>
<name>
<surname>Sterling</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Mysinger</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>Bolstad</surname>
<given-names>E. S.</given-names>
</name>
<name>
<surname>Coleman</surname>
<given-names>R. G.</given-names>
</name>
</person-group>
(
<year>2012</year>
).
<article-title>ZINC: a free tool to discover chemistry for biology</article-title>
.
<source>J. Chem. Inf. Model.</source>
<volume>52</volume>
,
<fpage>1757</fpage>
<lpage>1768</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci3001277</pub-id>
<pub-id pub-id-type="pmid">22587354</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="webpage">
<collab>IUPAC Chemical Identifier (InChI)</collab>
. (
<year>2015</year>
).
<article-title>The InChI Trust</article-title>
. Available at:
<uri xlink:type="simple" xlink:href="http://www.iupac.org/home/publications/e-resoutces/inchi.html">http://www.iupac.org/home/publications/e-resoutces/inchi.html</uri>
;
<uri xlink:type="simple" xlink:href="http://www.inchi-trust.org/">http://www.inchi-trust.org/</uri>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maeda</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Kondo</surname>
<given-names>K.</given-names>
</name>
</person-group>
(
<year>2013</year>
).
<article-title>Three-dimensional structure database of natural metabolites (3DMET): a novel database of curated 3D structures</article-title>
.
<source>J. Chem. Inf. Model.</source>
<volume>53</volume>
,
<fpage>527</fpage>
<lpage>533</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci300309k</pub-id>
<pub-id pub-id-type="pmid">23293959</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McDaniel</surname>
<given-names>J. R.</given-names>
</name>
<name>
<surname>Balmuth</surname>
<given-names>J. R.</given-names>
</name>
</person-group>
(
<year>1992</year>
).
<article-title>Kekule: OCR-optical chemical (structure) recognition</article-title>
.
<source>J. Chem. Inf. Comput. Sci.</source>
<volume>32</volume>
,
<fpage>373</fpage>
<lpage>378</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci00008a018</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="book">
<collab>Molecular Operating Environment (MOE
<sup>TM</sup>
)</collab>
. (
<year>2015</year>
).
<publisher-loc>Montreal, QC</publisher-loc>
:
<publisher-name>Chemical Computing Group</publisher-name>
Available at:
<uri xlink:type="simple" xlink:href="http://www.chemcomp.com/">http://www.chemcomp.com/</uri>
; MOE Customer Support in Japan. Tokyo: Ryoka System Inc. Available at:
<uri xlink:type="simple" xlink:href="http://www.rsi.co.jp/science/cs/ccg/index_e.html">http://www.rsi.co.jp/science/cs/ccg/index_e.html</uri>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="book">
<collab>OMEGA</collab>
. (
<year>2015</year>
).
<publisher-loc>Santa Fe, NM</publisher-loc>
:
<publisher-name>OpenEye Scientific Software</publisher-name>
Available at:
<uri xlink:type="simple" xlink:href="http://www.eyesopen.com/">http://www.eyesopen.com/</uri>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rosania</surname>
<given-names>G. R.</given-names>
</name>
<name>
<surname>Shedden</surname>
<given-names>K. A.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Lyu</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Saitou</surname>
<given-names>K.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Automated extraction of chemical structure information from digital raster images</article-title>
.
<source>Chem. Cent. J.</source>
<volume>3</volume>
,
<fpage>4</fpage>
.
<pub-id pub-id-type="doi">10.1186/1752-153X-3-4</pub-id>
<pub-id pub-id-type="pmid">19196483</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearlman</surname>
<given-names>R. S.</given-names>
</name>
</person-group>
(
<year>1987</year>
).
<article-title>Rapid generation of high quality approximate 3D molecular structures</article-title>
.
<source>Chem. Des. Autom. News</source>
<volume>2</volume>
,
<fpage>5</fpage>
<lpage>6</lpage>
.</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rarey</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Kramer</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lengauer</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Klebe</surname>
<given-names>G.</given-names>
</name>
</person-group>
(
<year>1996</year>
).
<article-title>A fast flexible docking method using an incremental construction algorithm</article-title>
.
<source>J. Mol. Biol.</source>
<volume>261</volume>
,
<fpage>470</fpage>
<lpage>489</lpage>
.
<pub-id pub-id-type="doi">10.1006/jmbi.1996.0477</pub-id>
<pub-id pub-id-type="pmid">8780787</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sadowski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gasteiger</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Klebe</surname>
<given-names>G.</given-names>
</name>
</person-group>
(
<year>1994</year>
).
<article-title>Comparison of automatic three-dimensional model builders using 639 X-ray structures</article-title>
.
<source>J. Chem. Inf. Comput. Sci.</source>
<volume>34</volume>
,
<fpage>1000</fpage>
<lpage>1008</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci00020a039</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="book">
<collab>SMILES</collab>
. (
<year>2008</year>
).
<publisher-loc>Laguna Niguel, CA</publisher-loc>
:
<publisher-name>Daylight Chemical Information Systems, Inc.</publisher-name>
Available at:
<uri xlink:type="simple" xlink:href="http://www.daylight.com/">http://www.daylight.com/</uri>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stewart</surname>
<given-names>J. J. P.</given-names>
</name>
</person-group>
(
<year>1989</year>
).
<article-title>Optimization of parameters for semiempirical methods. I. Method</article-title>
.
<source>J. Comput. Chem.</source>
<volume>10</volume>
,
<fpage>209</fpage>
<lpage>220</lpage>
.
<pub-id pub-id-type="doi">10.1002/jcc.540100208</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stewart</surname>
<given-names>J. J. P.</given-names>
</name>
</person-group>
(
<year>1990</year>
).
<article-title>MOPAC: a semiempirical molecular orbital program</article-title>
.
<source>J. Comput. Aided Mol. Des.</source>
<volume>4</volume>
,
<fpage>1</fpage>
<lpage>105</lpage>
.
<pub-id pub-id-type="doi">10.1007/BF00128336</pub-id>
<pub-id pub-id-type="pmid">2197373</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="book">
<collab>SYBYL</collab>
. (
<year>2015</year>
).
<publisher-loc>St. Louis, MO</publisher-loc>
:
<publisher-name>Certara L.P.</publisher-name>
Available at:
<uri xlink:type="simple" xlink:href="http://www.certara.com/">http://www.certara.com/</uri>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Suzek</surname>
<given-names>T. O.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>S. H.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>PubChem: a public information system for analyzing bioactivities of small molecules</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>37</volume>
,
<fpage>W623</fpage>
<lpage>W633</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkp456</pub-id>
<pub-id pub-id-type="pmid">19498078</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weininger</surname>
<given-names>D.</given-names>
</name>
</person-group>
(
<year>1988</year>
).
<article-title>SMILES, a chemical language and information-system. 1. Introduction to methodology and encoding rules</article-title>
.
<source>J. Chem. Inf. Comput. Sci.</source>
<volume>28</volume>
,
<fpage>31</fpage>
<lpage>36</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci00057a005</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weininger</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Weininger</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Weininger</surname>
<given-names>J. L.</given-names>
</name>
</person-group>
(
<year>1989</year>
).
<article-title>SMILES. 2. Algorithm for generation of unique SMILES notation</article-title>
.
<source>J. Chem. Inf. Comput. Sci.</source>
<volume>29</volume>
,
<fpage>97</fpage>
<lpage>101</lpage>
.
<pub-id pub-id-type="doi">10.1021/ci00062a008</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Maeda, Miki H" sort="Maeda, Miki H" uniqKey="Maeda M" first="Miki H." last="Maeda">Miki H. Maeda</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000026 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000026 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:4443773
   |texte=   Current Challenges in Development of a Database of Three-Dimensional Chemical Structures
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:26075200" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024