The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem
Internal identifier: 000156 (Pmc/Corpus)
Authors: Weerapong Phadungsukanan; Markus Kraft; Joe A. Townsend; Peter Murray-Rust
Source: Journal of Cheminformatics [1758-2946]; 2012.
DOI: 10.1186/1758-2946-4-15
PubMed: 22870956
PubMed Central: 3434037
PMC:3434037
The document in XML format:
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem</title>
<author><name sortKey="Phadungsukanan, Weerapong" sort="Phadungsukanan, Weerapong" uniqKey="Phadungsukanan W" first="Weerapong" last="Phadungsukanan">Weerapong Phadungsukanan</name>
<affiliation><nlm:aff id="I1">Department of Chemical Engineering and Biotechnology, University of Cambridge, New Museums Site, Pembroke Street, Cambridge, CB2 3RA, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kraft, Markus" sort="Kraft, Markus" uniqKey="Kraft M" first="Markus" last="Kraft">Markus Kraft</name>
<affiliation><nlm:aff id="I1">Department of Chemical Engineering and Biotechnology, University of Cambridge, New Museums Site, Pembroke Street, Cambridge, CB2 3RA, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Townsend, Joe A" sort="Townsend, Joe A" uniqKey="Townsend J" first="Joe A" last="Townsend">Joe A. Townsend</name>
<affiliation><nlm:aff id="I2">Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Murray Rust, Peter" sort="Murray Rust, Peter" uniqKey="Murray Rust P" first="Peter" last="Murray-Rust">Peter Murray-Rust</name>
<affiliation><nlm:aff id="I2">Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">22870956</idno>
<idno type="pmc">3434037</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3434037</idno>
<idno type="RBID">PMC:3434037</idno>
<idno type="doi">10.1186/1758-2946-4-15</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000156</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000156</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem</title>
<author><name sortKey="Phadungsukanan, Weerapong" sort="Phadungsukanan, Weerapong" uniqKey="Phadungsukanan W" first="Weerapong" last="Phadungsukanan">Weerapong Phadungsukanan</name>
<affiliation><nlm:aff id="I1">Department of Chemical Engineering and Biotechnology, University of Cambridge, New Museums Site, Pembroke Street, Cambridge, CB2 3RA, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kraft, Markus" sort="Kraft, Markus" uniqKey="Kraft M" first="Markus" last="Kraft">Markus Kraft</name>
<affiliation><nlm:aff id="I1">Department of Chemical Engineering and Biotechnology, University of Cambridge, New Museums Site, Pembroke Street, Cambridge, CB2 3RA, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Townsend, Joe A" sort="Townsend, Joe A" uniqKey="Townsend J" first="Joe A" last="Townsend">Joe A. Townsend</name>
<affiliation><nlm:aff id="I2">Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Murray Rust, Peter" sort="Murray Rust, Peter" uniqKey="Murray Rust P" first="Peter" last="Murray-Rust">Peter Murray-Rust</name>
<affiliation><nlm:aff id="I2">Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of Cheminformatics</title>
<idno type="eISSN">1758-2946</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title></title>
<p>This paper introduces CompChem, a subdomain chemistry format for storing computational chemistry data. It has been developed based on the design, concepts and methodologies of the Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of <italic>ab initio</italic>
quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Frisch, Mj" uniqKey="Frisch M">MJ Frisch</name>
</author>
<author><name sortKey="Trucks, Gw" uniqKey="Trucks G">GW Trucks</name>
</author>
<author><name sortKey="Schlegel, Hb" uniqKey="Schlegel H">HB Schlegel</name>
</author>
<author><name sortKey="Scuseria, Ge" uniqKey="Scuseria G">GE Scuseria</name>
</author>
<author><name sortKey="Robb, Ma" uniqKey="Robb M">MA Robb</name>
</author>
<author><name sortKey="Cheeseman, Jr" uniqKey="Cheeseman J">JR Cheeseman</name>
</author>
<author><name sortKey="Montgomery, Jajr" uniqKey="Montgomery J">JA Montgomery Jr</name>
</author>
<author><name sortKey="Vreven, T" uniqKey="Vreven T">T Vreven</name>
</author>
<author><name sortKey="Kudin, Kn" uniqKey="Kudin K">KN Kudin</name>
</author>
<author><name sortKey="Burant, Jc" uniqKey="Burant J">JC Burant</name>
</author>
<author><name sortKey="Millam, Jm" uniqKey="Millam J">JM Millam</name>
</author>
<author><name sortKey="Iyengar, Ss" uniqKey="Iyengar S">SS Iyengar</name>
</author>
<author><name sortKey="Tomasi, J" uniqKey="Tomasi J">J Tomasi</name>
</author>
<author><name sortKey="Barone, V" uniqKey="Barone V">V Barone</name>
</author>
<author><name sortKey="Mennucci, B" uniqKey="Mennucci B">B Mennucci</name>
</author>
<author><name sortKey="Cossi, M" uniqKey="Cossi M">M Cossi</name>
</author>
<author><name sortKey="Scalmani, G" uniqKey="Scalmani G">G Scalmani</name>
</author>
<author><name sortKey="Rega, N" uniqKey="Rega N">N Rega</name>
</author>
<author><name sortKey="Petersson, Ga" uniqKey="Petersson G">GA Petersson</name>
</author>
<author><name sortKey="Nakatsuji, H" uniqKey="Nakatsuji H">H Nakatsuji</name>
</author>
<author><name sortKey="Hada, M" uniqKey="Hada M">M Hada</name>
</author>
<author><name sortKey="Ehara, M" uniqKey="Ehara M">M Ehara</name>
</author>
<author><name sortKey="Toyota, K" uniqKey="Toyota K">K Toyota</name>
</author>
<author><name sortKey="Fukuda, R" uniqKey="Fukuda R">R Fukuda</name>
</author>
<author><name sortKey="Hasegawa, J" uniqKey="Hasegawa J">J Hasegawa</name>
</author>
<author><name sortKey="Ishida, M" uniqKey="Ishida M">M Ishida</name>
</author>
<author><name sortKey="Nakajima, T" uniqKey="Nakajima T">T Nakajima</name>
</author>
<author><name sortKey="Honda, Y" uniqKey="Honda Y">Y Honda</name>
</author>
<author><name sortKey="Kitao, O" uniqKey="Kitao O">O Kitao</name>
</author>
<author><name sortKey="Nakai, H" uniqKey="Nakai H">H Nakai</name>
</author>
<author><name sortKey="Klene, M" uniqKey="Klene M">M Klene</name>
</author>
<author><name sortKey="Li, X" uniqKey="Li X">X Li</name>
</author>
<author><name sortKey="Knox, Je" uniqKey="Knox J">JE Knox</name>
</author>
<author><name sortKey="Hratchian, Hp" uniqKey="Hratchian H">HP Hratchian</name>
</author>
<author><name sortKey="Cross, Jb" uniqKey="Cross J">JB Cross</name>
</author>
<author><name sortKey="Bakken, V" uniqKey="Bakken V">V Bakken</name>
</author>
<author><name sortKey="Adamo, C" uniqKey="Adamo C">C Adamo</name>
</author>
<author><name sortKey="Jaramillo, J" uniqKey="Jaramillo J">J Jaramillo</name>
</author>
<author><name sortKey="Gomperts, R" uniqKey="Gomperts R">R Gomperts</name>
</author>
<author><name sortKey="Stratmann, Re" uniqKey="Stratmann R">RE Stratmann</name>
</author>
<author><name sortKey="Yazyev, O" uniqKey="Yazyev O">O Yazyev</name>
</author>
<author><name sortKey="Austin, Aj" uniqKey="Austin A">AJ Austin</name>
</author>
<author><name sortKey="Cammi, R" uniqKey="Cammi R">R Cammi</name>
</author>
<author><name sortKey="Pomelli, C" uniqKey="Pomelli C">C Pomelli</name>
</author>
<author><name sortKey="Ochterski, Jw" uniqKey="Ochterski J">JW Ochterski</name>
</author>
<author><name sortKey="Ayala, Py" uniqKey="Ayala P">PY Ayala</name>
</author>
<author><name sortKey="Morokuma, K" uniqKey="Morokuma K">K Morokuma</name>
</author>
<author><name sortKey="Voth, Ga" uniqKey="Voth G">GA Voth</name>
</author>
<author><name sortKey="Salvador, P" uniqKey="Salvador P">P Salvador</name>
</author>
<author><name sortKey="Dannenberg, Jj" uniqKey="Dannenberg J">JJ Dannenberg</name>
</author>
<author><name sortKey="Zakrzewski, Vg" uniqKey="Zakrzewski V">VG Zakrzewski</name>
</author>
<author><name sortKey="Dapprich, S" uniqKey="Dapprich S">S Dapprich</name>
</author>
<author><name sortKey="Daniels, Ad" uniqKey="Daniels A">AD Daniels</name>
</author>
<author><name sortKey="Strain, Mc" uniqKey="Strain M">MC Strain</name>
</author>
<author><name sortKey="Farkas, O" uniqKey="Farkas O">O Farkas</name>
</author>
<author><name sortKey="Malick, Dk" uniqKey="Malick D">DK Malick</name>
</author>
<author><name sortKey="Rabuck, Ad" uniqKey="Rabuck A">AD Rabuck</name>
</author>
<author><name sortKey="Raghavachari, K" uniqKey="Raghavachari K">K Raghavachari</name>
</author>
<author><name sortKey="Foresman, Jb" uniqKey="Foresman J">JB Foresman</name>
</author>
<author><name sortKey="Ortiz, Jv" uniqKey="Ortiz J">JV Ortiz</name>
</author>
<author><name sortKey="Cui, Q" uniqKey="Cui Q">Q Cui</name>
</author>
<author><name sortKey="Baboul, Ag" uniqKey="Baboul A">AG Baboul</name>
</author>
<author><name sortKey="Clifford, S" uniqKey="Clifford S">S Clifford</name>
</author>
<author><name sortKey="Cioslowski, J" uniqKey="Cioslowski J">J Cioslowski</name>
</author>
<author><name sortKey="Stefanov, Bb" uniqKey="Stefanov B">BB Stefanov</name>
</author>
<author><name sortKey="Liu, G" uniqKey="Liu G">G Liu</name>
</author>
<author><name sortKey="Liashenko, A" uniqKey="Liashenko A">A Liashenko</name>
</author>
<author><name sortKey="Piskorz, P" uniqKey="Piskorz P">P Piskorz</name>
</author>
<author><name sortKey="Komaromi, I" uniqKey="Komaromi I">I Komaromi</name>
</author>
<author><name sortKey="Martin, Rl" uniqKey="Martin R">RL Martin</name>
</author>
<author><name sortKey="Fox, Dj" uniqKey="Fox D">DJ Fox</name>
</author>
<author><name sortKey="Keith, T" uniqKey="Keith T">T Keith</name>
</author>
<author><name sortKey="Al Laham, Ma" uniqKey="Al Laham M">MA Al-Laham</name>
</author>
<author><name sortKey="Peng, Cy" uniqKey="Peng C">CY Peng</name>
</author>
<author><name sortKey="Nanayakkara, A" uniqKey="Nanayakkara A">A Nanayakkara</name>
</author>
<author><name sortKey="Challacombe, M" uniqKey="Challacombe M">M Challacombe</name>
</author>
<author><name sortKey="Gill, Pmw" uniqKey="Gill P">PMW Gill</name>
</author>
<author><name sortKey="Johnson, B" uniqKey="Johnson B">B Johnson</name>
</author>
<author><name sortKey="Chen, W" uniqKey="Chen W">W Chen</name>
</author>
<author><name sortKey="Wong, Mw" uniqKey="Wong M">MW Wong</name>
</author>
<author><name sortKey="Gonzalez, C" uniqKey="Gonzalez C">C Gonzalez</name>
</author>
<author><name sortKey="Pople, Ja" uniqKey="Pople J">JA Pople</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schmidt, M" uniqKey="Schmidt M">M Schmidt</name>
</author>
<author><name sortKey="Baldridge, K" uniqKey="Baldridge K">K Baldridge</name>
</author>
<author><name sortKey="Boatz, J" uniqKey="Boatz J">J Boatz</name>
</author>
<author><name sortKey="Elbert, S" uniqKey="Elbert S">S Elbert</name>
</author>
<author><name sortKey="Gordon, M" uniqKey="Gordon M">M Gordon</name>
</author>
<author><name sortKey="Jensen, J" uniqKey="Jensen J">J Jensen</name>
</author>
<author><name sortKey="Koseki, S" uniqKey="Koseki S">S Koseki</name>
</author>
<author><name sortKey="Matsunaga, N" uniqKey="Matsunaga N">N Matsunaga</name>
</author>
<author><name sortKey="Nguyen, K" uniqKey="Nguyen K">K Nguyen</name>
</author>
<author><name sortKey="Su, S" uniqKey="Su S">S Su</name>
</author>
<author><name sortKey="Windus, T" uniqKey="Windus T">T Windus</name>
</author>
<author><name sortKey="Dupuis, M" uniqKey="Dupuis M">M Dupuis</name>
</author>
<author><name sortKey="Montgomery, J" uniqKey="Montgomery J">J Montgomery</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Guest, Mf" uniqKey="Guest M">MF Guest</name>
</author>
<author><name sortKey="Bush, Ij" uniqKey="Bush I">IJ Bush</name>
</author>
<author><name sortKey="Van Dam, Hjj" uniqKey="Van Dam H">HJJ Van Dam</name>
</author>
<author><name sortKey="Sherwood, P" uniqKey="Sherwood P">P Sherwood</name>
</author>
<author><name sortKey="Thomas, Jmh" uniqKey="Thomas J">JMH Thomas</name>
</author>
<author><name sortKey="Van Lenthe, Jh" uniqKey="Van Lenthe J">JH Van Lenthe</name>
</author>
<author><name sortKey="Havenith, Rwa" uniqKey="Havenith R">RWA Havenith</name>
</author>
<author><name sortKey="Kendrick, J" uniqKey="Kendrick J">J Kendrick</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Song, J" uniqKey="Song J">J Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wakelin, J" uniqKey="Wakelin J">J Wakelin</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Tyrrell, S" uniqKey="Tyrrell S">S Tyrrell</name>
</author>
<author><name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
<author><name sortKey="Garcia, A" uniqKey="Garcia A">A García</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bray, T" uniqKey="Bray T">T Bray</name>
</author>
<author><name sortKey="Paoli, J" uniqKey="Paoli J">J Paoli</name>
</author>
<author><name sortKey="Sperberg Mcqueen, Cm" uniqKey="Sperberg Mcqueen C">CM Sperberg-McQueen</name>
</author>
<author><name sortKey="Maler, E" uniqKey="Maler E">E Maler</name>
</author>
<author><name sortKey="Yergeau, F" uniqKey="Yergeau F">F Yergeau</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gkoutos, Gv" uniqKey="Gkoutos G">GV Gkoutos</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
<author><name sortKey="Wright, M" uniqKey="Wright M">M Wright</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Townsend, J" uniqKey="Townsend J">J Townsend</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Adams, S" uniqKey="Adams S">S Adams</name>
</author>
<author><name sortKey="Downing, J" uniqKey="Downing J">J Downing</name>
</author>
<author><name sortKey="Townsend, J" uniqKey="Townsend J">J Townsend</name>
</author>
<author><name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Townsend, J" uniqKey="Townsend J">J Townsend</name>
</author>
<author><name sortKey="Adams, S" uniqKey="Adams S">S Adams</name>
</author>
<author><name sortKey="Phadungsukanan, W" uniqKey="Phadungsukanan W">W Phadungsukanan</name>
</author>
<author><name sortKey="Thomas, J" uniqKey="Thomas J">J Thomas</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="O Oyle, N" uniqKey="O Oyle N">N O’Boyle</name>
</author>
<author><name sortKey="Banck, M" uniqKey="Banck M">M Banck</name>
</author>
<author><name sortKey="James, C" uniqKey="James C">C James</name>
</author>
<author><name sortKey="Morley, C" uniqKey="Morley C">C Morley</name>
</author>
<author><name sortKey="Vandermeersch, T" uniqKey="Vandermeersch T">T Vandermeersch</name>
</author>
<author><name sortKey="Hutchison, G" uniqKey="Hutchison G">G Hutchison</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="O Oyle, N" uniqKey="O Oyle N">N O’Boyle</name>
</author>
<author><name sortKey="Morley, C" uniqKey="Morley C">C Morley</name>
</author>
<author><name sortKey="Hutchison, G" uniqKey="Hutchison G">G Hutchison</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Thompson, Hs" uniqKey="Thompson H">HS Thompson</name>
</author>
<author><name sortKey="Beech, D" uniqKey="Beech D">D Beech</name>
</author>
<author><name sortKey="Maloney, M" uniqKey="Maloney M">M Maloney</name>
</author>
<author><name sortKey="Mendelsohn, N" uniqKey="Mendelsohn N">N Mendelsohn</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, H" uniqKey="Rzepa H">H Rzepa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Holliday, Gl" uniqKey="Holliday G">GL Holliday</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kuhn, S" uniqKey="Kuhn S">S Kuhn</name>
</author>
<author><name sortKey="Helmus, T" uniqKey="Helmus T">T Helmus</name>
</author>
<author><name sortKey="Lancashire, Rj" uniqKey="Lancashire R">RJ Lancashire</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
<author><name sortKey="Steinbeck, C" uniqKey="Steinbeck C">C Steinbeck</name>
</author>
<author><name sortKey="Willighagen, El" uniqKey="Willighagen E">EL Willighagen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Day, N" uniqKey="Day N">N Day</name>
</author>
<author><name sortKey="Downing, J" uniqKey="Downing J">J Downing</name>
</author>
<author><name sortKey="Adams, S" uniqKey="Adams S">S Adams</name>
</author>
<author><name sortKey="England, Nw" uniqKey="England N">NW England</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Adams, N" uniqKey="Adams N">N Adams</name>
</author>
<author><name sortKey="Winter, J" uniqKey="Winter J">J Winter</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
<author><name sortKey="Rzepa, Hs" uniqKey="Rzepa H">HS Rzepa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bray, T" uniqKey="Bray T">T Bray</name>
</author>
<author><name sortKey="Hollander, D" uniqKey="Hollander D">D Hollander</name>
</author>
<author><name sortKey="Layman, A" uniqKey="Layman A">A Layman</name>
</author>
<author><name sortKey="Tobin, R" uniqKey="Tobin R">R Tobin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Totton, Ts" uniqKey="Totton T">TS Totton</name>
</author>
<author><name sortKey="Shirley, R" uniqKey="Shirley R">R Shirley</name>
</author>
<author><name sortKey="Kraft, M" uniqKey="Kraft M">M Kraft</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="West, Rh" uniqKey="West R">RH West</name>
</author>
<author><name sortKey="Beran, Gjo" uniqKey="Beran G">GJO Beran</name>
</author>
<author><name sortKey="Green, Wh" uniqKey="Green W">WH Green</name>
</author>
<author><name sortKey="Kraft, M" uniqKey="Kraft M">M Kraft</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Shirley, R" uniqKey="Shirley R">R Shirley</name>
</author>
<author><name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author><name sortKey="Totton, Ts" uniqKey="Totton T">TS Totton</name>
</author>
<author><name sortKey="West, Rh" uniqKey="West R">RH West</name>
</author>
<author><name sortKey="Kraft, M" uniqKey="Kraft M">M Kraft</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Shirley, R" uniqKey="Shirley R">R Shirley</name>
</author>
<author><name sortKey="Phadungsukanan, W" uniqKey="Phadungsukanan W">W Phadungsukanan</name>
</author>
<author><name sortKey="Kraft, M" uniqKey="Kraft M">M Kraft</name>
</author>
<author><name sortKey="Downing, J" uniqKey="Downing J">J Downing</name>
</author>
<author><name sortKey="Day, Ne" uniqKey="Day N">NE Day</name>
</author>
<author><name sortKey="Murray Rust, P" uniqKey="Murray Rust P">P Murray-Rust</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Phadungsukanan, W" uniqKey="Phadungsukanan W">W Phadungsukanan</name>
</author>
<author><name sortKey="Shekar, S" uniqKey="Shekar S">S Shekar</name>
</author>
<author><name sortKey="Shirley, R" uniqKey="Shirley R">R Shirley</name>
</author>
<author><name sortKey="Sander, M" uniqKey="Sander M">M Sander</name>
</author>
<author><name sortKey="West, Rh" uniqKey="West R">RH West</name>
</author>
<author><name sortKey="Kraft, M" uniqKey="Kraft M">M Kraft</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Berglund, A" uniqKey="Berglund A">A Berglund</name>
</author>
<author><name sortKey="Boag, S" uniqKey="Boag S">S Boag</name>
</author>
<author><name sortKey="Chamberlin, D" uniqKey="Chamberlin D">D Chamberlin</name>
</author>
<author><name sortKey="Fernandez, Mf" uniqKey="Fernandez M">MF Fernández</name>
</author>
<author><name sortKey="Kay, M" uniqKey="Kay M">M Kay</name>
</author>
<author><name sortKey="Robie, J" uniqKey="Robie J">J Robie</name>
</author>
<author><name sortKey="Simeon, J" uniqKey="Simeon J">J Siméon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kay, M" uniqKey="Kay M">M Kay</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bradner, S" uniqKey="Bradner S">S Bradner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Manola, F" uniqKey="Manola F">F Manola</name>
</author>
<author><name sortKey="Miller, E" uniqKey="Miller E">E Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Prud Ommeaux, E" uniqKey="Prud Ommeaux E">E Prud’hommeaux</name>
</author>
<author><name sortKey="Seaborne, A" uniqKey="Seaborne A">A Seaborne</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en"><pmc-dir>properties open_access</pmc-dir>
<front><journal-meta><journal-id journal-id-type="nlm-ta">J Cheminform</journal-id>
<journal-id journal-id-type="iso-abbrev">J Cheminform</journal-id>
<journal-title-group><journal-title>Journal of Cheminformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1758-2946</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">22870956</article-id>
<article-id pub-id-type="pmc">3434037</article-id>
<article-id pub-id-type="publisher-id">1758-2946-4-15</article-id>
<article-id pub-id-type="doi">10.1186/1758-2946-4-15</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Database</subject>
</subj-group>
</article-categories>
<title-group><article-title>The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" id="A1"><name><surname>Phadungsukanan</surname>
<given-names>Weerapong</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>wp214@cam.ac.uk</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A2"><name><surname>Kraft</surname>
<given-names>Markus</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>mk306@cam.ac.uk</email>
</contrib>
<contrib contrib-type="author" id="A3"><name><surname>Townsend</surname>
<given-names>Joe A</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>jat45@jatownsend.net</email>
</contrib>
<contrib contrib-type="author" id="A4"><name><surname>Murray-Rust</surname>
<given-names>Peter</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>pm286@cam.ac.uk</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
Department of Chemical Engineering and Biotechnology, University of Cambridge, New Museums Site, Pembroke Street, Cambridge, CB2 3RA, UK</aff>
<aff id="I2"><label>2</label>
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK</aff>
<pub-date pub-type="collection"><year>2012</year>
</pub-date>
<pub-date pub-type="epub"><day>7</day>
<month>8</month>
<year>2012</year>
</pub-date>
<volume>4</volume>
<fpage>15</fpage>
<lpage>15</lpage>
<history><date date-type="received"><day>23</day>
<month>3</month>
<year>2012</year>
</date>
<date date-type="accepted"><day>20</day>
<month>6</month>
<year>2012</year>
</date>
</history>
<permissions><copyright-statement>Copyright ©2012 Phadungsukanan et al.; licensee Chemistry Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Phadungsukanan et al.; licensee Chemistry Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.jcheminf.com/content/4/1/15"></self-uri>
<abstract><sec><title></title>
<p>This paper introduces CompChem, a subdomain chemistry format for storing computational chemistry data. It has been developed based on the design, concepts and methodologies of the Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of <italic>ab initio</italic>
quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.</p>
</sec>
</abstract>
</article-meta>
</front>
<body><sec><title>Background</title>
<sec><title>Introduction</title>
<p>Computational quantum chemistry is a popular area of research, and its use is likely to keep growing. Several key developments drive this growth: advances in computational quantum theory, better numerical methods, and parallel and distributed computing have significantly reduced computation times (from months to days or hours). With software packages such as Gaussian [<xref ref-type="bibr" rid="B1">1</xref>
], GAMESS (US) [<xref ref-type="bibr" rid="B2">2</xref>
], and GAMESS-UK [<xref ref-type="bibr" rid="B3">3</xref>
], one can calculate properties of large or short-lived molecules that are difficult or impossible to obtain experimentally. Increasingly, this is done with little human intervention, because automated chemical model generators are becoming more widely used [<xref ref-type="bibr" rid="B4">4</xref>
]. As a consequence, the amount of data will soon become too large to be analyzed manually. However advanced the technology, every calculation consumes computing resources, and those resources are wasted if someone else has already completed the same calculation. Efficient storage and retrieval of computational chemistry data is therefore important, and it requires an infrastructure that is easy to access and use.</p>
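<p>The remark about wasted resources suggests a simple check before any new calculation: look the calculation up in a repository using a key derived from its specification. The Python sketch below is a minimal illustration under stated assumptions and is not part of CompChem. The fields that define a calculation (method, basis set, charge, multiplicity and geometry), the rounding of coordinates to four decimal places and the in-memory ResultStore class are all hypothetical choices.</p>

```python
import hashlib
import json
from typing import Optional

Atom = tuple[str, float, float, float]  # (element symbol, x, y, z) in angstrom


def calculation_key(method: str, basis_set: str, charge: int,
                    multiplicity: int, geometry: list[Atom]) -> str:
    """Return a reproducible SHA-256 key for one calculation specification.

    Hypothetical normalization: method and basis-set names are upper-cased
    and coordinates are rounded to 4 decimal places, so that spelling
    differences and tiny floating-point noise usually do not change the key.
    """
    spec = {
        "method": method.upper(),
        "basis_set": basis_set.upper(),
        "charge": charge,
        "multiplicity": multiplicity,
        "geometry": [[el, round(x, 4), round(y, 4), round(z, 4)]
                     for el, x, y, z in geometry],
    }
    # sort_keys gives a canonical serialization independent of dict order.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResultStore:
    """Hypothetical in-memory index; a real repository would be a database."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def lookup(self, key: str) -> Optional[str]:
        """Return where a finished result is stored, or None if unknown."""
        return self._locations.get(key)

    def save(self, key: str, location: str) -> None:
        """Record where the result for this key has been stored."""
        self._locations[key] = location


if __name__ == "__main__":
    # Approximate water geometry in angstrom, for illustration only.
    water = [("O", 0.0, 0.0, 0.1173),
             ("H", 0.0, 0.7572, -0.4692),
             ("H", 0.0, -0.7572, -0.4692)]
    store = ResultStore()
    key = calculation_key("B3LYP", "6-31G(d)", 0, 1, water)
    if store.lookup(key) is None:
        # Placeholder standing in for running and archiving the calculation.
        store.save(key, "water_b3lyp.xml")
    print(key[:16], store.lookup(key))
```

<p>Serializing the specification with sorted keys makes the key independent of the order in which the fields were supplied. A real repository would also have to decide which further input options, such as convergence thresholds or program versions, belong in the key.</p>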
<p>At present, most computational results are written to “log files”, which are designed to record information as human-readable plain text. Besides the calculated properties, log files contain metadata such as the computing environment, errors and warnings. Many crucial pieces of information, such as units and the computational methods or algorithms used, are usually omitted from the output because they are considered “obvious” [<xref ref-type="bibr" rid="B5">5</xref>
] or are provided in separate documentation. Moreover, the structure of a log file depends on the software that produced it, which makes it difficult to retrieve the same information consistently across the different formats. This impedes the automated data analysis that is essential for studying large chemical systems.</p>
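<p>The dependence on program-specific layouts can be illustrated with a short Python sketch. It extracts a final energy from two differently formatted log lines and maps both onto one program-neutral record. The two regular expressions are modeled on the final-energy lines printed by Gaussian and GAMESS (US), but they are assumptions for illustration only, because real output varies between program versions and calculation types. The record field names and the explicit "hartree" unit are hypothetical choices, and the example energies are made up.</p>

```python
import re
from typing import Optional

# Illustrative patterns only (assumed layouts, not a complete grammar):
#   Gaussian-style:    " SCF Done:  E(RB3LYP) =  -76.4089533  A.U. after ..."
#   GAMESS (US)-style: " FINAL R-B3LYP ENERGY IS   -76.4089533 AFTER ..."
PATTERNS = {
    "gaussian": re.compile(r"SCF Done:\s+E\(([^)]+)\)\s*=\s*(-?\d+\.\d+)"),
    "gamess-us": re.compile(r"FINAL\s+(\S+)\s+ENERGY IS\s+(-?\d+\.\d+)"),
}


def extract_final_energy(program: str, log_text: str) -> Optional[dict]:
    """Return the last final-energy match as a program-neutral record."""
    matches = PATTERNS[program].findall(log_text)
    if not matches:
        return None
    # Keep the last occurrence, e.g. the final step of an optimization.
    method, energy = matches[-1]
    return {
        "program": program,
        "method": method,
        "energy": float(energy),
        "units": "hartree",  # stated explicitly; the log line may leave it implicit
    }


if __name__ == "__main__":
    # Made-up example lines; the energies are not computed results.
    gaussian_log = " SCF Done:  E(RB3LYP) =  -76.4089533     A.U. after   10 cycles"
    gamess_log = " FINAL R-B3LYP ENERGY IS      -76.4089533 AFTER  10 ITERATIONS"
    print(extract_final_energy("gaussian", gaussian_log))
    print(extract_final_energy("gamess-us", gamess_log))
```

<p>Each additional program or quantity needs its own pattern, and a pattern can silently stop matching when a program changes its output layout. This is the fragility described above.</p>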
<p>A typical solution is to extract the information from the log files (known as “parsing”) and cast it into a format that is more efficient for retrieval and processing. The eXtensible Markup Language (XML) [<xref ref-type="bibr" rid="B6">6</xref>
] is usually chosen for storing such data because it is universal and extensible enough for both simple and complex data. Furthermore, XML provides the means to check that the structure and data of an XML instance conform to the requirements of the application in question. That XML has become an industry standard for data storage, and that most modern software is built to support it, are strong testaments to its usefulness.</p>
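<p>Once the information has been parsed, casting it into XML and checking it can be sketched with Python's standard xml.etree.ElementTree module. The element names (calculation, program, method, energy) and the tuple of required fields are hypothetical and do not come from the CompChem or CML vocabulary. The conforms function is a toy stand-in for validation against a real schema, such as an XML Schema document, which the standard library does not provide.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal requirements for a conforming instance.
REQUIRED = ("program", "method", "energy")


def record_to_xml(record: dict) -> ET.Element:
    """Cast a flat, parsed record into a small XML tree (illustrative names)."""
    root = ET.Element("calculation")
    for name in ("program", "method"):
        ET.SubElement(root, name).text = str(record[name])
    # Keep the unit next to the value instead of leaving it implicit.
    energy = ET.SubElement(root, "energy", units=record["units"])
    energy.text = repr(record["energy"])
    return root


def conforms(root: ET.Element) -> list[str]:
    """Return the required child elements that are missing or empty."""
    missing = []
    for name in REQUIRED:
        child = root.find(name)
        if child is None or not (child.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    record = {"program": "gaussian", "method": "RB3LYP",
              "energy": -76.4089533, "units": "hartree"}  # made-up value
    tree = record_to_xml(record)
    print(ET.tostring(tree, encoding="unicode"))
    print("missing:", conforms(tree))
```

<p>Storing the unit as an attribute next to the value addresses the omission of units noted earlier. In a real format, the vocabulary and validation rules would come from a schema and dictionaries rather than from a hard-coded tuple.</p>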
<p>For chemistry applications, the Chemical Markup Language (CML) [<xref ref-type="bibr" rid="B7">7</xref>
-<xref ref-type="bibr" rid="B10">10</xref>
] has been developed based on the XML standard in order to provide the semantics for chemical data. CML allows the representation of complex chemical objects by using the hierarchical tree structure of XML. In addition, CML is accompanied by a number of methodologies [<xref ref-type="bibr" rid="B11">11</xref>
-<xref ref-type="bibr" rid="B13">13</xref>
] and infrastructures, such as CMLXOM [<xref ref-type="bibr" rid="B14">14</xref>
], Jumbo6 [<xref ref-type="bibr" rid="B15">15</xref>
], Jumbo-Converter [<xref ref-type="bibr" rid="B16">16</xref>
] and CMLValidator [<xref ref-type="bibr" rid="B17">17</xref>
], which support the development of a more general computational chemistry format. The following features make CML specifically suited for our purpose: </p>
<p>1. CML defines hundreds of chemistry-specific element and attribute names covering most aspects of chemistry, so it allows one to compose a suitable representation for almost any chemical data;</p>
<p>2. CML is widely supported by chemistry software, such as OpenBabel [18], PyBel [19], Jmol [20] and Avogadro [21], so a subdomain format of CML can be integrated with little modification into most existing systems that use these libraries;</p>
<p>3. CML has been developed for over 15 years, so its terminology, concepts and semantics have become highly stable, complete and well understood, with relatively few changes to its schema; as a result, it has been accepted by the chemistry community.</p>
<p>The <bold>purpose of this paper</bold>
is to use CML to develop a standard, called CompChem, for representing computational chemistry information, together with a set of supporting open-source tools. Furthermore, we illustrate the use of CompChem for managing computational chemistry data and for calculating thermodynamic properties.</p>
<p>The paper is structured as follows. In section “CML overview”, we briefly review the important CML concepts used throughout this paper. In section “Methodology in CompChem”, we describe the design requirements, the semantics and the detailed specification of CompChem. Finally, in section “Utility : example use cases”, we report a recent application with examples.</p>
</sec>
<sec><title>CML overview</title>
<p>In this section, we briefly outline the key CML concepts and terminologies, which are adopted by CompChem, for readers who are not familiar with CML. Detailed discussions have already been published in <italic>Murray-Rust et al.</italic>
[<xref ref-type="bibr" rid="B13">13</xref>
] and <italic>Townsend et al.</italic>
[<xref ref-type="bibr" rid="B11">11</xref>
]. Information on the ongoing developments is also publicly available online at <ext-link ext-link-type="uri" xlink:href="http://www.xml-cml.org">http://www.xml-cml.org</ext-link>
. The development of CompChem is based on the following components and concepts: </p>
<p>· XML Schema [22] is an XML-based schema language for specifying constraints on the structure of an XML document. A schema written in this language is itself an XML document and is referred to as an XML Schema Definition (XSD). The term “XML Schema” (with a capital “S”) should not be confused with “XML schema”, which describes schema languages in general. XML Schema is one of the most commonly used schema languages today. It was published as a W3C Recommendation in 2001 [23] to replace the Document Type Definition (DTD) and to provide additional features for defining constraints on, and validating the contents of, XML documents.</p>
<p>· CML Schema [10, 24] is an XML Schema containing hundreds of chemical definitions (XML elements and attributes). It covers most aspects of chemistry, e.g., CMLReact [25] for chemical reactions, CMLSpec [26] for spectral data, CML for crystallography [27] and CML for polymers (PML) [28]. With the CML Schema, one can determine whether or not a CML document conforms to the specification. For example, the schema will detect a misspelled element name or an undefined attribute in a CML document. This helps prevent applications from failing because they were given a “bad” CML document as input. In the latest version of the CML Schema (version 3), the content model restrictions have been lifted to make it more flexible for creating any type of chemical document.</p>
<p>· CML Convention is a set of rules and constraints on the content model of a CML document. It is a subset of the CML Schema with some additional rules for a specific chemistry domain, some of which cannot be expressed in XSD. When a convention is specified on a CML element (using the @convention attribute), the structure of the element must conform to the rules defined by that convention. The convention is given in a short-hand notation, known as a qualified name (QName [29]), which stands for a globally unique Uniform Resource Locator (URL).</p>
<p>· CML Dictionary is a collection of “controlled vocabularies” which are used to add semantics to generic CML elements, especially parameter and property elements. There are several types of CML dictionaries, for example, property and parameter dictionaries (referenced using @dictRef), unit dictionaries (referenced using @units) and unit type dictionaries (referenced using @unitType). The existing dictionaries can be found at http://www.xml-cml.org/dictionary/.</p>
<p>· Validation is the most important step: it verifies whether a CML document conforms to the structure required by the application. The CML approach to validation [11] consists of several steps, e.g., CML Schema, CML convention and CML dictionary validation. These steps are usually performed sequentially (as shown in Figure 1); however, they are completely independent of one another. A sophisticated online validator is available at http://validator.xml-cml.org/.</p>
<fig id="F1" position="float"><label>Figure 1</label>
<caption><p>A linear schematic diagram of the validation process for CompChem.</p>
</caption>
<graphic xlink:href="1758-2946-4-15-1"></graphic>
</fig>
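<p>The sequential but independent validation steps of Figure 1 can be sketched in Python as separate check functions, each returning its own list of errors. This is a toy stand-in, not the CML validator: the allowed element and attribute names, the single convention rule and the set of known dictionary terms are small hypothetical subsets chosen for illustration.</p>
<preformat>
```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"

# Hypothetical, tiny stand-ins for the CML Schema, a convention and a dictionary.
ALLOWED_ELEMENTS = {"module", "molecule", "parameterList", "parameter",
                    "propertyList", "property", "scalar"}
ALLOWED_ATTRIBUTES = {"id", "title", "dictRef", "convention", "units", "dataType"}
KNOWN_TERMS = {"cc:jobList", "cc:job", "cc:initialization",
               "cc:finalization", "cc:basis"}


def schema_check(root):
    """Flag element or attribute names the 'schema' does not define."""
    errors = []
    for el in root.iter():
        name = el.tag.replace(CML, "")
        if name not in ALLOWED_ELEMENTS:
            errors.append(f"unknown element: {name}")
        errors.extend(f"unknown attribute: {a} on {name}"
                      for a in el.attrib if a not in ALLOWED_ATTRIBUTES)
    return errors


def convention_check(root):
    """One sample convention rule: each job has exactly one initialization."""
    errors = []
    for job in root.iter(CML + "module"):
        if job.get("dictRef") != "cc:job":
            continue
        count = sum(1 for c in job if c.get("dictRef") == "cc:initialization")
        if count != 1:
            errors.append(f"job {job.get('id')}: {count} initialization modules")
    return errors


def dictionary_check(root):
    """Every dictRef must resolve to a term known to the dictionary."""
    return [f"unresolved dictRef: {el.get('dictRef')}"
            for el in root.iter()
            if el.get("dictRef") is not None
            and el.get("dictRef") not in KNOWN_TERMS]


def validate(root):
    """Run the independent checks in sequence and collect every report."""
    steps = (("schema", schema_check),
             ("convention", convention_check),
             ("dictionary", dictionary_check))
    return {name: check(root) for name, check in steps}


# A small document with a misspelled dictionary reference.
root = ET.Element(CML + "module", {"convention": "convention:compchem"})
job_list = ET.SubElement(root, CML + "module", {"dictRef": "cc:jobList", "id": "jl1"})
job = ET.SubElement(job_list, CML + "module", {"dictRef": "cc:job", "id": "job1"})
ET.SubElement(job, CML + "module", {"dictRef": "cc:initialization"})
ET.SubElement(job, CML + "parameter", {"dictRef": "cc:basiss"})
print(validate(root))
```
</preformat>
<p>Because no check depends on the output of another, the steps could be reordered, or a single step run on its own, without changing its result.</p>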
</sec>
</sec>
<sec><title>Methodology in CompChem</title>
<sec><title>CompChem design</title>
<p>The development of CompChem started back in the summer of 2009 with the initial goal of archiving our published computational quantum chemistry results [<xref ref-type="bibr" rid="B30">30</xref>
-<xref ref-type="bibr" rid="B34">34</xref>
], which had been calculated using the Gaussian 03 software, in a machine-readable format and storing them in a queryable database in order to automate studies of chemical reactions in combustion systems. It was a collaborative effort between chemical engineers and cheminformaticians to explore the power of Semantic Web technologies for storing scientific data. The format was developed purely using the existing CML, without any modification to its schema. The number of elements used in CompChem (see section “CompChem convention” and the sections that follow) is currently small compared to the whole set of available CML elements, but it is sufficient for most of the data that needs to be stored in the current work. Other CML elements are likely to be included to support additional functionality as CompChem evolves.</p>
<p>Like other XML standards, the CompChem convention can only work well if it is widely accepted. Until now, no such standard has existed for computational chemistry because of the varied nature of the studies. We accept this limitation and therefore focus in this work only on formalizing the data calculated by quantum chemistry software.</p>
<p>The design of the CompChem convention shares and inherits the common goals of CML, Polymer Markup Language (PML) and other XML standards, which are adapted from the XML 1.0 W3C Recommendation [<xref ref-type="bibr" rid="B6">6</xref>
] (readers are referred to that document for further details). These goals are as follows: </p>
<p>1. CompChem shall be straightforwardly usable over the Internet;</p>
<p>2. CompChem shall support a wide variety of applications;</p>
<p>3. CompChem shall be compatible with Standard Generalized Markup Language (SGML);</p>
<p>4. It shall be easy to write programs which process CompChem documents;</p>
<p>5. The number of optional features in CompChem is to be kept to the absolute minimum, ideally zero;</p>
<p>6. CompChem documents should be human-legible and reasonably clear;</p>
<p>7. The CompChem design should be prepared quickly;</p>
<p>8. The design of CompChem shall be formal and concise;</p>
<p>9. CompChem documents shall be easy to create;</p>
<p>10. Terseness in CompChem markup is of minimal importance.</p>
<p>Apart from these general goals, there are more specific goals which distinguish CompChem from CML and other XML standards: </p>
<p>1. CompChem should be based on CML and reuse its components where appropriate. This is a typical goal of all subdomain formats of CML. Reusing CML and its components is key to improving the quality and consistency of the format and to reducing development cost and effort. In addition, any future improvement made to CML and its technologies will immediately benefit CompChem as well. In the development of CompChem, we introduced no new components into the CML Schema. Instead, the new concepts are defined using CML dictionaries and are applied to generic CML containers, see Section “Using dictionary in CompChem”.</p>
<p>2. CompChem should capture the semantics of most computational chemistry calculations. This is the main goal of our work. The aim is to reduce the flexibility of the CML Schema and introduce a stricter structure into the documents so that software and applications know exactly how to process the information. The semantics of CompChem are modelled on the typical nature of computational simulations or calculations, i.e., they consist of model input and output steps, see Section “CompChem convention”.</p>
<p>3. CompChem shall support any chemical data. CML provides a rich set of chemical data types in addition to the standard XML data types. It is also possible to build more complex chemical objects from the abstract CML data types and components; CompChem gains this advantage by reusing CML.</p>
<p>4. CompChem should be able to be validated using standard processing tools. This is an important consideration for making CompChem platform independent. The development of CompChem involves both CML components and CML technologies. The CML components, i.e., CML elements and attributes, are validated against the CML Schema using any standard XML Schema processor. XML stylesheets, XPath [35] and XSLT [36] are used for implementing and validating the CML conventions. Therefore, one should be able to validate a document against the CompChem convention using any web browser capable of applying XSLT stylesheets.</p>
<p>5. CompChem should represent both computational input and output. CompChem is designed to be used as both input and output for the calculations. The computational input contains critical information that defines the calculation itself, such as the calculation model, basis set, level of theory and job type. This information is required for the search functionality of the digital repository, and the calculation output is usually what is returned from a search. The ability to store both input and output is therefore a required feature of CompChem.</p>
<p>6. CompChem should interoperate with other XML or CML models (conventions). This is one of the common goals shared by all CML work. Interoperability is a requirement for CompChem to be used in conjunction with other existing XML-based formats, such as the Dublin Core Metadata Initiative (DCMI) and Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standards. This allows CompChem to reuse not only CML components but also other well-established formats.</p>
<p>7. CompChem shall allow users to define and insert new concepts. As discussed earlier, new concepts are added into CompChem through the dictionary mechanism. This applies not only to basic values, such as parameter and property values, @units and @unitType, but also to complex model objects. It is even feasible to insert an entire new convention into CompChem, although it may not be understood by all standard chemistry tools.</p>
<p>8. CompChem Convention rules must be clear and well documented. Although the convention rules are implemented in the CompChem convention validator using stylesheets, there must also be human-readable documentation. Clear documentation benefits both users and developers in the long term, and we adhere to this principle in all of our development. In practice, we first decide which rules should be in CompChem, then write the documentation for these rules, and only then implement the rules in the convention validator. This discipline ensures that every convention we develop is documented.</p>
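<p>Goals 4 and 5 can be illustrated together with a short sketch that uses only Python’s standard <monospace>xml.etree.ElementTree</monospace> module and its limited XPath support to search jobs by an input parameter, much as a repository search might. Placing the basis set in a <monospace>parameterList</monospace> inside the initialization module, with a <monospace>scalar</monospace> child holding the value, is an assumption made for this example.</p>
<preformat>
```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}
CML = "{" + NS["cml"] + "}"


def add_job(job_list, job_id, basis):
    """Append a job whose initialization records a basis set (assumed layout)."""
    job = ET.SubElement(job_list, CML + "module", {"dictRef": "cc:job", "id": job_id})
    init = ET.SubElement(job, CML + "module", {"dictRef": "cc:initialization"})
    plist = ET.SubElement(init, CML + "parameterList")
    param = ET.SubElement(plist, CML + "parameter", {"dictRef": "cc:basis"})
    ET.SubElement(param, CML + "scalar").text = basis
    return job


def jobs_with_basis(root, basis):
    """Return the ids of jobs whose input basis set equals `basis`."""
    path = ("cml:module[@dictRef='cc:initialization']/cml:parameterList/"
            "cml:parameter[@dictRef='cc:basis']/cml:scalar")
    hits = []
    for job in root.iterfind(".//cml:module[@dictRef='cc:job']", NS):
        scalar = job.find(path, NS)
        if scalar is not None and scalar.text == basis:
            hits.append(job.get("id"))
    return hits


root = ET.Element(CML + "module", {"convention": "convention:compchem"})
job_list = ET.SubElement(root, CML + "module", {"dictRef": "cc:jobList", "id": "jl1"})
add_job(job_list, "opt", "6-31G(d)")
add_job(job_list, "freq", "cc-pVTZ")
print(jobs_with_basis(root, "6-31G(d)"))  # ['opt']
```
</preformat>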
</sec>
<sec><title>Using dictionary in CompChem</title>
<p>Because dictionaries play a central role in defining the semantics within a CompChem document, it is essential to fully understand the concepts and how the dictionary referencing mechanism works. Both are explained in detail in this section.</p>
<p>Concepts are the building blocks of scientific knowledge. In natural language, similar concepts can be expressed using several words or synonyms, which are common causes of ambiguity, confusion and error when the information is processed. In software development, several similar concepts or synonyms can be grouped and represented by a carefully predetermined term drawn from what is commonly known as a <italic>controlled vocabulary</italic>
. Using a controlled vocabulary, one can impose order and reduce ambiguity by labelling the same concept with a single unique term.</p>
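<p>To make the idea concrete, the Python sketch below maps several free-text labels onto a single controlled term. The synonym lists are hypothetical, and <monospace>ex:levelOfTheory</monospace> is an invented placeholder; only <monospace>cc:basis</monospace> corresponds to a dictionary reference used in CompChem (see Figure 3).</p>
<preformat>
```python
# Illustrative synonym lists; "ex:levelOfTheory" is a hypothetical term.
CONTROLLED_TERMS = {
    "cc:basis": ["basis", "basis set", "basisset", "basis-set"],
    "ex:levelOfTheory": ["level of theory", "theory level", "method"],
}

# Invert the table so each synonym points at its single controlled term.
SYNONYM_INDEX = {synonym: term
                 for term, synonyms in CONTROLLED_TERMS.items()
                 for synonym in synonyms}


def to_controlled_term(label):
    """Normalize case and spacing, then look up the controlled term."""
    key = " ".join(label.lower().split())
    return SYNONYM_INDEX.get(key)


for label in ["Basis Set", "  basis   set ", "Level of Theory", "boiling point"]:
    print(repr(label), "->", to_controlled_term(label))
```
</preformat>
<p>Unknown labels return <monospace>None</monospace> rather than a guess, which is the point of a controlled vocabulary: a concept that is not in the list must be added deliberately.</p>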
<p>In XML, the element and attribute names are predetermined terms; in other words, an XML schema defines a set of controlled vocabularies. CML is no exception. The CML elements and attributes are predefined to cover almost all general aspects of chemistry and computational chemistry. However, it is neither possible nor useful to predefine every chemistry concept in CML. For example, concepts like boiling point, melting point, basis set, entropy, enthalpy, methodology and algorithm are not included in the CML Schema. Instead, CML uses a dictionary and a referencing mechanism to specify a new concept on the generic CML containers, such as <monospace></monospace>
, <monospace></monospace>
, <monospace></monospace>
, <monospace></monospace>
, etc., which can be used to hold values of any type.</p>
<p>A new concept can be added as an entry into a CML dictionary without modifying the CML Schema. The dictionary referencing mechanism consists of three steps: <bold>defining the new concept</bold>
, <bold>creating a reference</bold>
to the defined concept and <bold>applying the reference</bold>
to the CML generic container. </p>
<p>· Defining a new concept. In Figure 2 (1), we show a snippet of a CML dictionary created according to the CML dictionary convention. A dictionary can contain multiple entry child elements, allowing vocabulary in the same category to be grouped as one set. The figure only briefly illustrates how a dictionary and its vocabulary should be defined, so readers are strongly advised to consult the latest detailed specification of the dictionary convention at www.xml-cml.org for more information.</p>
<p>· Creating a reference to the defined concept. In CML, a qualified name (QName) [29] is used to identify an entry in the dictionary. A QName contains a namespace URI [29], a local part and a prefix. The prefix is only a placeholder for the associated namespace URI and is declared in a namespace declaration. Therefore, in order to identify the dictionary, each dictionary must have a unique identifier, which is specified using @namespace on the dictionary element. This is not to be confused with the XML namespace, which is declared using @xmlns. Specifying @namespace on the dictionary element does not change its actual XML namespace; it remains in the CML namespace (http://www.xml-cml.org/schema). Each entry must have an @id that is unique within the dictionary, and this is used as the local part of the QName. The combination of the dictionary @namespace and the entry @id generates a globally unique reference for the defined concept. In Figure 2 (2), the prefix “cc” is associated with the same URI (http://www.xml-cml.org/dictionary/compchem/) that is declared as the @namespace of the CompChem-core dictionary. Using the entry id “job”, the QName “cc:job” is constructed as a reference in this step.</p>
<p>· Applying the reference. The reference, i.e., the QName, can be applied to a container using @dictRef, as shown in Figure 2 (3).</p>
<fig id="F2" position="float"><label>Figure 2</label>
<caption><p><bold>Diagram illustrating the dictionary referencing mechanism using @dictRef in 3 steps.</bold>
A snippet of the dictionary and its entry is shown in the top (orange) box and a snippet of a CompChem job module is shown in the bottom (blue) box.</p>
</caption>
<graphic xlink:href="1758-2946-4-15-2"></graphic>
</fig>
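<p>The three steps of Figure 2 can be mirrored in Python with <monospace>xml.etree.ElementTree</monospace>. This is a minimal sketch under stated assumptions: the <monospace>term</monospace> attribute on the entry is assumed from the dictionary convention; ElementTree does not keep namespace prefix declarations, so an explicit prefix map stands in for <monospace>xmlns:cc</monospace>; and the <monospace>resolve</monospace> function is a hypothetical helper, not part of any CML library.</p>
<preformat>
```python
import xml.etree.ElementTree as ET

CML = "{http://www.xml-cml.org/schema}"
CC_NAMESPACE = "http://www.xml-cml.org/dictionary/compchem/"

# Step 1: define the concept as an entry in a dictionary. The dictionary
# element stays in the CML namespace; @namespace only identifies it.
dictionary = ET.Element(CML + "dictionary", {"namespace": CC_NAMESPACE})
entry = ET.SubElement(dictionary, CML + "entry", {"id": "job", "term": "Job"})

# Step 2: bind a prefix to the dictionary namespace and build the QName.
# ElementTree drops xmlns declarations, so this dict stands in for xmlns:cc.
prefixes = {"cc": CC_NAMESPACE}
qname = "cc:" + entry.get("id")

# Step 3: apply the reference to a generic container.
job = ET.Element(CML + "module", {"dictRef": qname})


def resolve(dict_ref, prefixes, dictionaries):
    """Expand a QName and return the matching dictionary entry, or None."""
    prefix, sep, local = dict_ref.partition(":")
    if not sep or prefix not in prefixes:
        return None
    for d in dictionaries:
        if d.get("namespace") != prefixes[prefix]:
            continue
        for e in d.iter(CML + "entry"):
            if e.get("id") == local:
                return e
    return None


found = resolve(job.get("dictRef"), prefixes, [dictionary])
print(job.get("dictRef"), "->", found.get("term"))  # cc:job -> Job
```
</preformat>
<p>Resolution fails cleanly, returning <monospace>None</monospace>, when the prefix is undeclared or no entry carries the requested id.</p>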
<p>This referencing mechanism applies not only to <monospace>@dictRef</monospace>
but also to <monospace>@units</monospace>
, <monospace>@unitType</monospace>
and other attributes. Although the mechanisms are similar, the unit and unit type dictionaries are not defined using <monospace></monospace>
but rather <monospace></monospace>
and <monospace></monospace>
respectively. This is because units and unit types are common concepts for scientific data, so dedicated elements for them have been defined in the CML Schema.</p>
</sec>
<sec><title>CompChem convention</title>
<p>In line with our design goal that the CompChem convention should capture the typical underlying processes of quantum chemistry calculations and their relationships, the architecture proposed here is broad and may be applied to computational modeling in general. The core concepts of CompChem comprise the following components:</p>
<p>1. Job list (jobList). In computational quantum chemistry, calculations often consist of a series of subtasks, e.g., coarse optimization → fine optimization → NMR spectrum analysis. Each job performs a different type of calculation and passes its results to the next job, because most quantum chemistry software packages are designed to be modular and to perform only a single task at a time. The jobList concept is introduced to capture this series of successive subtasks and to link the information from one subtask to the next. It behaves like a wrapper for job modules.</p>
<p>2. Job (job). The job concept represents a computational job or a computer simulation task, e.g., a geometry optimization or frequency analysis job, performed by quantum chemistry software. The job is the smallest module that fully describes a computational modeling unit. It consists of model parameters (initialization), model optimizations or calculations (calculation), model results (finalization) and computing environments (environment). These four components are fundamental to every simulation. However, not all four components need to be present in every job; only the model parameters are mandatory. A module that contains only model parameters may be used as an abstract quantum chemistry input.</p>
<p>3. Model initialization (initialization). The model initialization concept represents the model parameters and inputs for a computational job. Because model parameters are among the most important elements of every modeling study, the initialization module is required in the CompChem convention.</p>
<p>4. Model calculation (calculation). The model calculation concept represents the computation, optimization or iteration processes of the computational job specified by the initialization. The calculation process may or may not be of interest to a given scientist; therefore, it is optional in CompChem.</p>
<p>5. Model finalization (finalization). The model finalization concept represents the model output or result of a computational job. In some cases, a CompChem module may represent only the model inputs and contain no calculations; therefore, the finalization is optional in CompChem.</p>
<p>6. Computing environment (environment). The computing environment concept refers to the configuration settings of the hardware platform, software application and operating system. The environment also includes metadata such as the machine id, username, start and finish date and time, tools, compilers and Internet Protocol (IP) address.</p>
<p>7. User-defined concept. CompChem allows users to define their own concepts if the recommended concepts above do not meet their requirements. A user-defined concept in CompChem is represented by a module element with a @dictRef attribute whose value points to an entry in a dictionary that defines the concept. Users are free to design any structure for a user-defined module. However, it is recommended to use existing structures or a structure that has a schema for validation. Information in a user-defined module cannot be guaranteed to be understood by all processing software tools.</p>
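<p>A minimal Python sketch of these concepts builds a jobList with two jobs and passes the molecule from the finalization of the first job into the initialization of the second, as in an optimization followed by a frequency analysis. The convention values <monospace>convention:compchem</monospace> and <monospace>convention:molecular</monospace>, the hypothetical <monospace>cc:program</monospace> reference and the program name are assumptions for illustration. No calculation is run, so the “result” molecule is simply a copy of the input.</p>
<preformat>
```python
import copy
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}
CML = "{" + NS["cml"] + "}"


def module(parent, dict_ref, **attrs):
    """Append a CompChem module identified by its dictRef."""
    return ET.SubElement(parent, CML + "module", {"dictRef": dict_ref, **attrs})


def add_job(job_list, title, input_molecule):
    """Append a job with initialization, finalization and environment."""
    job = module(job_list, "cc:job", title=title)
    module(job, "cc:initialization").append(copy.deepcopy(input_molecule))
    # No calculation is run here: the "result" is a copy of the input,
    # standing in for, e.g., an optimized geometry.
    module(job, "cc:finalization").append(copy.deepcopy(input_molecule))
    env = ET.SubElement(module(job, "cc:environment"), CML + "parameterList")
    program = ET.SubElement(env, CML + "parameter", {"dictRef": "cc:program"})
    ET.SubElement(program, CML + "scalar").text = "example-program"  # hypothetical
    return job


root = ET.Element(CML + "module", {"convention": "convention:compchem"})
job_list = module(root, "cc:jobList", id="jobList-1", title="optimization then frequencies")
molecule = ET.Element(CML + "molecule", {"id": "mol-1", "convention": "convention:molecular"})

opt = add_job(job_list, "geometry optimization", molecule)
# Link the subtasks: the finalized molecule of one job seeds the next job.
previous = opt.find("cml:module[@dictRef='cc:finalization']/cml:molecule", NS)
freq = add_job(job_list, "frequency analysis", previous)

for job in job_list.findall("cml:module[@dictRef='cc:job']", NS):
    parts = [m.get("dictRef") for m in job.findall("cml:module", NS)]
    print(job.get("title"), parts)
```
</preformat>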
<p>Each concept, defined above, is associated with the core CompChem dictionary (available at <ext-link ext-link-type="uri" xlink:href="http://www.xml-cml.org/dictionary/compchem/">http://www.xml-cml.org/dictionary/compchem/</ext-link>
), whose <monospace>@dictRef</monospace>
s and rules are given in Table <xref ref-type="table" rid="T1">1</xref>
. The rules in this table are coded into a stylesheet which can be used to validate a CompChem document. We anticipate that the rules will need to be modified or extended when more complex calculations, such as transition state searches or molecular dynamics simulations, are included in CompChem.</p>
<table-wrap position="float" id="T1"><label>Table 1</label>
<caption><p>Rules of CompChem</p>
</caption>
<table frame="hsides" rules="groups" border="1"><colgroup><col></col>
<col></col>
</colgroup>
<thead><tr><th align="left"><bold>dictRef</bold>
</th>
<th align="center"><bold>Rules</bold>
</th>
</tr>
</thead>
<tbody><tr><td align="justify" valign="bottom"><monospace>cc:jobList</monospace>
<hr></hr>
</td>
<td align="justify" valign="bottom">- A jobList module element MUST have an id attribute the value of which MUST be unique within the module specifying the compchem convention.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A jobList module element MUST contain at least one job module child element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A jobList module element SHOULD have a title attribute the value of which MUST be a non-empty string specifying a human-readable title for the module.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A jobList module element MAY contain more than one child element in any namespace.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"><monospace>cc:job</monospace>
<hr></hr>
</td>
<td align="justify" valign="bottom">- A job module element MUST contain exactly one initialization module child element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A job module element MAY contain zero or more calculation module child elements.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A job module element MAY contain no more than one finalization module child element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A job module element MAY contain no more than one environment module element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- The order of the calculation module elements in a job module MUST represent the order of the calculation steps but there is no restriction on the order of other child element types.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- If a calculation module element is present, a finalization module element MUST also be present as a child of a job module element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A job module element SHOULD have a title attribute, the value of which MUST be a non-empty string specifying a human-readable title for the module.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A job module element MAY also contain other child elements in any namespace.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"><monospace>cc:initialization</monospace>
<hr></hr>
</td>
<td align="justify" valign="bottom">- An initialization module element MUST NOT contain more than one <monospace>molecule</monospace>
child element. The <monospace>molecule</monospace>
MUST specify a convention using the convention attribute and the convention SHOULD be one of the RECOMMENDED molecular conventions.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- An initialization module element MUST NOT contain more than one <monospace></monospace>
element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- An initialization module element MAY contain any number of user defined module elements.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- An initialization module element MUST contain at least one child of molecule, <monospace></monospace>
or user defined module elements.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- An initialization module element MAY contain more than one child element in any namespace but MUST NOT contain a property child element or a <monospace>propertyList</monospace>
child element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"><monospace>cc:calculation</monospace>
<hr></hr>
</td>
<td align="justify" valign="bottom">- A calculation module element MUST NOT contain more than one molecule child element. The molecule MUST specify a convention using the convention attribute and the convention SHOULD be one of the RECOMMENDED molecular conventions.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A calculation module element MUST NOT contain more than one <monospace></monospace>
element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A calculation module element MUST NOT contain more than one <monospace></monospace>
element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A calculation module element MAY contain any number of user defined module elements.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A calculation module element MUST contain at least one child of molecule, <monospace></monospace>
, <monospace></monospace>
or user defined module elements.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A calculation module element MAY contain more than one child element in any namespace.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"><monospace>cc:finalization</monospace>
<hr></hr>
</td>
<td align="justify" valign="bottom">- A finalization module element MUST NOT contain more than one molecule child element. The molecule MUST specify a convention using the convention attribute and the convention SHOULD be one of the RECOMMENDED molecular conventions.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A finalization module element MUST NOT contain more than one <monospace></monospace>
element.<hr></hr>
</td>
</tr>
<tr><td align="justify" valign="bottom"> <hr></hr>
</td>
<td align="justify" valign="bottom">- A finalization module element MAY contain any number of user defined module elements.<hr></hr>
</td>
</tr>
<tr><td align="left" valign="bottom"> <hr></hr>
</td>
<td align="left" valign="bottom">- A finalization module element MUST contain at least one molecule child, <monospace></monospace>
child or user defined module element.<hr></hr>
</td>
</tr>
<tr><td align="left" valign="bottom"> <hr></hr>
</td>
<td align="left" valign="bottom">- A finalization module element MAY contain more than one child element in any namespace but MUST NOT contain a parameter child element or a <monospace>parameterList</monospace>
child element.<hr></hr>
</td>
</tr>
<tr><td align="left" valign="bottom"><monospace>cc:environment</monospace>
<hr></hr>
</td>
<td align="left" valign="bottom">- An environment module element MUST NOT contain more than one <monospace></monospace>
element.<hr></hr>
</td>
</tr>
<tr><td align="left" valign="bottom"> <hr></hr>
</td>
<td align="left" valign="bottom">- Any environment property element MUST be a child of a <monospace></monospace>
element.<hr></hr>
</td>
</tr>
<tr><td align="left" valign="bottom"> <hr></hr>
</td>
<td align="left" valign="bottom">- An environment module element MAY contain more than one child element in any namespace including any number of user defined module elements. However, CompChem can only understand a particular set of concepts.<hr></hr>
</td>
</tr>
<tr><td align="left" valign="bottom"> <hr></hr>
</td>
<td align="left" valign="bottom">- An environment module MUST contain at least one child of <monospace></monospace>
or userDefinedModule elements.<hr></hr>
</td>
</tr>
<tr><td align="left"> </td>
<td align="left">- An environment module element MAY contain more than one child element in any namespace but MUST NOT contain a parameter child element or a <monospace>parameterList</monospace>
child element.</td>
</tr>
</tbody>
</table>
<table-wrap-foot><p>The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 [<xref ref-type="bibr" rid="B37">37</xref>
].</p>
</table-wrap-foot>
</table-wrap>
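<p>Because the rules above are phrased as MUST and MUST NOT constraints on child elements, they can be checked mechanically. The sketch below encodes only the finalization rules using Python's standard <monospace>xml.etree.ElementTree</monospace>. It assumes that the list elements named in these rules are the CML <monospace>propertyList</monospace> and <monospace>parameterList</monospace> elements, that CML elements live in the namespace <monospace>http://www.xml-cml.org/schema</monospace>, and that a user defined module is a CML <monospace>module</monospace> with <monospace>dictRef="cc:userDefinedModule"</monospace>; the helper names <monospace>check_finalization</monospace> and <monospace>is_user_defined_module</monospace> are hypothetical.</p>
```python
import xml.etree.ElementTree as ET

# Assumed CML Schema namespace; children in other namespaces are ignored.
CML_NS = "http://www.xml-cml.org/schema"


def q(local):
    """Return the ElementTree-style qualified name of a CML element."""
    return "{%s}%s" % (CML_NS, local)


def is_user_defined_module(elem):
    # Assumption: a user defined module is a CML module whose dictRef
    # is "cc:userDefinedModule".
    return elem.tag == q("module") and elem.get("dictRef") == "cc:userDefinedModule"


def check_finalization(module):
    """Return a list of rule violations for a finalization module.

    Only the MUST / MUST NOT rules are encoded; an empty list means
    that no violation was found.
    """
    children = list(module)
    errors = []
    # MUST NOT contain more than one propertyList.
    if len([c for c in children if c.tag == q("propertyList")]) > 1:
        errors.append("more than one propertyList")
    # MUST NOT contain parameter or parameterList children.
    if any(c.tag in (q("parameter"), q("parameterList")) for c in children):
        errors.append("parameter or parameterList child is not allowed")
    # MUST contain a molecule, a propertyList or a user defined module.
    if not any(
        c.tag in (q("molecule"), q("propertyList")) or is_user_defined_module(c)
        for c in children
    ):
        errors.append("no molecule, propertyList or user defined module child")
    return errors
```
<p>The MAY rules (any number of user defined modules, extra children in other namespaces) need no check, since they only permit content. The environment rules could be encoded the same way once their element names are fixed.</p>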
<p>Figure <xref ref-type="fig" rid="F3">3</xref>
shows a snippet of a CompChem document with the key features labeled accordingly.</p>
<fig id="F3" position="float"><label>Figure 3</label>
<caption><p>The structure of CML for storing computational chemistry output: (1) CompChem convention declaration, (2) CML convention namespace, (3) a jobList module, (4) a job module, (5) an initialization module, (6) Molecular convention declaration, (7) a basis set parameter specified by the cc:basis dictionary reference, (8) a Gaussian-specific parameter declared in the Gaussian dictionary, (9) a finalization module, (10) si:none for dimensionless units, (11) CML identifier.</p>
</caption>
<graphic xlink:href="1758-2946-4-15-3"></graphic>
</fig>
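<p>The module nesting labeled (1), (3), (4), (5) and (9) in Figure 3 can also be produced programmatically. The following is a minimal sketch with Python's standard <monospace>xml.etree.ElementTree</monospace>; it assumes that the CompChem convention is declared as a <monospace>convention="convention:compchem"</monospace> attribute on the outermost module, and the namespace URIs as well as the helper names <monospace>add_module</monospace> and <monospace>compchem_skeleton</monospace> are our assumptions rather than part of the CompChem specification. Molecules, parameters and properties are omitted.</p>
```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions based on the published CML conventions.
CML_NS = "http://www.xml-cml.org/schema"
CONVENTION_NS = "http://www.xml-cml.org/convention/"
CC_NS = "http://www.xml-cml.org/dictionary/compchem/"

# Serialize CML elements in the default namespace (no prefix).
ET.register_namespace("", CML_NS)


def q(local):
    """Return the ElementTree-style qualified name of a CML element."""
    return "{%s}%s" % (CML_NS, local)


def add_module(parent, dict_ref):
    """Append a CML module whose role is given by a CompChem dictRef."""
    return ET.SubElement(parent, q("module"), {"dictRef": dict_ref})


def compchem_skeleton():
    """Build the module nesting of a CompChem document with one job."""
    root = ET.Element(q("module"), {"convention": "convention:compchem"})
    # Prefixes used only inside attribute values are not declared by
    # ElementTree automatically, so declare them as plain attributes.
    root.set("xmlns:convention", CONVENTION_NS)
    root.set("xmlns:cc", CC_NS)
    job_list = add_module(root, "cc:jobList")
    job = add_module(job_list, "cc:job")
    add_module(job, "cc:initialization")
    add_module(job, "cc:finalization")
    return root


print(ET.tostring(compchem_skeleton(), encoding="unicode"))
```
<p>Building the tree rather than concatenating strings keeps the output well-formed, and the dictRef values stay the only place where the CompChem role of each module is recorded.</p>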
</sec>
<sec><title>Semantics of properties and parameters</title>
<p>A core set of CML elements is required for storing the actual contents and data. Since the CML Schema is content-model free, it is necessary to define precisely how the elements should be used. In this section, we list and describe the CML elements most often found to be useful in CompChem documents. The rules given here are meant only as guidelines for using common CML components, such as <monospace></monospace>
, <monospace></monospace>
, <monospace></monospace>
, <monospace></monospace>
, and <monospace></monospace>
. If the given rules are not applicable, users may define their own structures and annotate them with their own dictionary references using the <monospace>@dictRef</monospace>
attribute. However, such new structures should be clearly specified and documented in the user dictionary so that anyone can write code that processes the dictionary.</p>
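<p>Processing a user dictionary starts with resolving each <monospace>@dictRef</monospace> value: in CML the part before the colon is a prefix bound to the dictionary namespace, and the part after it is the term. A minimal sketch, using <monospace>xml.etree.ElementTree.iterparse</monospace>, which reports namespace declarations as <monospace>start-ns</monospace> events; the function name <monospace>dict_refs</monospace> is hypothetical, and the flat prefix map assumes that each prefix is bound only once in the document (for example, on the root element).</p>
```python
import io
import xml.etree.ElementTree as ET


def dict_refs(xml_text):
    """Yield (dictionary namespace URI, term) for every @dictRef.

    Simplification: all prefix declarations are collected into one flat
    map, which is only correct if each prefix is bound once.
    """
    prefixes = {}
    events = ("start-ns", "start")
    for event, item in ET.iterparse(io.StringIO(xml_text), events=events):
        if event == "start-ns":
            prefix, uri = item  # start-ns yields a (prefix, uri) pair
            prefixes[prefix] = uri
        elif "dictRef" in item.attrib:
            ref = item.get("dictRef")
            prefix, sep, term = ref.partition(":")
            if not sep:  # unprefixed reference: no namespace to resolve
                prefix, term = None, ref
            yield prefixes.get(prefix), term
```
<p>A consumer can then look up each (namespace, term) pair in the corresponding dictionary and report any term that the dictionary does not document.</p>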
<sec><title>Parameter and property containers</title>
<p>A container is a general notion for an XML element that contains data. The CompChem parameter element is also a container. The exact definition of a parameter depends on the context in which it is used. In the context of CompChem, parameters are the model conditions of a calculation, which can be numerical quantities, options, constraints, text or any chemical object; examples include a basis set (e.g., 6-311+G(d,p)), the level of theory, convergence criteria and the calculation type (e.g., geometry optimization, frequency analysis, NMR). Some parameters take values from a fixed set, which can be enumerated. For example, Gaussian 03/09 [<xref ref-type="bibr" rid="B1">1</xref>
] needs to know whether or not to use symmetry in the wave function. According to the online manual for the Gaussian software [<xref ref-type="bibr" rid="B1">1</xref>
], this option can only be set to either “NoSymm” or “Symm”, so it can be pre-enumerated for use in a CompChem document with the values “On” or “Off”.</p>
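<p>An enumerated parameter is easy to validate and translate into a program-specific keyword. A small sketch in Python, assuming the CompChem values “On” and “Off” correspond to the Gaussian keywords “Symm” and “NoSymm” as described above; the names <monospace>Symmetry</monospace>, <monospace>GAUSSIAN_KEYWORD</monospace> and <monospace>gaussian_symmetry_keyword</monospace> are hypothetical.</p>
```python
from enum import Enum


class Symmetry(Enum):
    """Hypothetical enumeration of the CompChem symmetry parameter values."""
    ON = "On"
    OFF = "Off"


# Translation of each enumerated value to the Gaussian keyword.
GAUSSIAN_KEYWORD = {Symmetry.ON: "Symm", Symmetry.OFF: "NoSymm"}


def gaussian_symmetry_keyword(value):
    """Return the Gaussian keyword for a CompChem symmetry value.

    Symmetry(value) raises ValueError for anything outside the
    enumeration, so invalid values are rejected before any input file
    is written.
    """
    return GAUSSIAN_KEYWORD[Symmetry(value)]
```
<p>Keeping the enumeration program-neutral (“On”/“Off”) and mapping it per program means that the same CompChem document can, in principle, drive other codes with different keyword spellings.</p>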
<p>In CompChem, a value cannot be added directly as a text child of a parameter. It must be wrapped by a CML primitive data container (see Section “Data containers”), which is usually one of <monospace>scalar</monospace>
, <monospace>array</monospace>
or <monospace>matrix</monospace>
. For plain text, a scalar should be used. This allows software to determine exactly which variable type (i.e., the data type in a programming language) is suitable for the value of a given parameter. In many cases, a primitive container is not sufficient and a complex object representation is required to hold the data. Figure <xref ref-type="fig" rid="F4">4</xref>
shows examples of both primitive and complex chemistry objects. In Figure <xref ref-type="fig" rid="F4">4</xref>
(b), we illustrate a complex object using <monospace>