<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">PlantCAZyme: a database for plant carbohydrate-active enzymes</title>
<author><name sortKey="Ekstrom, Alexander" sort="Ekstrom, Alexander" uniqKey="Ekstrom A" first="Alexander" last="Ekstrom">Alexander Ekstrom</name>
<affiliation><nlm:aff wicri:cut=" and" id="bau079-AFF1">Department of Computer Science</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Taujale, Rahil" sort="Taujale, Rahil" uniqKey="Taujale R" first="Rahil" last="Taujale">Rahil Taujale</name>
<affiliation><nlm:aff id="bau079-AFF1">Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Mcginn, Nathan" sort="Mcginn, Nathan" uniqKey="Mcginn N" first="Nathan" last="Mcginn">Nathan McGinn</name>
<affiliation><nlm:aff wicri:cut=" and" id="bau079-AFF1">Department of Computer Science</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Yin, Yanbin" sort="Yin, Yanbin" uniqKey="Yin Y" first="Yanbin" last="Yin">Yanbin Yin</name>
<affiliation><nlm:aff id="bau079-AFF1">Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">25125445</idno>
<idno type="pmc">4132414</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4132414</idno>
<idno type="RBID">PMC:4132414</idno>
<idno type="doi">10.1093/database/bau079</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000E80</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">PlantCAZyme: a database for plant carbohydrate-active enzymes</title>
<author><name sortKey="Ekstrom, Alexander" sort="Ekstrom, Alexander" uniqKey="Ekstrom A" first="Alexander" last="Ekstrom">Alexander Ekstrom</name>
<affiliation><nlm:aff wicri:cut=" and" id="bau079-AFF1">Department of Computer Science</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Taujale, Rahil" sort="Taujale, Rahil" uniqKey="Taujale R" first="Rahil" last="Taujale">Rahil Taujale</name>
<affiliation><nlm:aff id="bau079-AFF1">Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Mcginn, Nathan" sort="Mcginn, Nathan" uniqKey="Mcginn N" first="Nathan" last="Mcginn">Nathan McGinn</name>
<affiliation><nlm:aff wicri:cut=" and" id="bau079-AFF1">Department of Computer Science</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Yin, Yanbin" sort="Yin, Yanbin" uniqKey="Yin Y" first="Yanbin" last="Yin">Yanbin Yin</name>
<affiliation><nlm:aff id="bau079-AFF1">Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Database: The Journal of Biological Databases and Curation</title>
<idno type="eISSN">1758-0463</idno>
<imprint><date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>PlantCAZyme is a database built upon dbCAN (database for automated carbohydrate active enzyme annotation), aiming to provide pre-computed sequence and annotation data of carbohydrate active enzymes (CAZymes) to the plant carbohydrate and bioenergy research communities. The current version contains data for 43 790 CAZymes of 159 protein families from 35 plants (including angiosperms, gymnosperms, lycophyte and bryophyte mosses) and chlorophyte algae with fully sequenced genomes. Useful features of the database include: (i) a BLAST server and a HMMER server that allow users to search against our pre-computed sequence data for annotation purposes, (ii) a download page to allow batch downloading of data for a specific CAZyme family or species and (iii) protein browse pages to provide easy access to the most comprehensive sequence and annotation data.</p>
<p><bold>Database URL</bold>
: <ext-link ext-link-type="uri" xlink:href="http://cys.bios.niu.edu/plantcazyme/">http://cys.bios.niu.edu/plantcazyme/</ext-link>
</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Rubin, E M" uniqKey="Rubin E">E.M. Rubin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Himmel, M E" uniqKey="Himmel M">M.E. Himmel</name>
</author>
<author><name sortKey="Ding, S Y" uniqKey="Ding S">S.Y. Ding</name>
</author>
<author><name sortKey="Johnson, D K" uniqKey="Johnson D">D.K. Johnson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yin, Y" uniqKey="Yin Y">Y. Yin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cantarel, B L" uniqKey="Cantarel B">B.L. Cantarel</name>
</author>
<author><name sortKey="Coutinho, P M" uniqKey="Coutinho P">P.M. Coutinho</name>
</author>
<author><name sortKey="Rancurel, C" uniqKey="Rancurel C">C. Rancurel</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Coutinho, P M" uniqKey="Coutinho P">P.M. Coutinho</name>
</author>
<author><name sortKey="Stam, M" uniqKey="Stam M">M. Stam</name>
</author>
<author><name sortKey="Blanc, E" uniqKey="Blanc E">E. Blanc</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lombard, V" uniqKey="Lombard V">V. Lombard</name>
</author>
<author><name sortKey="Golaconda Ramulu, H" uniqKey="Golaconda Ramulu H">H. Golaconda Ramulu</name>
</author>
<author><name sortKey="Drula, E" uniqKey="Drula E">E. Drula</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Coutinho, P M" uniqKey="Coutinho P">P.M. Coutinho</name>
</author>
<author><name sortKey="Henrissat, B" uniqKey="Henrissat B">B. Henrissat</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yin, Y B" uniqKey="Yin Y">Y.B. Yin</name>
</author>
<author><name sortKey="Mao, X Z" uniqKey="Mao X">X.Z. Mao</name>
</author>
<author><name sortKey="Yang, J C" uniqKey="Yang J">J.C. Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mao, F L" uniqKey="Mao F">F.L. Mao</name>
</author>
<author><name sortKey="Yin, Y B" uniqKey="Yin Y">Y.B. Yin</name>
</author>
<author><name sortKey="Zhou, F F" uniqKey="Zhou F">F.F. Zhou</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cao, P J" uniqKey="Cao P">P.J. Cao</name>
</author>
<author><name sortKey="Bartley, L E" uniqKey="Bartley L">L.E. Bartley</name>
</author>
<author><name sortKey="Jung, K H" uniqKey="Jung K">K.H. Jung</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yong, W" uniqKey="Yong W">W. Yong</name>
</author>
<author><name sortKey="Link, B" uniqKey="Link B">B. Link</name>
</author>
<author><name sortKey="O Malley, R" uniqKey="O Malley R">R. O'Malley</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Girke, T" uniqKey="Girke T">T. Girke</name>
</author>
<author><name sortKey="Lauricha, J" uniqKey="Lauricha J">J. Lauricha</name>
</author>
<author><name sortKey="Tran, H" uniqKey="Tran H">H. Tran</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Goodstein, D M" uniqKey="Goodstein D">D.M. Goodstein</name>
</author>
<author><name sortKey="Shu, S" uniqKey="Shu S">S. Shu</name>
</author>
<author><name sortKey="Howson, R" uniqKey="Howson R">R. Howson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nystedt, B" uniqKey="Nystedt B">B. Nystedt</name>
</author>
<author><name sortKey="Street, N R" uniqKey="Street N">N.R. Street</name>
</author>
<author><name sortKey="Wetterbom, A" uniqKey="Wetterbom A">A. Wetterbom</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Eddy, S R" uniqKey="Eddy S">S.R. Eddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Henrissat, B" uniqKey="Henrissat B">B. Henrissat</name>
</author>
<author><name sortKey="Coutinho, P M" uniqKey="Coutinho P">P.M. Coutinho</name>
</author>
<author><name sortKey="Davies, G J" uniqKey="Davies G">G.J. Davies</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Geisler Lee, J" uniqKey="Geisler Lee J">J. Geisler-Lee</name>
</author>
<author><name sortKey="Geisler, M" uniqKey="Geisler M">M. Geisler</name>
</author>
<author><name sortKey="Coutinho, P M" uniqKey="Coutinho P">P.M. Coutinho</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Park, B H" uniqKey="Park B">B.H. Park</name>
</author>
<author><name sortKey="Karpinets, T V" uniqKey="Karpinets T">T.V. Karpinets</name>
</author>
<author><name sortKey="Syed, M H" uniqKey="Syed M">M.H. Syed</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Finn, R D" uniqKey="Finn R">R.D. Finn</name>
</author>
<author><name sortKey="Bateman, A" uniqKey="Bateman A">A. Bateman</name>
</author>
<author><name sortKey="Clements, J" uniqKey="Clements J">J. Clements</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tatusov, R L" uniqKey="Tatusov R">R.L. Tatusov</name>
</author>
<author><name sortKey="Fedorova, N D" uniqKey="Fedorova N">N.D. Fedorova</name>
</author>
<author><name sortKey="Jackson, J D" uniqKey="Jackson J">J.D. Jackson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M. Kanehisa</name>
</author>
<author><name sortKey="Goto, S" uniqKey="Goto S">S. Goto</name>
</author>
<author><name sortKey="Sato, Y" uniqKey="Sato Y">Y. Sato</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gough, J" uniqKey="Gough J">J. Gough</name>
</author>
<author><name sortKey="Chothia, C" uniqKey="Chothia C">C. Chothia</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mi, H Y" uniqKey="Mi H">H.Y. Mi</name>
</author>
<author><name sortKey="Muruganujan, A" uniqKey="Muruganujan A">A. Muruganujan</name>
</author>
<author><name sortKey="Thomas, P D" uniqKey="Thomas P">P.D. Thomas</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ashburner, M" uniqKey="Ashburner M">M. Ashburner</name>
</author>
<author><name sortKey="Ball, C A" uniqKey="Ball C">C.A. Ball</name>
</author>
<author><name sortKey="Blake, J A" uniqKey="Blake J">J.A. Blake</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hunter, S" uniqKey="Hunter S">S. Hunter</name>
</author>
<author><name sortKey="Apweiler, R" uniqKey="Apweiler R">R. Apweiler</name>
</author>
<author><name sortKey="Attwood, T K" uniqKey="Attwood T">T.K. Attwood</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Marchler Bauer, A" uniqKey="Marchler Bauer A">A. Marchler-Bauer</name>
</author>
<author><name sortKey="Lu, S N" uniqKey="Lu S">S.N. Lu</name>
</author>
<author><name sortKey="Anderson, J B" uniqKey="Anderson J">J.B. Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ware, D" uniqKey="Ware D">D. Ware</name>
</author>
<author><name sortKey="Jaiswal, P" uniqKey="Jaiswal P">P. Jaiswal</name>
</author>
<author><name sortKey="Ni, J J" uniqKey="Ni J">J.J. Ni</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Van Bel, M" uniqKey="Van Bel M">M. Van Bel</name>
</author>
<author><name sortKey="Proost, S" uniqKey="Proost S">S. Proost</name>
</author>
<author><name sortKey="Wischnitzki, E" uniqKey="Wischnitzki E">E. Wischnitzki</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bairoch, A" uniqKey="Bairoch A">A. Bairoch</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Claudel Renard, C" uniqKey="Claudel Renard C">C. Claudel-Renard</name>
</author>
<author><name sortKey="Chevalet, C" uniqKey="Chevalet C">C. Chevalet</name>
</author>
<author><name sortKey="Faraut, T" uniqKey="Faraut T">T. Faraut</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yu, C G" uniqKey="Yu C">C.G. Yu</name>
</author>
<author><name sortKey="Zavaijevski, N" uniqKey="Zavaijevski N">N. Zavaijevski</name>
</author>
<author><name sortKey="Desai, V" uniqKey="Desai V">V. Desai</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tian, W D" uniqKey="Tian W">W.D. Tian</name>
</author>
<author><name sortKey="Arakaki, A K" uniqKey="Arakaki A">A.K. Arakaki</name>
</author>
<author><name sortKey="Skolnick, J" uniqKey="Skolnick J">J. Skolnick</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mueller, L A" uniqKey="Mueller L">L.A. Mueller</name>
</author>
<author><name sortKey="Zhang, P F" uniqKey="Zhang P">P.F. Zhang</name>
</author>
<author><name sortKey="Rhee, S Y" uniqKey="Rhee S">S.Y. Rhee</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Guo, A Y" uniqKey="Guo A">A.Y. Guo</name>
</author>
<author><name sortKey="Chen, X" uniqKey="Chen X">X. Chen</name>
</author>
<author><name sortKey="Gao, G" uniqKey="Gao G">G. Gao</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Fawal, N" uniqKey="Fawal N">N. Fawal</name>
</author>
<author><name sortKey="Li, Q" uniqKey="Li Q">Q. Li</name>
</author>
<author><name sortKey="Savelli, B" uniqKey="Savelli B">B. Savelli</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Saier, M H" uniqKey="Saier M">M.H. Saier</name>
</author>
<author><name sortKey="Tran, C V" uniqKey="Tran C">C.V. Tran</name>
</author>
<author><name sortKey="Barabote, R D" uniqKey="Barabote R">R.D. Barabote</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rawlings, N D" uniqKey="Rawlings N">N.D. Rawlings</name>
</author>
<author><name sortKey="Barrett, A J" uniqKey="Barrett A">A.J. Barrett</name>
</author>
<author><name sortKey="Bateman, A" uniqKey="Bateman A">A. Bateman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sukharnikov, L O" uniqKey="Sukharnikov L">L.O. Sukharnikov</name>
</author>
<author><name sortKey="Cantwell, B J" uniqKey="Cantwell B">B.J. Cantwell</name>
</author>
<author><name sortKey="Podar, M" uniqKey="Podar M">M. Podar</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">Database (Oxford)</journal-id>
<journal-id journal-id-type="iso-abbrev">Database (Oxford)</journal-id>
<journal-id journal-id-type="publisher-id">database</journal-id>
<journal-id journal-id-type="hwp">databa</journal-id>
<journal-title-group><journal-title>Database: The Journal of Biological Databases and Curation</journal-title>
</journal-title-group>
<issn pub-type="epub">1758-0463</issn>
<publisher><publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">25125445</article-id>
<article-id pub-id-type="pmc">4132414</article-id>
<article-id pub-id-type="doi">10.1093/database/bau079</article-id>
<article-id pub-id-type="publisher-id">bau079</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Database Tool</subject>
</subj-group>
</article-categories>
<title-group><article-title>PlantCAZyme: a database for plant carbohydrate-active enzymes</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Ekstrom</surname>
<given-names>Alexander</given-names>
</name>
<xref ref-type="aff" rid="bau079-AFF1"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Taujale</surname>
<given-names>Rahil</given-names>
</name>
<xref ref-type="aff" rid="bau079-AFF1"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>McGinn</surname>
<given-names>Nathan</given-names>
</name>
<xref ref-type="aff" rid="bau079-AFF1"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Yin</surname>
<given-names>Yanbin</given-names>
</name>
<xref ref-type="aff" rid="bau079-AFF1"><sup>2</sup>
</xref>
<xref ref-type="corresp" rid="bau079-COR1">*</xref>
</contrib>
<aff id="bau079-AFF1"><sup>1</sup>
Department of Computer Science and<sup>2</sup>
Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA</aff>
</contrib-group>
<author-notes><corresp id="bau079-COR1">*Corresponding author: Tel: <phone>+1 815 753 8963</phone>
; Fax: <fax>+1 815 753 7855</fax>
; E-mail: <email>yyin@niu.edu</email>
</corresp>
<fn><p>Citation details: Ekstrom,A., Taujale,R., McGinn,N. <italic>et al</italic>
. PlantCAZyme: a database for plant carbohydrate-active enzymes. <italic>Database</italic>
 (2014) Vol. 2014: article ID bau079; doi:10.1093/database/bau079</p>
</fn>
</author-notes>
<pub-date pub-type="collection"><year>2014</year>
</pub-date>
<pub-date pub-type="epub"><day>14</day>
<month>8</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>14</day>
<month>8</month>
<year>2014</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the 
 							. </pmc-comment>
      <volume>2014</volume>
<elocation-id>bau079</elocation-id>
<history><date date-type="received"><day>22</day>
<month>2</month>
<year>2014</year>
</date>
<date date-type="rev-recd"><day>16</day>
<month>6</month>
<year>2014</year>
</date>
<date date-type="accepted"><day>16</day>
<month>6</month>
<year>2014</year>
</date>
</history>
<permissions><copyright-statement>© The Author(s) 2014. Published by Oxford University Press.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p><pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract><p>PlantCAZyme is a database built upon dbCAN (database for automated carbohydrate active enzyme annotation), aiming to provide pre-computed sequence and annotation data of carbohydrate active enzymes (CAZymes) to the plant carbohydrate and bioenergy research communities. The current version contains data for 43 790 CAZymes of 159 protein families from 35 plants (including angiosperms, gymnosperms, lycophyte and bryophyte mosses) and chlorophyte algae with fully sequenced genomes. Useful features of the database include: (i) a BLAST server and a HMMER server that allow users to search against our pre-computed sequence data for annotation purposes, (ii) a download page to allow batch downloading of data for a specific CAZyme family or species and (iii) protein browse pages to provide easy access to the most comprehensive sequence and annotation data.</p>
<p><bold>Database URL</bold>
: <ext-link ext-link-type="uri" xlink:href="http://cys.bios.niu.edu/plantcazyme/">http://cys.bios.niu.edu/plantcazyme/</ext-link>
</p>
</abstract>
<counts><page-count count="8"></page-count>
</counts>
</article-meta>
</front>
<body><sec sec-type="intro"><title>Introduction</title>
<p>Lignocellulosic biofuels have received great attention in the past decade for obvious economic and environmental reasons [<xref rid="bau079-B1" ref-type="bibr">1</xref>
]. Rather than using starch-based plant materials as the feedstock, lignocellulosic biofuels use inedible plant biomass, which is, however, highly recalcitrant to degradation into fermentable sugars. The bioenergy research community thus has a major interest in genetically modifying plants in order to develop low-cost biofuels [<xref rid="bau079-B2" ref-type="bibr">2</xref>
]. To achieve this goal, researchers need to know which genes should be modified to obtain plants with lower recalcitrance to enzymatic degradation. Therefore, biomass-related enzyme databases are highly needed to promote the development of transgenic biofuel crops [<xref rid="bau079-B3" ref-type="bibr">3</xref>
]. Carbohydrate-active enzymes (CAZymes) are enzymes responsible for the synthesis, degradation and modification of storage and structural biomass polysaccharides [<xref rid="bau079-B4" ref-type="bibr">4</xref>
] and thus are the most important enzymes for bioenergy research. CAZymes are found not only in plants and bacteria but also in fungi and animals, where they are responsible for the synthesis, degradation and modification of all glycoconjugates in nature, including glycoproteins and glycolipids. Therefore, they are also fundamentally important for general carbohydrate and glycobiology research [<xref rid="bau079-B4" ref-type="bibr">4</xref>
].</p>
<p>CAZymes are present in all kingdoms of life and are particularly abundant in plants [<xref rid="bau079-B5" ref-type="bibr">5</xref>
]. Since 1998, the CAZyme database, known as CAZy, has collected experimentally (biochemically, genetically and structurally) characterized CAZyme proteins and classified them into protein families; so far it has created 330 families (as of May 2013) in six classes based on sequence homology: GHs (glycoside hydrolases), GTs (glycosyltransferases), CEs (carbohydrate esterases), PLs (polysaccharide lyases), AAs (auxiliary activities) and CBMs (carbohydrate binding modules) [<xref rid="bau079-B6" ref-type="bibr">6</xref>
]. It then populated each family by including homologs from GenBank, UniProt and PDB databases using both BLAST and protein domain/motif search strategies as well as expert manual inspection of sequence alignment [<xref rid="bau079-B4" ref-type="bibr">4</xref>
, <xref rid="bau079-B7" ref-type="bibr">7</xref>
]. CAZy is an extremely useful resource owing to its original classification scheme and high-quality manual curation, and it has thus been widely accepted by the carbohydrate research community.</p>
<p>A great demand for automated CAZyme annotation has emerged in the past few years owing to the production of thousands of completed plant and microbial genomes and metagenomes. However, the CAZy database does not provide automated CAZyme annotation. In view of this need, in 2012 we developed a web server named dbCAN to allow users to submit newly sequenced genomes for automated CAZyme annotation [<xref rid="bau079-B8" ref-type="bibr">8</xref>
]. Behind the web server are hidden Markov models (HMMs) of the 330 CAZyme families; each HMM represents the sequence alignment of the conserved signature domains of a family, which were retrieved from annotated CAZyme protein sequences in the CAZy database. The dbCAN website has received thousands of visits from many countries since publication, demonstrating its impact on CAZyme research.</p>
<p>The availability of the 330 CAZyme HMMs has also made it possible to build a dedicated database for plant CAZymes. With regard to similar resources, the CAZy database covers only two (<italic>Arabidopsis thaliana</italic>
 and <italic>Oryza sativa</italic>
) out of over 40 sequenced plant and algal genomes; none of the sequenced bioenergy crops (e.g. poplar, switchgrass, sorghum) or evolutionarily important organisms (e.g. moss, spike moss, algae) are included. Two other databases, pDAWG [<xref rid="bau079-B9" ref-type="bibr">9</xref>
] and Rice GT [<xref rid="bau079-B10" ref-type="bibr">10</xref>
], are limited to a small number of CAZyme families and genomes. There are also a few other databases such as the Cell Wall Genomics database [<xref rid="bau079-B11" ref-type="bibr">11</xref>
] and the Cell Wall Navigator database [<xref rid="bau079-B12" ref-type="bibr">12</xref>
], which only contain a very small number of CAZyme families. Therefore, the development of PlantCAZyme is a timely and highly significant addition to the toolbox of plant carbohydrate and bioenergy research.</p>
</sec>
<sec><title>Construction and Content</title>
<sec><title>Collection of CAZyme sequences</title>
<p>Over 40 plant and algal genomes have been completed, and most of them are available in the Phytozome database [<xref rid="bau079-B13" ref-type="bibr">13</xref>
]. To collect the plant CAZyme protein sequences, we used the 330 dbCAN HMMs as queries and scanned 35 genomes (<xref ref-type="table" rid="bau079-T1">Table 1</xref>
), including 34 Phytozome genomes of 23 dicots, six monocots, one moss, one spike moss, two chlorophyte algae, as well as one gymnosperm genome [<xref rid="bau079-B14" ref-type="bibr">14</xref>
] that is not available in Phytozome, using the HMMER 3.0 package as the homology search tool [<xref rid="bau079-B15" ref-type="bibr">15</xref>
] with default parameters (<italic>E</italic>-value &lt; 10 and output in a parseable table of per-domain hits). The HMMER output was further processed to keep the significant hits, as described below.
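The parseable per-domain table mentioned above is plain text, so the post-processing step can be illustrated with a short script. The following is only a minimal sketch and not the authors' pipeline: it assumes the column layout of the HMMER 3 per-domain table (columns 1 and 4 hold the target and query names, column 13 the independent domain E-value, columns 18 and 19 the alignment coordinates). The names DomainHit, parse_domtbl and keep_hits are hypothetical, and the default cutoff simply mirrors the E-value of 10 stated in the text.

```python
from typing import Iterable, Iterator, List, NamedTuple


class DomainHit(NamedTuple):
    target: str      # column 1 of the per-domain table
    query: str       # column 4 of the per-domain table
    i_evalue: float  # column 13, independent E-value of this domain
    ali_from: int    # column 18, alignment start on the sequence
    ali_to: int      # column 19, alignment end on the sequence


def parse_domtbl(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row, skipping blank and '#' lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 22 whitespace-separated fields, then a free-text description.
        f = line.split(None, 22)
        yield DomainHit(f[0], f[3], float(f[12]), int(f[17]), int(f[18]))


def keep_hits(hits: Iterable[DomainHit], max_evalue: float = 10.0) -> List[DomainHit]:
    """Keep hits whose independent E-value is below max_evalue.

    The cutoff is a caller-supplied assumption; the text only states
    the default reporting threshold of 10.
    """
    return [h for h in hits if max_evalue > h.i_evalue]
```

Under these assumptions, keep_hits(parse_domtbl(open(path))) would read a table from a file at a user-supplied path, and a stricter cutoff can be passed as the second argument; which of the two name columns holds the protein and which the CAZyme family depends on the HMMER program that produced the table.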
<table-wrap id="bau079-T1" position="float"><label>Table 1.</label>
<caption><p>Thirty-five plant and algal genomes that are included in the PlantCAZyme database</p>
</caption>
<table frame="hsides" rules="groups"><thead align="left"><tr><th rowspan="1" colspan="1"><italic>Species</italic>
</th>
<th rowspan="1" colspan="1">Clade</th>
<th rowspan="1" colspan="1">Source</th>
<th rowspan="1" colspan="1"># of genes</th>
<th rowspan="1" colspan="1"># of CAZyme genes</th>
<th rowspan="1" colspan="1">% of CAZyme genes</th>
</tr>
</thead>
<tbody align="left"><tr><td rowspan="1" colspan="1"><italic>Volvox carteri</italic>
</td>
<td rowspan="1" colspan="1">Chlorophyte</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">14 971</td>
<td rowspan="1" colspan="1">198</td>
<td rowspan="1" colspan="1">1.32</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Chlamydomonas reinhardtii</italic>
</td>
<td rowspan="1" colspan="1">Chlorophyte</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">20 497</td>
<td rowspan="1" colspan="1">285</td>
<td rowspan="1" colspan="1">1.39</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Physcomitrella patens</italic>
</td>
<td rowspan="1" colspan="1">Bryophyta</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">21 173</td>
<td rowspan="1" colspan="1">857</td>
<td rowspan="1" colspan="1">4.05</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Selaginella moellendorffii</italic>
</td>
<td rowspan="1" colspan="1">Lycophyta</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">22 285</td>
<td rowspan="1" colspan="1">919</td>
<td rowspan="1" colspan="1">4.12</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Picea abies</italic>
</td>
<td rowspan="1" colspan="1">Gymnosperm</td>
<td rowspan="1" colspan="1">Congenie</td>
<td rowspan="1" colspan="1">71 158</td>
<td rowspan="1" colspan="1">1843</td>
<td rowspan="1" colspan="1">2.59</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Aquilegia coerulea</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">24 823</td>
<td rowspan="1" colspan="1">1099</td>
<td rowspan="1" colspan="1">4.43</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Arabidopsis lyrata</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">32 670</td>
<td rowspan="1" colspan="1">1232</td>
<td rowspan="1" colspan="1">3.77</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Arabidopsis thaliana</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">27 416</td>
<td rowspan="1" colspan="1">1224</td>
<td rowspan="1" colspan="1">4.46</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Brassica rapa</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">40 905</td>
<td rowspan="1" colspan="1">1812</td>
<td rowspan="1" colspan="1">4.43</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Capsella rubella</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">26 521</td>
<td rowspan="1" colspan="1">1211</td>
<td rowspan="1" colspan="1">4.57</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Carica papaya</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">27 769</td>
<td rowspan="1" colspan="1">845</td>
<td rowspan="1" colspan="1">3.04</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Citrus clementina</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">24 553</td>
<td rowspan="1" colspan="1">1098</td>
<td rowspan="1" colspan="1">4.47</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Citrus sinensis</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">25 379</td>
<td rowspan="1" colspan="1">1083</td>
<td rowspan="1" colspan="1">4.27</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Cucumis sativus</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">21 503</td>
<td rowspan="1" colspan="1">1008</td>
<td rowspan="1" colspan="1">4.69</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Eucalyptus grandis</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">36 376</td>
<td rowspan="1" colspan="1">1711</td>
<td rowspan="1" colspan="1">4.70</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Fragaria vesca</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">65 662</td>
<td rowspan="1" colspan="1">1105</td>
<td rowspan="1" colspan="1">1.68</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Glycine max</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">54 175</td>
<td rowspan="1" colspan="1">2354</td>
<td rowspan="1" colspan="1">4.35</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Gossypium raimondii</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">37 505</td>
<td rowspan="1" colspan="1">1648</td>
<td rowspan="1" colspan="1">4.39</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Linum usitatissimum</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">43 471</td>
<td rowspan="1" colspan="1">2018</td>
<td rowspan="1" colspan="1">4.64</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Malus domestica</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">63 514</td>
<td rowspan="1" colspan="1">2220</td>
<td rowspan="1" colspan="1">3.50</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Manihot esculenta</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">30 666</td>
<td rowspan="1" colspan="1">1442</td>
<td rowspan="1" colspan="1">4.70</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Medicago truncatula</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">44 135</td>
<td rowspan="1" colspan="1">1173</td>
<td rowspan="1" colspan="1">2.66</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Mimulus guttatus</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">26 718</td>
<td rowspan="1" colspan="1">1271</td>
<td rowspan="1" colspan="1">4.76</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Phaseolus vulgaris</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">27 197</td>
<td rowspan="1" colspan="1">1351</td>
<td rowspan="1" colspan="1">4.97</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Populus trichocarpa</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">41 335</td>
<td rowspan="1" colspan="1">1751</td>
<td rowspan="1" colspan="1">4.24</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Prunus persica</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">27 864</td>
<td rowspan="1" colspan="1">1288</td>
<td rowspan="1" colspan="1">4.62</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Ricinus communis</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">31 221</td>
<td rowspan="1" colspan="1">1135</td>
<td rowspan="1" colspan="1">3.64</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Thellungiella halophila</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">26 351</td>
<td rowspan="1" colspan="1">1132</td>
<td rowspan="1" colspan="1">4.30</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Vitis vinifera</italic>
</td>
<td rowspan="1" colspan="1">Dicot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">26 346</td>
<td rowspan="1" colspan="1">1096</td>
<td rowspan="1" colspan="1">4.16</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Brachypodium distachyon</italic>
</td>
<td rowspan="1" colspan="1">Monocot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">26 552</td>
<td rowspan="1" colspan="1">1243</td>
<td rowspan="1" colspan="1">4.68</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Oryza sativa</italic>
</td>
<td rowspan="1" colspan="1">Monocot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">39 234</td>
<td rowspan="1" colspan="1">1363</td>
<td rowspan="1" colspan="1">3.47</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Panicum virgatum</italic>
</td>
<td rowspan="1" colspan="1">Monocot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">65 878</td>
<td rowspan="1" colspan="1">2624</td>
<td rowspan="1" colspan="1">3.98</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Setaria italica</italic>
</td>
<td rowspan="1" colspan="1">Monocot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">35 471</td>
<td rowspan="1" colspan="1">1487</td>
<td rowspan="1" colspan="1">4.19</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Sorghum bicolor</italic>
</td>
<td rowspan="1" colspan="1">Monocot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">27 608</td>
<td rowspan="1" colspan="1">1334</td>
<td rowspan="1" colspan="1">4.83</td>
</tr>
<tr><td rowspan="1" colspan="1"><italic>Zea mays</italic>
</td>
<td rowspan="1" colspan="1">Monocot</td>
<td rowspan="1" colspan="1">Phytozome</td>
<td rowspan="1" colspan="1">39 656</td>
<td rowspan="1" colspan="1">1475</td>
<td rowspan="1" colspan="1">3.72</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec><title>Selection of gold standard datasets for accuracy benchmark</title>
<p>Since the CAZymes of <italic>Arabidopsis</italic>
 and rice have been annotated in the CAZy database, we have used these two genomes to calculate the sensitivity (or recall) and positive predictive value (or precision) of our CAZyme data. It is worth mentioning that the ‘annotated’ CAZymes of CAZy include not only experimentally characterized proteins, but also proteins that are deemed to be true homologs of the characterized proteins. For example, there are only three <italic>Arabidopsis</italic>
 proteins experimentally characterized to be GH17 enzymes (<ext-link ext-link-type="uri" xlink:href="http://www.cazy.org/GH17_characterized.html">http://www.cazy.org/GH17_characterized.html</ext-link>
); however 51 <italic>Arabidopsis</italic>
 proteins are listed as GH17 enzymes (<ext-link ext-link-type="uri" xlink:href="http://www.cazy.org/GH17_eukaryota.html">http://www.cazy.org/GH17_eukaryota.html</ext-link>
). The reason is that the CAZy database annotates CAZymes from the GenBank database, including those from <italic>Arabidopsis</italic>
 and rice, by combining homology search and expert curation (e.g. manual inspection of sequence alignment for characteristic amino acid motifs [<xref rid="bau079-B7" ref-type="bibr">7</xref>
]). Most of the <italic>Arabidopsis</italic>
 CAZymes, including those not experimentally characterized, were manually curated by the CAZy developers and published in 2001 [<xref rid="bau079-B16" ref-type="bibr">16</xref>
]. A similar approach was applied to the annotation of poplar CAZymes in 2006 [<xref rid="bau079-B17" ref-type="bibr">17</xref>
]. Due to its high-quality manual curation and rich functional annotation, CAZy was used as a gold standard dataset to assess automated CAZyme annotation by the CAZymes Analysis Toolkit (CAT) [<xref rid="bau079-B18" ref-type="bibr">18</xref>
] and the dbCAN database [<xref rid="bau079-B8" ref-type="bibr">8</xref>
].</p>
<p>There are also other protein family and function classification databases such as Pfam [<xref rid="bau079-B19" ref-type="bibr">19</xref>
], KOG (eukaryotic orthologous groups) [<xref rid="bau079-B20" ref-type="bibr">20</xref>
], KEGG Orthology (KO) [<xref rid="bau079-B21" ref-type="bibr">21</xref>
], SUPERFAMILY [<xref rid="bau079-B22" ref-type="bibr">22</xref>
], PANTHER [<xref rid="bau079-B23" ref-type="bibr">23</xref>
], Gene Ontology (GO) [<xref rid="bau079-B24" ref-type="bibr">24</xref>
] and many others. Each database has its own strengths and focus (e.g. protein domains, evolution, pathways or structure), and the databases overlap considerably (i.e. one protein family is often described in several of them). Therefore, integration efforts such as the InterPro database [<xref rid="bau079-B25" ref-type="bibr">25</xref>
] and the CDD database [<xref rid="bau079-B26" ref-type="bibr">26</xref>
] have attempted to integrate these protein family databases into one framework to remove redundancy. Many of these resources are extremely useful for genome annotation. For example, in the plant genomics community, Phytozome [<xref rid="bau079-B13" ref-type="bibr">13</xref>
], Gramene [<xref rid="bau079-B27" ref-type="bibr">27</xref>
] and PLAZA [<xref rid="bau079-B28" ref-type="bibr">28</xref>
] used the above resources to construct and compare protein families across different plants. In addition, the ENZYME database [<xref rid="bau079-B29" ref-type="bibr">29</xref>
] created the nomenclature system (i.e. the Enzyme Commission/EC numbers) of all characterized enzymes and associated biochemical reactions. Other databases such as Priam [<xref rid="bau079-B30" ref-type="bibr">30</xref>
], CatFam [<xref rid="bau079-B31" ref-type="bibr">31</xref>
], EFICAz [<xref rid="bau079-B32" ref-type="bibr">32</xref>
] and PlantCyc [<xref rid="bau079-B33" ref-type="bibr">33</xref>
] employed the EC classification system to either define enzyme family models or reconstruct metabolic pathways.</p>
<p>However, unlike CAZy, dbCAN and PlantCAZyme, none of the above resources is specifically designed for CAZymes; they are general protein family/classification databases. Because their mission is to cover all protein families in nature as broadly as possible, they lack a specific focus and often miss some families of a given protein class, which is one reason why many specialized databases are needed for individual protein families/classes, such as [<xref rid="bau079-B6" ref-type="bibr">6</xref>
, <xref rid="bau079-B34" ref-type="bibr">34–37</xref>
] (see more at <ext-link ext-link-type="uri" xlink:href="http://www.oxfordjournals.org/nar/database/subcat/3/10">http://www.oxfordjournals.org/nar/database/subcat/3/10</ext-link>
). For example, Pfam only covers 142 out of 330 CAZyme families [<xref rid="bau079-B8" ref-type="bibr">8</xref>
]. In fact, most of these 142 families were initially defined and annotated (from literature curation) by the CAZy database and only later included in Pfam as HMMs, which makes Pfam a less than ideal resource for CAZyme annotation. In addition, it is well known that a single CAZyme family can contain proteins with different biochemical activities and that one biochemical activity can be carried out by multiple CAZyme families [<xref rid="bau079-B4" ref-type="bibr">4</xref>
]. For example, the CAZyme GH5 family contains characterized proteins with 20 different EC numbers (manually curated at <ext-link ext-link-type="uri" xlink:href="http://www.cazy.org/GH5.html">http://www.cazy.org/GH5.html</ext-link>
) and the cellulase (EC 3.2.1.4) activity is found in more than 10 GH families [<xref rid="bau079-B38" ref-type="bibr">38</xref>
]. This many-to-many relationship makes it impossible to compare the dbCAN HMM-based search with EC-based databases (e.g. Priam and CatFam) in terms of CAZyme assignment. Therefore, one cannot evaluate CAZyme family assignments by comparison with the general protein family/classification databases. Since we aim to assess whether we have retrieved all CAZyme homologs using the HMMs built from CAZy-annotated proteins, the CAZy database is the natural choice of gold standard dataset for evaluating our performance.</p>
</sec>
<sec><title>Accuracy benchmark with <italic>Arabidopsis</italic>
 and rice data</title>
<p>As discussed in our dbCAN article [<xref rid="bau079-B8" ref-type="bibr">8</xref>
], two criteria significantly impact the sensitivity and precision of our automated CAZyme annotation. One is <italic>E</italic>
-value and the other is coverage, which is defined to measure the fraction of CAZyme domains covered in the alignment. We have tested the performance of dbCAN-based search on all of the CAZyme families as a whole (denoted as <italic>All</italic>
) using different combinations of <italic>E</italic>
-values and coverage cutoffs. <xref ref-type="fig" rid="bau079-F1">Figure 1</xref>
 shows the F-measure values of different parameter combinations for the <italic>All</italic>
 sets of <italic>Arabidopsis</italic>
 (<xref ref-type="fig" rid="bau079-F1">Figure 1</xref>
A) and rice (<xref ref-type="fig" rid="bau079-F1">Figure 1</xref>
B), where <italic>F</italic>
-measure = 2 × (Sensitivity × Precision) / (Sensitivity + Precision). We then selected the combination that gave the highest F-measure value and present the results in <xref ref-type="table" rid="bau079-T2">Tables 2</xref>
 and <xref ref-type="table" rid="bau079-T3">3</xref>
. More detailed information on how Sensitivity and Precision were calculated is provided in <ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">Supplementary Tables S1–S12</ext-link>.
<fig id="bau079-F1" position="float"><label>Figure 1.</label>
<caption><p>Evaluation of the impact of <italic>E</italic>
-value and coverage parameters on the accuracy of pre-computed PlantCAZyme sequence data for <italic>Arabidopsis</italic>
 and rice; <italic>x</italic>
-axis (horizontal): <italic>E-</italic>
value, <italic>y</italic>
-axis (vertical): F-measure, <italic>Z</italic>
-axis: coverage. For both species, <italic>E</italic>
-value < 1e–23 and coverage > 0.2 gave the highest F-measure. The detailed calculations are provided in <ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">Supplementary Table S1</ext-link>
 and <ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">S2</ext-link>
.</p>
</caption>
<graphic xlink:href="bau079f1p"></graphic>
</fig>
</p>
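<p>The accuracy measures used above can be illustrated with a minimal Python sketch. It assumes that the predicted and the CAZy-annotated proteins are available as two sets of protein IDs; the function names and the toy IDs are hypothetical and are not taken from the PlantCAZyme code base. The exact counting procedure used in the benchmark is described in the Supplementary Tables and may differ in detail.</p>
<preformat><![CDATA[
```python
from typing import Set, Tuple


def f_measure(sensitivity: float, precision: float) -> float:
    """F-measure = 2 x (Sensitivity x Precision) / (Sensitivity + Precision)."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


def evaluate(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F-measure) for one parameter combination.

    predicted: IDs passing the E-value and coverage cutoffs (assumed input)
    gold:      IDs annotated as CAZymes in the gold standard (assumed input)
    """
    true_pos = len(predicted & gold)    # predicted and in the gold standard
    false_pos = len(predicted - gold)   # predicted but not in the gold standard
    false_neg = len(gold - predicted)   # in the gold standard but missed
    sens = true_pos / (true_pos + false_neg) if gold else 0.0
    prec = true_pos / (true_pos + false_pos) if predicted else 0.0
    return sens, prec, f_measure(sens, prec)


if __name__ == "__main__":
    # Sensitivity and precision of the "All" row for Arabidopsis (Table 2)
    print(round(f_measure(0.894071914, 0.924924925), 3))
    # Toy example with made-up protein IDs
    print(evaluate({"p1", "p2", "p3"}, {"p2", "p3", "p4", "p5"}))
```
]]></preformat>
<p>Applied to the sensitivity and precision reported for the <italic>All</italic> sets, this formula reproduces the F-measure values listed in Tables 2 and 3 (0.91 for <italic>Arabidopsis</italic> and 0.85 for rice).</p>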
<p><xref ref-type="table" rid="bau079-T2">Tables 2</xref>
 and <xref ref-type="table" rid="bau079-T3">3</xref>
 show that the coverage >0.2 and <italic>E</italic>
-value < 1<italic>e</italic>
-23 combination gave the best F-measure for both <italic>Arabidopsis</italic>
 (F-measure = 0.91, sensitivity = 0.89 and precision = 0.92) and rice (F-measure = 0.85, sensitivity = 0.84 and precision = 0.85). We have also performed evaluation for the five CAZyme classes separately, which suggests that the best F-measure varies for different CAZyme classes (<xref ref-type="table" rid="bau079-T2">Tables 2</xref>
 and <xref ref-type="table" rid="bau079-T3">3</xref>
). Overall the largest two classes GT and GH (81% of CAZyme families) in both plants have higher <italic>F</italic>
-measures than the three smaller classes CE, PL and CBM. It also suggests that: (i) to annotate GH proteins, one should use a very relaxed coverage cutoff or the sensitivity will be low (<ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">Supplementary Tables S4</ext-link>
 and <ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">S9</ext-link>
); (ii) to annotate CE families a very stringent <italic>E</italic>
-value cutoff and coverage cutoff should be used; otherwise the precision will be very low due to a very high false positive rate (<ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">Supplementary Tables S5</ext-link>
 and <ext-link ext-link-type="uri" xlink:href="http://database.oxfordjournals.org/lookup/suppl/doi:10.1093/database/bau079/-/DC1">S10</ext-link>
). Although it would work best to use different parameter combinations for different CAZyme classes and for different plants, we decided to use coverage > 0.2 and <italic>E-</italic>
value < 1<italic>e</italic>-23 as the universal threshold, as this setting is consistent between dicots and monocots and makes the parsing process simpler and easier for others to reproduce.
<table-wrap id="bau079-T2" position="float"><label>Table 2.</label>
<caption><p>The <italic>E</italic>
-value and coverage cutoffs that lead to the best <italic>F</italic>
-measure in <italic>Arabidopsis</italic>
</p>
</caption>
<table frame="hsides" rules="groups"><thead align="left"><tr><th rowspan="1" colspan="1">Arabidopsis</th>
<th rowspan="1" colspan="1"># of CAZyme families</th>
<th rowspan="1" colspan="1"><italic>E</italic>
-value</th>
<th rowspan="1" colspan="1">Coverage</th>
<th rowspan="1" colspan="1"><italic>F</italic>
-measure</th>
<th rowspan="1" colspan="1">Sensitivity</th>
<th rowspan="1" colspan="1">Precision</th>
</tr>
</thead>
<tbody align="left"><tr><td rowspan="1" colspan="1"><bold>All</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">98</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-23</td>
<td align="char" char="." rowspan="1" colspan="1">0.2</td>
<td align="char" char="." rowspan="1" colspan="1">0.909236762</td>
<td align="char" char="." rowspan="1" colspan="1">0.894071914</td>
<td align="char" char="." rowspan="1" colspan="1">0.924924925</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>GT</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">43</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-11</td>
<td align="char" char="." rowspan="1" colspan="1">0.25</td>
<td align="char" char="." rowspan="1" colspan="1">0.937634409</td>
<td align="char" char="." rowspan="1" colspan="1">0.947826087</td>
<td align="char" char="." rowspan="1" colspan="1">0.927659574</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>GH</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">36</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-16</td>
<td align="char" char="." rowspan="1" colspan="1">0.05</td>
<td align="char" char="." rowspan="1" colspan="1">0.974811083</td>
<td align="char" char="." rowspan="1" colspan="1">0.969924812</td>
<td align="char" char="." rowspan="1" colspan="1">0.979746835</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>CE</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">5</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-29</td>
<td align="char" char="." rowspan="1" colspan="1">0.95</td>
<td align="char" char="." rowspan="1" colspan="1">0.945741134</td>
<td align="char" char="." rowspan="1" colspan="1">0.917647059</td>
<td align="char" char="." rowspan="1" colspan="1">0.975609756</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>PL</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">2</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-30</td>
<td align="char" char="." rowspan="1" colspan="1">0.25</td>
<td align="char" char="." rowspan="1" colspan="1">0.970588235</td>
<td align="char" char="." rowspan="1" colspan="1">0.970588235</td>
<td align="char" char="." rowspan="1" colspan="1">0.970588235</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>CBM</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">10</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-12</td>
<td align="char" char="." rowspan="1" colspan="1">0.75</td>
<td align="char" char="." rowspan="1" colspan="1">0.79613773</td>
<td align="char" char="." rowspan="1" colspan="1">0.821428571</td>
<td align="char" char="." rowspan="1" colspan="1">0.772357724</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="bau079-T3" position="float"><label>Table 3.</label>
<caption><p>The <italic>E</italic>
-value and coverage cutoffs that lead to the best <italic>F</italic>
-measure in rice</p>
</caption>
<table frame="hsides" rules="groups"><thead align="left"><tr><th rowspan="1" colspan="1">Rice</th>
<th rowspan="1" colspan="1"># of CAZyme families</th>
<th rowspan="1" colspan="1"><italic>E</italic>
-value</th>
<th rowspan="1" colspan="1">Coverage</th>
<th rowspan="1" colspan="1"><italic>F</italic>
-measure</th>
<th rowspan="1" colspan="1">Sensitivity</th>
<th rowspan="1" colspan="1">Precision</th>
</tr>
</thead>
<tbody align="left"><tr><td rowspan="1" colspan="1"><bold>All</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">97</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-23</td>
<td align="char" char="." rowspan="1" colspan="1">0.2</td>
<td align="char" char="." rowspan="1" colspan="1">0.845169681</td>
<td align="char" char="." rowspan="1" colspan="1">0.840619308</td>
<td align="char" char="." rowspan="1" colspan="1">0.849769585</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>GT</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">44</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-10</td>
<td align="char" char="." rowspan="1" colspan="1">0.35</td>
<td align="char" char="." rowspan="1" colspan="1">0.906381793</td>
<td align="char" char="." rowspan="1" colspan="1">0.908931699</td>
<td align="char" char="." rowspan="1" colspan="1">0.903846154</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>GH</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">35</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-13</td>
<td align="char" char="." rowspan="1" colspan="1">0.1</td>
<td align="char" char="." rowspan="1" colspan="1">0.92415331</td>
<td align="char" char="." rowspan="1" colspan="1">0.91745283</td>
<td align="char" char="." rowspan="1" colspan="1">0.930952381</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>CE</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">5</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-28</td>
<td align="char" char="." rowspan="1" colspan="1">0.95</td>
<td align="char" char="." rowspan="1" colspan="1">0.913545252</td>
<td align="char" char="." rowspan="1" colspan="1">0.905660377</td>
<td align="char" char="." rowspan="1" colspan="1">0.921568627</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>PL</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">2</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-30</td>
<td align="char" char="." rowspan="1" colspan="1">0.7</td>
<td align="char" char="." rowspan="1" colspan="1">0.827586207</td>
<td align="char" char="." rowspan="1" colspan="1">0.75</td>
<td align="char" char="." rowspan="1" colspan="1">0.923076923</td>
</tr>
<tr><td rowspan="1" colspan="1"><bold>CBM</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">9</td>
<td align="char" char="." rowspan="1" colspan="1">1.00<italic>E</italic>
-16</td>
<td align="char" char="." rowspan="1" colspan="1">0.45</td>
<td align="char" char="." rowspan="1" colspan="1">0.716031632</td>
<td align="char" char="." rowspan="1" colspan="1">0.857142857</td>
<td align="char" char="." rowspan="1" colspan="1">0.614814815</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
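<p>The difference between the universal threshold and class-specific thresholds can be sketched as follows. This is an illustrative Python example, not the parsing code used to build the database: the Hit record, the helper names and the three example hits are hypothetical, and the coverage of each hit is assumed to have been computed beforehand. The cutoff values are copied from the text and from Table 2.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    protein_id: str
    family: str      # CAZyme family name, e.g. "GH17"
    evalue: float
    coverage: float  # fraction of the family domain covered by the alignment


# Universal cutoffs chosen in the text: E-value < 1e-23 and coverage > 0.2
UNIVERSAL: Tuple[float, float] = (1e-23, 0.2)

# Best per-class cutoffs for Arabidopsis, copied from Table 2
ARABIDOPSIS_BY_CLASS: Dict[str, Tuple[float, float]] = {
    "GT": (1e-11, 0.25),
    "GH": (1e-16, 0.05),
    "CE": (1e-29, 0.95),
    "PL": (1e-30, 0.25),
    "CBM": (1e-12, 0.75),
}


def cazyme_class(family: str) -> str:
    """Strip the trailing family number, e.g. 'GH17' becomes 'GH'."""
    return family.rstrip("0123456789")


def filter_hits(
    hits: Iterable[Hit],
    by_class: Optional[Dict[str, Tuple[float, float]]] = None,
) -> List[Hit]:
    """Keep hits passing the universal or, if given, the per-class cutoffs."""
    kept = []
    for hit in hits:
        max_evalue, min_coverage = UNIVERSAL
        if by_class is not None:
            # Classes without their own entry fall back to the universal cutoffs
            max_evalue, min_coverage = by_class.get(
                cazyme_class(hit.family), UNIVERSAL
            )
        if hit.evalue < max_evalue and hit.coverage > min_coverage:
            kept.append(hit)
    return kept


if __name__ == "__main__":
    example_hits = [  # made-up values for illustration only
        Hit("protein_1", "GH17", 1e-18, 0.10),
        Hit("protein_2", "CE8", 1e-25, 0.50),
        Hit("protein_3", "GT8", 1e-40, 0.90),
    ]
    print([h.protein_id for h in filter_hits(example_hits)])
    print([h.protein_id for h in filter_hits(example_hits, ARABIDOPSIS_BY_CLASS)])
```
]]></preformat>
<p>In this toy example the GH hit is kept only under the relaxed class-specific cutoffs, whereas the CE hit is kept only under the universal threshold, which mirrors the trade-off discussed above.</p>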
</sec>
<sec><title>Annotation data</title>
<p>We have further generated extensive bioinformatics annotation data for the plant CAZyme sequences by running various bioinformatics tools against different databases. As shown in <xref ref-type="fig" rid="bau079-F2">Figure 2</xref>, these data include functional annotation (conserved functional domains, Gene Ontology annotation, top matches in the non-redundant protein database [NCBI-nr] and expressed sequence tag (EST) database), structural annotation [top matches in the Protein Data Bank (PDB), predicted transmembrane domains, signal peptides, coiled regions, hydropathy plot], phylogenetic annotation (orthologous groups of the CAZyme domains, multiple sequence alignment, phylogenetic tree) and miscellaneous data (nucleotide coding sequences, CAZyme signature domain sequences, genomic location, external links, publications, etc.).
<fig id="bau079-F2" position="float"><label>Figure 2.</label>
<caption><p>A schematic architecture of the PlantCAZyme database</p>
</caption>
<graphic xlink:href="bau079f2p"></graphic>
</fig>
</p>
</sec>
</sec>
<sec><title>Utility and Discussion</title>
<sec><title>Implementation and user interface</title>
<p>All the data were integrated and presented through a web interface powered by MySQL+PHP+JavaScript. As shown in <xref ref-type="fig" rid="bau079-F2">Figure 2</xref>
, the <italic>protein centric display page</italic>
 is used to present the sequence and annotation of each CAZyme protein. The website has a <italic>download page</italic>
 that allows users to download CAZyme sequences of a particular species or a particular CAZyme family. Both the CAZyme signature domain sequences and the full-length sequences are available for any species or any family.</p>
<p>A <italic>BLAST page</italic>
 and an HMMER (annotate) page allow users to submit their own sequences for annotation, which is useful for sequences that are not included in our database. For BLAST searches, users can submit either protein or nucleotide sequences and choose between two databases: (i) the CAZy database, which contains the full-length GenBank protein sequences annotated in CAZy, and (ii) the plant CAZyme domain sequences (not full length) compiled in our PlantCAZyme database, i.e. the CAZyme signature domains identified by the dbCAN search. The results are returned as a webpage containing the tabular output of the BLAST program.</p>
<p>For the <italic>HMMER page</italic>
, users must submit protein sequences as queries, which are searched against the dbCAN HMMs. Since HMMs are built for each CAZyme family to represent its signature domain, this type of search is better suited than BLAST to annotating new protein sequences with a modular CAZyme domain architecture.</p>
<p>In addition to sequence search, the <italic>keyword search</italic>
 function was also implemented. The top-right corner of each webpage has a search box, where users can search the database with a keyword. There are two options for keyword search: unformatted and formatted searching. For unformatted searching, users enter a query with no formatting, and the query is run only against the following fields: (i) ID, e.g. AT2G46570.1, (ii) Family, e.g. CBM10, (iii) Species, e.g. <italic>A.</italic>
<italic>thaliana</italic>
 and (iv) Domain, e.g. Cellulose_synt. Formatted searching allows users to be more specific and to search through more fields. A formatted search names the field in square brackets [] after the search term. For example, if users want to search for the species <italic>A.</italic>
<italic>thaliana</italic>
, they can search ‘<italic>Arabidopsis thaliana</italic>
[Species]’, which will bring up anything with a species containing ‘Arabidopsis’ or ‘thaliana’. Users can write more than one specifier in a query. For example, users who want only the AA1 family could write the query as ‘Arabidopsis[Species] thaliana[Species] AA1[Family]’. These specifiers are combined with a logical AND, so a result appears only if it matches all of the criteria given. Currently, the keyword search supports exact matches only; partial matches and wildcards will be considered in the future.</p>
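<p>The formatted query syntax described above can be illustrated with a short Python sketch. It is not the website's implementation (which is written in PHP and is not shown in this article); the regular expression, the function names and the example records are assumptions made for illustration. The sketch handles only term[Field] specifiers, combines them with AND, and matches whole words case-insensitively; unformatted queries are not covered.</p>
<preformat><![CDATA[
```python
import re
from typing import Dict, List, Tuple

# One specifier is a term immediately followed by a field name in brackets
SPECIFIER = re.compile(r"(\S+?)\[(\w+)\]")


def parse_query(query: str) -> List[Tuple[str, str]]:
    """Split 'Arabidopsis[Species] AA1[Family]' into lower-cased (term, field) pairs."""
    return [(term.lower(), field.lower()) for term, field in SPECIFIER.findall(query)]


def matches(record: Dict[str, str], specifiers: List[Tuple[str, str]]) -> bool:
    """True if every specifier matches a whole word of the named field (AND logic)."""
    return all(
        term in record.get(field, "").lower().split()
        for term, field in specifiers
    )


def search(records: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    specifiers = parse_query(query)
    if not specifiers:  # unformatted queries are outside this sketch
        return []
    return [record for record in records if matches(record, specifiers)]


if __name__ == "__main__":
    records = [  # made-up records; field names follow the text
        {"id": "protein_0001", "family": "AA1", "species": "Arabidopsis thaliana"},
        {"id": "protein_0002", "family": "GT8", "species": "Arabidopsis thaliana"},
        {"id": "protein_0003", "family": "AA1", "species": "Oryza sativa"},
    ]
    query = "Arabidopsis[Species] thaliana[Species] AA1[Family]"
    print([r["id"] for r in search(records, query)])
```
]]></preformat>
<p>Because all specifiers must match, the example query returns only the record that is both from <italic>Arabidopsis thaliana</italic> and in the AA1 family.</p>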
<p>A help page is designed to provide all necessary information for browsing, querying, downloading and searching the website and the database.</p>
</sec>
</sec>
<sec><title>Use cases</title>
<p>If users want to retrieve all CAZyme proteins of <italic>A. </italic>
<italic>thaliana</italic>
, there are three options. (i) Users can go to the download page, browse by species and locate the species to download the FASTA format sequences of full-length proteins or just the CAZyme domains. (ii) They can also go to the homepage, browse by species, click on the species and follow the link to the family browse page of <italic>A.</italic>
<italic> thaliana</italic>
. There they can view which CAZyme families are in <italic>A.</italic>
<italic> thaliana</italic>
 and how many genes are in each family, as well as a clickable genomic location plot. This <italic>Arabidopsis thaliana</italic>
 browse page also has a link to the complete HMMER output, where hits that did not pass our filters (coverage > 0.3 and <italic>E</italic>
-value < 1<italic>e</italic>
-5) can also be retrieved. Clicking on each family will present a new page with the list of proteins of that family, and further clicking on the ID will open the protein browse page. (iii) The last way is to perform a keyword search in the following format: <italic>(Arabidopsis thaliana)[species]</italic>
 or <italic>Arabidopsis[Species] thaliana[Species]</italic>
, which will return a table with all the <italic>Arabidopsis thaliana</italic>
 CAZyme IDs.</p>
<p>Similarly, if users want to retrieve CAZyme proteins of a specific family, say GT8, they have the same three options: (i) download all GT8 proteins from the download page, (ii) browse by family from the homepage and (iii) use the keyword search function: <italic>GT8[family]</italic>
.</p>
<p>If users have a dataset (e.g. a newly sequenced genome) to be annotated for CAZymes, they can upload the FASTA sequences to our computing server through the BLAST page or the annotate (HMMER) page. The job will be run and the result will be returned with the CAZyme match information. If a huge dataset (>5000 sequences) needs to be processed, we recommend that users download the BLAST databases (CAZyDB or PlantCAZyme) or the HMM database (dbCAN) at our download page and run the searches on their local computers.</p>
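<p>For local searches, the per-domain table that HMMER's hmmscan writes with the --domtblout option can be post-processed to obtain the E-value and coverage of each hit. The Python sketch below shows one way to do this. It rests on several assumptions that are not stated in this article: that hmmscan is used (so the HMM is the target), that the independent E-value of each domain is the relevant E-value, and that coverage is computed as the aligned HMM span divided by the HMM length. The sample line and all names are made up for illustration.</p>
<preformat><![CDATA[
```python
from typing import Iterable, Iterator, NamedTuple


class DomainHit(NamedTuple):
    protein_id: str
    hmm_name: str
    evalue: float    # independent E-value of this domain
    coverage: float  # aligned fraction of the HMM


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row of an hmmscan --domtblout table.

    Zero-based columns used here: 0 = target (HMM) name, 2 = target length,
    3 = query (protein) name, 12 = independent E-value, 15/16 = hmm from/to.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split(maxsplit=22)  # the last column is a free-text description
        hmm_name, hmm_len, protein_id = cols[0], int(cols[2]), cols[3]
        i_evalue = float(cols[12])
        hmm_from, hmm_to = int(cols[15]), int(cols[16])
        yield DomainHit(protein_id, hmm_name, i_evalue, (hmm_to - hmm_from) / hmm_len)


# Made-up example row in domtblout layout (not real search output)
SAMPLE = """# target name  accession  tlen  query name ...
GT8.hmm - 262 query_protein_1 - 429 1.2e-80 270.1 0.0 1 1 3.1e-84 1.5e-80 269.8 0.0 2 260 85 360 84 362 0.97 -
"""

if __name__ == "__main__":
    # A real table could come from a command such as (hypothetical file names):
    #   hmmscan --domtblout hits.domtblout dbCAN.hmm proteins.fasta
    for hit in parse_domtblout(SAMPLE.splitlines()):
        print(hit)
```
]]></preformat>
<p>The resulting E-value and coverage of each hit can then be compared against cutoffs such as those discussed in the accuracy benchmark.</p>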
<sec><title>Future work</title>
<p>We plan to update the database at least once a year. We plan to include more species in the future, particularly selected plants and algae that do not have completed genomes. We will use transcriptomes of species such as ferns, liverworts, charophytic green algae (CGA), basal angiosperms, as they are important for the evolutionary study of CAZymes in plants and algae. The automatic collection of CAZyme sequences will also be further improved, e.g. by considering applying different parsing thresholds for different plant clades and by supplementing the HMMER search with BLAST search. We will also develop new web applications to display duplicated genes and orthologous genes of CAZymes on the chromosomes to allow comparative and evolutionary study of CAZymes.</p>
<p><italic>PlantCAZyme</italic>
 is the first web resource dedicated to providing pre-computed CAZyme sequence and annotation data for all sequenced plants and algae. We expect it will be a highly useful tool for the plant cell wall and bioenergy research communities.</p>
</sec>
</sec>
</body>
<back><ack><title>Acknowledgements</title>
<p>A.E. computed most of the data and implemented the database and website. R.T. contributed to the data collection. N.M. developed an early version of the database. Y.Y. conceived the database, supervised the entire project, computed some of the data and wrote the paper. The authors acknowledge the Department of Computer Science of NIU for providing free access to the Linux computing cluster Gaea and our lab members for helpful discussions. The authors thank all the reviewers for their good suggestions to improve this article.</p>
</ack>
<sec><title>Funding</title>
<p>Y.Y. was funded by the <funding-source>Research & Artistry Award</funding-source>
 and the startup package from <funding-source>Northern Illinois University</funding-source>
. A.E. and N.M. were supported by <funding-source>Undergraduate Research Assistantships</funding-source>
. Funding for open access charge: Northern Illinois University Libraries.</p>
<p><italic>Conflict of interest</italic>
. None declared.</p>
</sec>
<ref-list><title>References</title>
<ref id="bau079-B1"><label>1</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rubin</surname>
<given-names>E.M.</given-names>
</name>
</person-group>
 (<year>2008</year>
) <article-title>Genomics of cellulosic biofuels</article-title>
. <source>Nature</source>
, <volume>454</volume>
, <fpage>841</fpage>
–<lpage>845</lpage>
<pub-id pub-id-type="pmid">18704079</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B2"><label>2</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Himmel</surname>
<given-names>M.E.</given-names>
</name>
<name><surname>Ding</surname>
<given-names>S.Y.</given-names>
</name>
<name><surname>Johnson</surname>
<given-names>D.K.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2007</year>
) <article-title>Biomass recalcitrance: engineering plants and enzymes for biofuels production</article-title>
. <source>Science</source>
, <volume>315</volume>
, <fpage>804</fpage>
–<lpage>807</lpage>
<pub-id pub-id-type="pmid">17289988</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B3"><label>3</label>
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Yin</surname>
<given-names>Y.</given-names>
</name>
</person-group>
 (<year>2014</year>
) In: <person-group person-group-type="editor"><name><surname>Gupta</surname>
<given-names>V.</given-names>
</name>
<name><surname>Tuohy</surname>
<given-names>M.</given-names>
</name>
<name><surname>Kubicek</surname>
<given-names>C.</given-names>
</name>
<name><surname>Saddler</surname>
<given-names>J.</given-names>
</name>
<name><surname>Xu</surname>
<given-names>F.</given-names>
</name>
</person-group>
 (eds). <source>Bioenergy Research: Advances and Applications</source>
. <publisher-name>Elsevier BV</publisher-name>
, <publisher-loc>The Netherlands</publisher-loc>
, pp. <fpage>95</fpage>
–<lpage>107</lpage>
</mixed-citation>
</ref>
<ref id="bau079-B4"><label>4</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cantarel</surname>
<given-names>B.L.</given-names>
</name>
<name><surname>Coutinho</surname>
<given-names>P.M.</given-names>
</name>
<name><surname>Rancurel</surname>
<given-names>C.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2009</year>
) <article-title>The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>37</volume>
, <fpage>D233</fpage>
–<lpage>D238</lpage>
<pub-id pub-id-type="pmid">18838391</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B5"><label>5</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Coutinho</surname>
<given-names>P.M.</given-names>
</name>
<name><surname>Stam</surname>
<given-names>M.</given-names>
</name>
<name><surname>Blanc</surname>
<given-names>E.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2003</year>
) <article-title>Why are there so many carbohydrate-active enzyme-related genes in plants?</article-title>
<source>Trends Plant Sci.</source>
<italic>,</italic>
<volume>8</volume>
, <fpage>563</fpage>
–<lpage>565</lpage>
</mixed-citation>
</ref>
<ref id="bau079-B6"><label>6</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lombard</surname>
<given-names>V.</given-names>
</name>
<name><surname>Golaconda Ramulu</surname>
<given-names>H.</given-names>
</name>
<name><surname>Drula</surname>
<given-names>E.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2014</year>
) <article-title>The carbohydrate-active enzymes database (CAZy) in 2013</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>42</volume>
, <fpage>D490</fpage>
–<lpage>D495</lpage>
<pub-id pub-id-type="pmid">24270786</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B7"><label>7</label>
<mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Coutinho</surname>
<given-names>P.M.</given-names>
</name>
<name><surname>Henrissat</surname>
<given-names>B.</given-names>
</name>
</person-group>
 (<year>2010</year>
), <source>Annual Plant Reviews</source>
. <publisher-loc>New Jersey, United States</publisher-loc>
: <publisher-name>Wiley-Blackwell</publisher-name>
, pp. <fpage>93</fpage>
–<lpage>107</lpage>
</mixed-citation>
</ref>
<ref id="bau079-B8"><label>8</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yin</surname>
<given-names>Y.B.</given-names>
</name>
<name><surname>Mao</surname>
<given-names>X.Z.</given-names>
</name>
<name><surname>Yang</surname>
<given-names>J.C.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2012</year>
) <article-title>dbCAN: a web resource for automated carbohydrate-active enzyme annotation</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>40</volume>
, <fpage>W445</fpage>
–<lpage>W451</lpage>
<pub-id pub-id-type="pmid">22645317</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B9"><label>9</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mao</surname>
<given-names>F.L.</given-names>
</name>
<name><surname>Yin</surname>
<given-names>Y.B.</given-names>
</name>
<name><surname>Zhou</surname>
<given-names>F.F.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2009</year>
) <article-title>pDAWG: An integrated database for plant cell wall genes</article-title>
. <source>Bioenerg Res.</source>
, <volume>2</volume>
, <fpage>209</fpage>
–<lpage>216</lpage>
</mixed-citation>
</ref>
<ref id="bau079-B10"><label>10</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cao</surname>
<given-names>P.J.</given-names>
</name>
<name><surname>Bartley</surname>
<given-names>L.E.</given-names>
</name>
<name><surname>Jung</surname>
<given-names>K.H.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2008</year>
) <article-title>Construction of a rice glycosyltransferase phylogenomic database and identification of rice-diverged glycosyltransferases</article-title>
. <source>Mol. Plant</source>
, <volume>1</volume>
, <fpage>858</fpage>
–<lpage>877</lpage>
<pub-id pub-id-type="pmid">19825588</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B11"><label>11</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yong</surname>
<given-names>W.</given-names>
</name>
<name><surname>Link</surname>
<given-names>B.</given-names>
</name>
<name><surname>O'Malley</surname>
<given-names>R.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2005</year>
) <article-title>Genomics of plant cell wall biogenesis</article-title>
. <source>Planta</source>
, <volume>221</volume>
, <fpage>747</fpage>
–<lpage>751</lpage>
<pub-id pub-id-type="pmid">15981004</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B12"><label>12</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Girke</surname>
<given-names>T.</given-names>
</name>
<name><surname>Lauricha</surname>
<given-names>J.</given-names>
</name>
<name><surname>Tran</surname>
<given-names>H.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2004</year>
) <article-title>The cell wall navigator database. A systems-based approach to organism-unrestricted mining of protein families involved in cell wall metabolism</article-title>
. <source>Plant Physiol.</source>
, <volume>136</volume>
, <fpage>3003</fpage>
–<lpage>3008</lpage>
<pub-id pub-id-type="pmid">15489283</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B13"><label>13</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goodstein</surname>
<given-names>D.M.</given-names>
</name>
<name><surname>Shu</surname>
<given-names>S.</given-names>
</name>
<name><surname>Howson</surname>
<given-names>R.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2012</year>
) <article-title>Phytozome: a comparative platform for green plant genomics</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>40</volume>
, <fpage>D1178</fpage>
–<lpage>D1186</lpage>
<pub-id pub-id-type="pmid">22110026</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B14"><label>14</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nystedt</surname>
<given-names>B.</given-names>
</name>
<name><surname>Street</surname>
<given-names>N.R.</given-names>
</name>
<name><surname>Wetterbom</surname>
<given-names>A.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2013</year>
) <article-title>The Norway spruce genome sequence and conifer genome evolution</article-title>
. <source>Nature</source>
, <volume>497</volume>
, <fpage>579</fpage>
–<lpage>584</lpage>
<pub-id pub-id-type="pmid">23698360</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B15"><label>15</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Eddy</surname>
<given-names>S.R.</given-names>
</name>
</person-group>
 (<year>2011</year>
) <article-title>Accelerated profile HMM searches</article-title>
. <source>PLoS Comput. Biol.</source>
, <volume>7</volume>
, <fpage>e1002195</fpage>
</mixed-citation>
</ref>
<ref id="bau079-B16"><label>16</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Henrissat</surname>
<given-names>B.</given-names>
</name>
<name><surname>Coutinho</surname>
<given-names>P.M.</given-names>
</name>
<name><surname>Davies</surname>
<given-names>G.J.</given-names>
</name>
</person-group>
 (<year>2001</year>
) <article-title>A census of carbohydrate-active enzymes in the genome of Arabidopsis thaliana</article-title>
. <source>Plant Mol. Biol.</source>
, <volume>47</volume>
, <fpage>55</fpage>
–<lpage>72</lpage>
<pub-id pub-id-type="pmid">11554480</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B17"><label>17</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Geisler-Lee</surname>
<given-names>J.</given-names>
</name>
<name><surname>Geisler</surname>
<given-names>M.</given-names>
</name>
<name><surname>Coutinho</surname>
<given-names>P.M.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2006</year>
) <article-title>Poplar carbohydrate-active enzymes. Gene identification and expression analyses</article-title>
. <source>Plant Physiol.</source>
, <volume>140</volume>
, <fpage>946</fpage>
–<lpage>962</lpage>
<pub-id pub-id-type="pmid">16415215</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B18"><label>18</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Park</surname>
<given-names>B.H.</given-names>
</name>
<name><surname>Karpinets</surname>
<given-names>T.V.</given-names>
</name>
<name><surname>Syed</surname>
<given-names>M.H.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2010</year>
) <article-title>CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database</article-title>
. <source>Glycobiology</source>
, <volume>20</volume>
, <fpage>1574</fpage>
–<lpage>1584</lpage>
<pub-id pub-id-type="pmid">20696711</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B19"><label>19</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Finn</surname>
<given-names>R.D.</given-names>
</name>
<name><surname>Bateman</surname>
<given-names>A.</given-names>
</name>
<name><surname>Clements</surname>
<given-names>J.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2014</year>
) <article-title>Pfam: the protein families database</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>42</volume>
, <fpage>D222</fpage>
–<lpage>D230</lpage>
<pub-id pub-id-type="pmid">24288371</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B20"><label>20</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tatusov</surname>
<given-names>R.L.</given-names>
</name>
<name><surname>Fedorova</surname>
<given-names>N.D.</given-names>
</name>
<name><surname>Jackson</surname>
<given-names>J.D.</given-names>
</name>
<etal></etal>
</person-group>
 (<year>2003</year>
) <article-title>The COG database: an updated version includes eukaryotes</article-title>
. <source>BMC Bioinformatics</source>
, <volume>4</volume>
, <fpage>41</fpage>
<pub-id pub-id-type="pmid">12969510</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B21"><label>21</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name><surname>Goto</surname>
<given-names>S.</given-names>
</name>
<name><surname>Sato</surname>
<given-names>Y.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2012</year>
) <article-title>KEGG for integration and interpretation of large-scale molecular data sets</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>40</volume>
, <fpage>D109</fpage>
–<lpage>D114</lpage>
<pub-id pub-id-type="pmid">22080510</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B22"><label>22</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gough</surname>
<given-names>J.</given-names>
</name>
<name><surname>Chothia</surname>
<given-names>C.</given-names>
</name>
</person-group>
 (<year>2002</year>
) <article-title>SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>30</volume>
, <fpage>268</fpage>
–<lpage>272</lpage>
<pub-id pub-id-type="pmid">11752312</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B23"><label>23</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mi</surname>
<given-names>H.Y.</given-names>
</name>
<name><surname>Muruganujan</surname>
<given-names>A.</given-names>
</name>
<name><surname>Thomas</surname>
<given-names>P.D.</given-names>
</name>
</person-group>
 (<year>2013</year>
) <article-title>PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>41</volume>
, <fpage>D377</fpage>
–<lpage>D386</lpage>
<pub-id pub-id-type="pmid">23193289</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B24"><label>24</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ashburner</surname>
<given-names>M.</given-names>
</name>
<name><surname>Ball</surname>
<given-names>C.A.</given-names>
</name>
<name><surname>Blake</surname>
<given-names>J.A.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2000</year>
) <article-title>Gene ontology: tool for the unification of biology</article-title>
. <source>Nat. Genet.</source>
, <volume>25</volume>
, <fpage>25</fpage>
–<lpage>29</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B25"><label>25</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hunter</surname>
<given-names>S.</given-names>
</name>
<name><surname>Apweiler</surname>
<given-names>R.</given-names>
</name>
<name><surname>Attwood</surname>
<given-names>T.K.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2009</year>
) <article-title>InterPro: the integrative protein signature database</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>37</volume>
, <fpage>D211</fpage>
–<lpage>D215</lpage>
<pub-id pub-id-type="pmid">18940856</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B26"><label>26</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Marchler-Bauer</surname>
<given-names>A.</given-names>
</name>
<name><surname>Lu</surname>
<given-names>S.N.</given-names>
</name>
<name><surname>Anderson</surname>
<given-names>J.B.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2011</year>
) <article-title>CDD: a conserved domain database for the functional annotation of proteins</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>39</volume>
, <fpage>D225</fpage>
–<lpage>D229</lpage>
<pub-id pub-id-type="pmid">21109532</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B27"><label>27</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ware</surname>
<given-names>D.</given-names>
</name>
<name><surname>Jaiswal</surname>
<given-names>P.</given-names>
</name>
<name><surname>Ni</surname>
<given-names>J.J.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2002</year>
) <article-title>Gramene: a resource for comparative grass genomics</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>30</volume>
, <fpage>103</fpage>
–<lpage>105</lpage>
<pub-id pub-id-type="pmid">11752266</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B28"><label>28</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Van Bel</surname>
<given-names>M.</given-names>
</name>
<name><surname>Proost</surname>
<given-names>S.</given-names>
</name>
<name><surname>Wischnitzki</surname>
<given-names>E.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2012</year>
) <article-title>Dissecting plant genomes with the PLAZA comparative genomics platform</article-title>
. <source>Plant Physiol.</source>
, <volume>158</volume>
, <fpage>590</fpage>
–<lpage>600</lpage>
<pub-id pub-id-type="pmid">22198273</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B29"><label>29</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bairoch</surname>
<given-names>A.</given-names>
</name>
</person-group>
 (<year>2000</year>
) <article-title>The ENZYME database in 2000</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>28</volume>
, <fpage>304</fpage>
–<lpage>305</lpage>
<pub-id pub-id-type="pmid">10592255</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B30"><label>30</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Claudel-Renard</surname>
<given-names>C.</given-names>
</name>
<name><surname>Chevalet</surname>
<given-names>C.</given-names>
</name>
<name><surname>Faraut</surname>
<given-names>T.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2003</year>
) <article-title>Enzyme-specific profiles for genome annotation: PRIAM</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>31</volume>
, <fpage>6633</fpage>
–<lpage>6639</lpage>
<pub-id pub-id-type="pmid">14602924</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B31"><label>31</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname>
<given-names>C.G.</given-names>
</name>
<name><surname>Zavaljevski</surname>
<given-names>N.</given-names>
</name>
<name><surname>Desai</surname>
<given-names>V.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2009</year>
) <article-title>Genome-wide enzyme annotation with precision control: Catalytic families [CatFam] databases</article-title>
. <source>Proteins</source>
, <volume>74</volume>
, <fpage>449</fpage>
–<lpage>460</lpage>
<pub-id pub-id-type="pmid">18636476</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B32"><label>32</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tian</surname>
<given-names>W.D.</given-names>
</name>
<name><surname>Arakaki</surname>
<given-names>A.K.</given-names>
</name>
<name><surname>Skolnick</surname>
<given-names>J.</given-names>
</name>
</person-group>
 (<year>2004</year>
) <article-title>EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>32</volume>
, <fpage>6226</fpage>
–<lpage>6239</lpage>
<pub-id pub-id-type="pmid">15576349</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B33"><label>33</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mueller</surname>
<given-names>L.A.</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>P.F.</given-names>
</name>
<name><surname>Rhee</surname>
<given-names>S.Y.</given-names>
</name>
</person-group>
 (<year>2003</year>
) <article-title>AraCyc: a biochemical pathway database for Arabidopsis</article-title>
. <source>Plant Physiol.</source>
, <volume>132</volume>
, <fpage>453</fpage>
–<lpage>460</lpage>
<pub-id pub-id-type="pmid">12805578</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B34"><label>34</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname>
<given-names>A.Y.</given-names>
</name>
<name><surname>Chen</surname>
<given-names>X.</given-names>
</name>
<name><surname>Gao</surname>
<given-names>G.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2008</year>
) <article-title>PlantTFDB: a comprehensive plant transcription factor database</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>36</volume>
, <fpage>D966</fpage>
–<lpage>D969</lpage>
<pub-id pub-id-type="pmid">17933783</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B35"><label>35</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fawal</surname>
<given-names>N.</given-names>
</name>
<name><surname>Li</surname>
<given-names>Q.</given-names>
</name>
<name><surname>Savelli</surname>
<given-names>B.</given-names>
</name>
<etal></etal>
</person-group>
 (<year>2013</year>
) <article-title>PeroxiBase: a database for large-scale evolutionary analysis of peroxidases</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>41</volume>
, <fpage>D441</fpage>
–<lpage>D444</lpage>
<pub-id pub-id-type="pmid">23180785</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B36"><label>36</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Saier</surname>
<given-names>M.H.</given-names>
<suffix>Jr.</suffix>
</name>
<name><surname>Tran</surname>
<given-names>C.V.</given-names>
</name>
<name><surname>Barabote</surname>
<given-names>R.D.</given-names>
</name>
</person-group>
 (<year>2006</year>
) <article-title>TCDB: the transporter classification database for membrane transport protein analyses and information</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>34</volume>
, <fpage>D181</fpage>
–<lpage>D186</lpage>
<pub-id pub-id-type="pmid">16381841</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B37"><label>37</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rawlings</surname>
<given-names>N.D.</given-names>
</name>
<name><surname>Barrett</surname>
<given-names>A.J.</given-names>
</name>
<name><surname>Bateman</surname>
<given-names>A.</given-names>
</name>
</person-group>
 (<year>2012</year>
) <article-title>MEROPS: the database of proteolytic enzymes, their substrates and inhibitors</article-title>
. <source>Nucleic Acids Res.</source>
, <volume>40</volume>
, <fpage>D343</fpage>
–<lpage>D350</lpage>
<pub-id pub-id-type="pmid">22086950</pub-id>
</mixed-citation>
</ref>
<ref id="bau079-B38"><label>38</label>
<mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sukharnikov</surname>
<given-names>L.O.</given-names>
</name>
<name><surname>Cantwell</surname>
<given-names>B.J.</given-names>
</name>
<name><surname>Podar</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
<italic>.</italic>
 (<year>2011</year>
) <article-title>Cellulases: ambiguous nonhomologous enzymes in a genomic perspective</article-title>
. <source>Trends Biotechnol.</source>
, <volume>29</volume>
, <fpage>473</fpage>
–<lpage>479</lpage>
<pub-id pub-id-type="pmid">21683463</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Bois/explor/OrangerV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E800 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000E800 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Wicri/Bois
   |area=    OrangerV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 17:11:04 2016. Site generation: Wed Mar 6 18:18:32 2024

	Exploration server on the orange tree
	Warning: this site is under development! It was generated by automated means from raw corpora, so the information has not been validated.
