Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

Internal identifier: 000647 (Pmc/Corpus); previous: 000646; next: 000648

Authors: Hisataka Numa; Takeshi Itoh

Source:

RBID : PMC:3894707

Abstract

The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demand for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, is also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/.
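
To make the ORF prediction step mentioned in the abstract more concrete, the following is a minimal sketch of a naive forward-strand ORF scan. It is an illustration only and is not MEGANTE's method, which the abstract describes as integrating the output of several analysis programs. The function name `longest_orf`, its coordinate convention, and the restriction to a single strand are all assumptions made for this example.

```python
from typing import Tuple

# Illustrative only: a naive longest-ORF scan, NOT MEGANTE's actual method.
# The function name and coordinate convention are assumptions for this sketch.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and the span includes the
    stop codon. Returns (-1, -1) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep this ORF only if it is longer than the best so far.
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best
```

A real annotation pipeline would also scan the reverse strand and, for eukaryotic genes, operate on spliced exon structures rather than on raw genomic sequence. This sketch leaves both out for brevity.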


Url:
DOI: 10.1093/pcp/pct157
PubMed: 24253915
PubMed Central: 3894707

Links to Exploration step

PMC:3894707

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MEGANTE: A Web-Based System for Integrated Plant Genome Annotation</title>
<author>
<name sortKey="Numa, Hisataka" sort="Numa, Hisataka" uniqKey="Numa H" first="Hisataka" last="Numa">Hisataka Numa</name>
</author>
<author>
<name sortKey="Itoh, Takeshi" sort="Itoh, Takeshi" uniqKey="Itoh T" first="Takeshi" last="Itoh">Takeshi Itoh</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24253915</idno>
<idno type="pmc">3894707</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3894707</idno>
<idno type="RBID">PMC:3894707</idno>
<idno type="doi">10.1093/pcp/pct157</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000647</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">MEGANTE: A Web-Based System for Integrated Plant Genome Annotation</title>
<author>
<name sortKey="Numa, Hisataka" sort="Numa, Hisataka" uniqKey="Numa H" first="Hisataka" last="Numa">Hisataka Numa</name>
</author>
<author>
<name sortKey="Itoh, Takeshi" sort="Itoh, Takeshi" uniqKey="Itoh T" first="Takeshi" last="Itoh">Takeshi Itoh</name>
</author>
</analytic>
<series>
<title level="j">Plant and Cell Physiology</title>
<idno type="ISSN">0032-0781</idno>
<idno type="eISSN">1471-9053</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demand for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, is also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the
<italic>Brassicaceae</italic>
,
<italic>Fabaceae</italic>
,
<italic>Musaceae</italic>
,
<italic>Poaceae</italic>
,
<italic>Salicaceae</italic>
,
<italic>Solanaceae</italic>
,
<italic>Rosaceae</italic>
and
<italic>Vitaceae</italic>
families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at
<ext-link ext-link-type="uri" xlink:href="https://megante.dna.affrc.go.jp/">https://megante.dna.affrc.go.jp/</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Allen, Je" uniqKey="Allen J">JE Allen</name>
</author>
<author>
<name sortKey="Majoros, Wh" uniqKey="Majoros W">WH Majoros</name>
</author>
<author>
<name sortKey="Pertea, M" uniqKey="Pertea M">M Pertea</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amano, N" uniqKey="Amano N">N Amano</name>
</author>
<author>
<name sortKey="Tanaka, T" uniqKey="Tanaka T">T Tanaka</name>
</author>
<author>
<name sortKey="Numa, H" uniqKey="Numa H">H Numa</name>
</author>
<author>
<name sortKey="Sakai, H" uniqKey="Sakai H">H Sakai</name>
</author>
<author>
<name sortKey="Itoh, T" uniqKey="Itoh T">T Itoh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
<author>
<name sortKey="Botstein, D" uniqKey="Botstein D">D Botstein</name>
</author>
<author>
<name sortKey="Butler, H" uniqKey="Butler H">H Butler</name>
</author>
<author>
<name sortKey="Cherry, Jm" uniqKey="Cherry J">JM Cherry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bevan, Mw" uniqKey="Bevan M">MW Bevan</name>
</author>
<author>
<name sortKey="Uauy, C" uniqKey="Uauy C">C Uauy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bombarely, A" uniqKey="Bombarely A">A Bombarely</name>
</author>
<author>
<name sortKey="Menda, N" uniqKey="Menda N">N Menda</name>
</author>
<author>
<name sortKey="Tecle, Iy" uniqKey="Tecle I">IY Tecle</name>
</author>
<author>
<name sortKey="Buels, Rm" uniqKey="Buels R">RM Buels</name>
</author>
<author>
<name sortKey="Strickler, S" uniqKey="Strickler S">S Strickler</name>
</author>
<author>
<name sortKey="Fischer York, T" uniqKey="Fischer York T">T Fischer-York</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camacho, C" uniqKey="Camacho C">C Camacho</name>
</author>
<author>
<name sortKey="Coulouris, G" uniqKey="Coulouris G">G Coulouris</name>
</author>
<author>
<name sortKey="Avagyan, V" uniqKey="Avagyan V">V Avagyan</name>
</author>
<author>
<name sortKey="Ma, N" uniqKey="Ma N">N Ma</name>
</author>
<author>
<name sortKey="Papadopoulos, J" uniqKey="Papadopoulos J">J Papadopoulos</name>
</author>
<author>
<name sortKey="Bealer, K" uniqKey="Bealer K">K Bealer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cantarel, Bl" uniqKey="Cantarel B">BL Cantarel</name>
</author>
<author>
<name sortKey="Korf, I" uniqKey="Korf I">I Korf</name>
</author>
<author>
<name sortKey="Robb, Sm" uniqKey="Robb S">SM Robb</name>
</author>
<author>
<name sortKey="Parra, G" uniqKey="Parra G">G Parra</name>
</author>
<author>
<name sortKey="Ross, E" uniqKey="Ross E">E Ross</name>
</author>
<author>
<name sortKey="Moore, B" uniqKey="Moore B">B Moore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, T" uniqKey="Chen T">T Chen</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Zhao, L" uniqKey="Zhao L">L Zhao</name>
</author>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
<author>
<name sortKey="Lin, J" uniqKey="Lin J">J Lin</name>
</author>
<author>
<name sortKey="Zhang, S" uniqKey="Zhang S">S Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Choulet, F" uniqKey="Choulet F">F Choulet</name>
</author>
<author>
<name sortKey="Wicker, T" uniqKey="Wicker T">T Wicker</name>
</author>
<author>
<name sortKey="Rustenholz, C" uniqKey="Rustenholz C">C Rustenholz</name>
</author>
<author>
<name sortKey="Paux, E" uniqKey="Paux E">E Paux</name>
</author>
<author>
<name sortKey="Salse, J" uniqKey="Salse J">J Salse</name>
</author>
<author>
<name sortKey="Leroy, P" uniqKey="Leroy P">P Leroy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cooper, L" uniqKey="Cooper L">L Cooper</name>
</author>
<author>
<name sortKey="Walls, Rl" uniqKey="Walls R">RL Walls</name>
</author>
<author>
<name sortKey="Elser, J" uniqKey="Elser J">J Elser</name>
</author>
<author>
<name sortKey="Gandolfo, Ma" uniqKey="Gandolfo M">MA Gandolfo</name>
</author>
<author>
<name sortKey="Stevenson, Dw" uniqKey="Stevenson D">DW Stevenson</name>
</author>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Droc, G" uniqKey="Droc G">G Droc</name>
</author>
<author>
<name sortKey="Lariviere, D" uniqKey="Lariviere D">D Larivière</name>
</author>
<author>
<name sortKey="Guignon, V" uniqKey="Guignon V">V Guignon</name>
</author>
<author>
<name sortKey="Yahiaoui, N" uniqKey="Yahiaoui N">N Yahiaoui</name>
</author>
<author>
<name sortKey="This, D" uniqKey="This D">D This</name>
</author>
<author>
<name sortKey="Garsmeur, O" uniqKey="Garsmeur O">O Garsmeur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Duvick, J" uniqKey="Duvick J">J Duvick</name>
</author>
<author>
<name sortKey="Fu, A" uniqKey="Fu A">A Fu</name>
</author>
<author>
<name sortKey="Muppirala, U" uniqKey="Muppirala U">U Muppirala</name>
</author>
<author>
<name sortKey="Sabharwal, M" uniqKey="Sabharwal M">M Sabharwal</name>
</author>
<author>
<name sortKey="Wilkerson, Md" uniqKey="Wilkerson M">MD Wilkerson</name>
</author>
<author>
<name sortKey="Lawrence, Cj" uniqKey="Lawrence C">CJ Lawrence</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goff, Sa" uniqKey="Goff S">SA Goff</name>
</author>
<author>
<name sortKey="Vaughn, M" uniqKey="Vaughn M">M Vaughn</name>
</author>
<author>
<name sortKey="Mckay, S" uniqKey="Mckay S">S McKay</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Stapleton, Ae" uniqKey="Stapleton A">AE Stapleton</name>
</author>
<author>
<name sortKey="Gessler, D" uniqKey="Gessler D">D Gessler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goodstein, Dm" uniqKey="Goodstein D">DM Goodstein</name>
</author>
<author>
<name sortKey="Shu, S" uniqKey="Shu S">S Shu</name>
</author>
<author>
<name sortKey="Howson, R" uniqKey="Howson R">R Howson</name>
</author>
<author>
<name sortKey="Neupane, R" uniqKey="Neupane R">R Neupane</name>
</author>
<author>
<name sortKey="Hayes, Rd" uniqKey="Hayes R">RD Hayes</name>
</author>
<author>
<name sortKey="Fazo, J" uniqKey="Fazo J">J Fazo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author>
<name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author>
<name sortKey="Mount, Sm" uniqKey="Mount S">SM Mount</name>
</author>
<author>
<name sortKey="Wortman, Jr" uniqKey="Wortman J">JR Wortman</name>
</author>
<author>
<name sortKey="Smith, Rk" uniqKey="Smith R">RK Smith</name>
</author>
<author>
<name sortKey="Hannick, Li" uniqKey="Hannick L">LI Hannick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hunter, S" uniqKey="Hunter S">S Hunter</name>
</author>
<author>
<name sortKey="Jones, P" uniqKey="Jones P">P Jones</name>
</author>
<author>
<name sortKey="Mitchell, A" uniqKey="Mitchell A">A Mitchell</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
<author>
<name sortKey="Attwood, Tk" uniqKey="Attwood T">TK Attwood</name>
</author>
<author>
<name sortKey="Bateman, A" uniqKey="Bateman A">A Bateman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jaillon, O" uniqKey="Jaillon O">O Jaillon</name>
</author>
<author>
<name sortKey="Aury, Jm" uniqKey="Aury J">JM Aury</name>
</author>
<author>
<name sortKey="Noel, B" uniqKey="Noel B">B Noel</name>
</author>
<author>
<name sortKey="Policriti, A" uniqKey="Policriti A">A Policriti</name>
</author>
<author>
<name sortKey="Clepet, C" uniqKey="Clepet C">C Clepet</name>
</author>
<author>
<name sortKey="Casagrande, A" uniqKey="Casagrande A">A Casagrande</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jung, S" uniqKey="Jung S">S Jung</name>
</author>
<author>
<name sortKey="Staton, M" uniqKey="Staton M">M Staton</name>
</author>
<author>
<name sortKey="Lee, T" uniqKey="Lee T">T Lee</name>
</author>
<author>
<name sortKey="Blenda, A" uniqKey="Blenda A">A Blenda</name>
</author>
<author>
<name sortKey="Svancara, R" uniqKey="Svancara R">R Svancara</name>
</author>
<author>
<name sortKey="Abbott, A" uniqKey="Abbott A">A Abbott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kent, Wj" uniqKey="Kent W">WJ Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Korf, I" uniqKey="Korf I">I Korf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lamesch, P" uniqKey="Lamesch P">P Lamesch</name>
</author>
<author>
<name sortKey="Berardini, Tz" uniqKey="Berardini T">TZ Berardini</name>
</author>
<author>
<name sortKey="Li, D" uniqKey="Li D">D Li</name>
</author>
<author>
<name sortKey="Swarbreck, D" uniqKey="Swarbreck D">D Swarbreck</name>
</author>
<author>
<name sortKey="Wilks, C" uniqKey="Wilks C">C Wilks</name>
</author>
<author>
<name sortKey="Sasidharan, R" uniqKey="Sasidharan R">R Sasidharan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leroy, P" uniqKey="Leroy P">P Leroy</name>
</author>
<author>
<name sortKey="Guilhot, N" uniqKey="Guilhot N">N Guilhot</name>
</author>
<author>
<name sortKey="Sakai, H" uniqKey="Sakai H">H Sakai</name>
</author>
<author>
<name sortKey="Bernard, A" uniqKey="Bernard A">A Bernard</name>
</author>
<author>
<name sortKey="Choulet, F" uniqKey="Choulet F">F Choulet</name>
</author>
<author>
<name sortKey="Theil, S" uniqKey="Theil S">S Theil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lukashin, Av" uniqKey="Lukashin A">AV Lukashin</name>
</author>
<author>
<name sortKey="Borodovsky, M" uniqKey="Borodovsky M">M Borodovsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magrane, M" uniqKey="Magrane M">M Magrane</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Majoros, Wh" uniqKey="Majoros W">WH Majoros</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mihara, M" uniqKey="Mihara M">M Mihara</name>
</author>
<author>
<name sortKey="Itoh, T" uniqKey="Itoh T">T Itoh</name>
</author>
<author>
<name sortKey="Izawa, T" uniqKey="Izawa T">T Izawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nagamura, Y" uniqKey="Nagamura Y">Y Nagamura</name>
</author>
<author>
<name sortKey="Antonio, Ba" uniqKey="Antonio B">BA Antonio</name>
</author>
<author>
<name sortKey="Sato, Y" uniqKey="Sato Y">Y Sato</name>
</author>
<author>
<name sortKey="Miyao, A" uniqKey="Miyao A">A Miyao</name>
</author>
<author>
<name sortKey="Namiki, N" uniqKey="Namiki N">N Namiki</name>
</author>
<author>
<name sortKey="Yonemaru, J" uniqKey="Yonemaru J">J Yonemaru</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nakamura, Y" uniqKey="Nakamura Y">Y Nakamura</name>
</author>
<author>
<name sortKey="Cochrane, G" uniqKey="Cochrane G">G Cochrane</name>
</author>
<author>
<name sortKey="Karsch Mizrachi, I" uniqKey="Karsch Mizrachi I">I Karsch-Mizrachi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nussbaumer, T" uniqKey="Nussbaumer T">T Nussbaumer</name>
</author>
<author>
<name sortKey="Martis, Mm" uniqKey="Martis M">MM Martis</name>
</author>
<author>
<name sortKey="Roessner, Sk" uniqKey="Roessner S">SK Roessner</name>
</author>
<author>
<name sortKey="Pfeifer, M" uniqKey="Pfeifer M">M Pfeifer</name>
</author>
<author>
<name sortKey="Bader, Kc" uniqKey="Bader K">KC Bader</name>
</author>
<author>
<name sortKey="Sharma, S" uniqKey="Sharma S">S Sharma</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pavy, N" uniqKey="Pavy N">N Pavy</name>
</author>
<author>
<name sortKey="Rombauts, S" uniqKey="Rombauts S">S Rombauts</name>
</author>
<author>
<name sortKey="Dehais, P" uniqKey="Dehais P">P Déhais</name>
</author>
<author>
<name sortKey="Mathe, C" uniqKey="Mathe C">C Mathé</name>
</author>
<author>
<name sortKey="Ramana, Dv" uniqKey="Ramana D">DV Ramana</name>
</author>
<author>
<name sortKey="Leroy, P" uniqKey="Leroy P">P Leroy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Podicheti, R" uniqKey="Podicheti R">R Podicheti</name>
</author>
<author>
<name sortKey="Gollapudi, R" uniqKey="Gollapudi R">R Gollapudi</name>
</author>
<author>
<name sortKey="Dong, Q" uniqKey="Dong Q">Q Dong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Quevillon, E" uniqKey="Quevillon E">E Quevillon</name>
</author>
<author>
<name sortKey="Silventoinen, V" uniqKey="Silventoinen V">V Silventoinen</name>
</author>
<author>
<name sortKey="Pillai, S" uniqKey="Pillai S">S Pillai</name>
</author>
<author>
<name sortKey="Harte, N" uniqKey="Harte N">N Harte</name>
</author>
<author>
<name sortKey="Mulder, N" uniqKey="Mulder N">N Mulder</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rogic, S" uniqKey="Rogic S">S Rogic</name>
</author>
<author>
<name sortKey="Mackworth, Ak" uniqKey="Mackworth A">AK Mackworth</name>
</author>
<author>
<name sortKey="Ouellette, Fb" uniqKey="Ouellette F">FB Ouellette</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rouard, M" uniqKey="Rouard M">M Rouard</name>
</author>
<author>
<name sortKey="Guignon, V" uniqKey="Guignon V">V Guignon</name>
</author>
<author>
<name sortKey="Aluome, C" uniqKey="Aluome C">C Aluome</name>
</author>
<author>
<name sortKey="Laporte, Ma" uniqKey="Laporte M">MA Laporte</name>
</author>
<author>
<name sortKey="Droc, G" uniqKey="Droc G">G Droc</name>
</author>
<author>
<name sortKey="Walde, C" uniqKey="Walde C">C Walde</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sakai, H" uniqKey="Sakai H">H Sakai</name>
</author>
<author>
<name sortKey="Lee, Ss" uniqKey="Lee S">SS Lee</name>
</author>
<author>
<name sortKey="Tanaka, T" uniqKey="Tanaka T">T Tanaka</name>
</author>
<author>
<name sortKey="Numa, H" uniqKey="Numa H">H Numa</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
<author>
<name sortKey="Kawahara, Y" uniqKey="Kawahara Y">Y Kawahara</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sakata, K" uniqKey="Sakata K">K Sakata</name>
</author>
<author>
<name sortKey="Nagamura, Y" uniqKey="Nagamura Y">Y Nagamura</name>
</author>
<author>
<name sortKey="Numa, H" uniqKey="Numa H">H Numa</name>
</author>
<author>
<name sortKey="Antonio, Ba" uniqKey="Antonio B">BA Antonio</name>
</author>
<author>
<name sortKey="Nagasaki, H" uniqKey="Nagasaki H">H Nagasaki</name>
</author>
<author>
<name sortKey="Idonuma, A" uniqKey="Idonuma A">A Idonuma</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salamov, Aa" uniqKey="Salamov A">AA Salamov</name>
</author>
<author>
<name sortKey="Solovyev, Vv" uniqKey="Solovyev V">VV Solovyev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sayers, Ew" uniqKey="Sayers E">EW Sayers</name>
</author>
<author>
<name sortKey="Barrett, T" uniqKey="Barrett T">T Barrett</name>
</author>
<author>
<name sortKey="Benson, Da" uniqKey="Benson D">DA Benson</name>
</author>
<author>
<name sortKey="Bolton, E" uniqKey="Bolton E">E Bolton</name>
</author>
<author>
<name sortKey="Bryant, Sh" uniqKey="Bryant S">SH Bryant</name>
</author>
<author>
<name sortKey="Canese, K" uniqKey="Canese K">K Canese</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stanke, M" uniqKey="Stanke M">M Stanke</name>
</author>
<author>
<name sortKey="Waack, S" uniqKey="Waack S">S Waack</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, Ld" uniqKey="Stein L">LD Stein</name>
</author>
<author>
<name sortKey="Mungall, C" uniqKey="Mungall C">C Mungall</name>
</author>
<author>
<name sortKey="Shu, S" uniqKey="Shu S">S Shu</name>
</author>
<author>
<name sortKey="Caudy, M" uniqKey="Caudy M">M Caudy</name>
</author>
<author>
<name sortKey="Mangone, M" uniqKey="Mangone M">M Mangone</name>
</author>
<author>
<name sortKey="Day, A" uniqKey="Day A">A Day</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Walenz, B" uniqKey="Walenz B">B Walenz</name>
</author>
<author>
<name sortKey="Florea, L" uniqKey="Florea L">L Florea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, J" uniqKey="Xu J">J Xu</name>
</author>
<author>
<name sortKey="Wang, B" uniqKey="Wang B">B Wang</name>
</author>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
<author>
<name sortKey="Du, P" uniqKey="Du P">P Du</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Wang, M" uniqKey="Wang M">M Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yandell, M" uniqKey="Yandell M">M Yandell</name>
</author>
<author>
<name sortKey="Ence, D" uniqKey="Ence D">D Ence</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yao, H" uniqKey="Yao H">H Yao</name>
</author>
<author>
<name sortKey="Guo, L" uniqKey="Guo L">L Guo</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Borsuk, La" uniqKey="Borsuk L">LA Borsuk</name>
</author>
<author>
<name sortKey="Wen, Tj" uniqKey="Wen T">TJ Wen</name>
</author>
<author>
<name sortKey="Skibbe, Ds" uniqKey="Skibbe D">DS Skibbe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Youens Clark, K" uniqKey="Youens Clark K">K Youens-Clark</name>
</author>
<author>
<name sortKey="Buckler, E" uniqKey="Buckler E">E Buckler</name>
</author>
<author>
<name sortKey="Casstevens, T" uniqKey="Casstevens T">T Casstevens</name>
</author>
<author>
<name sortKey="Chen, C" uniqKey="Chen C">C Chen</name>
</author>
<author>
<name sortKey="Declerck, G" uniqKey="Declerck G">G Declerck</name>
</author>
<author>
<name sortKey="Derwent, P" uniqKey="Derwent P">P Derwent</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Plant Cell Physiol</journal-id>
<journal-id journal-id-type="iso-abbrev">Plant Cell Physiol</journal-id>
<journal-id journal-id-type="publisher-id">pcp</journal-id>
<journal-id journal-id-type="hwp">pcellphys</journal-id>
<journal-title-group>
<journal-title>Plant and Cell Physiology</journal-title>
</journal-title-group>
<issn pub-type="ppub">0032-0781</issn>
<issn pub-type="epub">1471-9053</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24253915</article-id>
<article-id pub-id-type="pmc">3894707</article-id>
<article-id pub-id-type="doi">10.1093/pcp/pct157</article-id>
<article-id pub-id-type="publisher-id">pct157</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Special Online Collection – Database Papers</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>MEGANTE: A Web-Based System for Integrated Plant Genome Annotation</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Numa</surname>
<given-names>Hisataka</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Itoh</surname>
<given-names>Takeshi</given-names>
</name>
<xref ref-type="corresp" rid="pct157-COR1">*</xref>
</contrib>
<aff>Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan</aff>
</contrib-group>
<author-notes>
<corresp id="pct157-COR1">*Corresponding author: E-mail,
<email>taitoh@affrc.go.jp</email>
; Fax,
<fax>+81-29-838-7065</fax>
.</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>1</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>17</day>
<month>12</month>
<year>2013</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>17</day>
<month>12</month>
<year>2013</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>55</volume>
<issue>1</issue>
<fpage>e2</fpage>
<lpage>e2</lpage>
<history>
<date date-type="received">
<day>21</day>
<month>8</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>18</day>
<month>10</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2013. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.</copyright-statement>
<copyright-year>2013</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0/">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0/">http://creativecommons.org/licenses/by-nc/3.0/</ext-link>
), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com</license-p>
</license>
</permissions>
<abstract>
<p>The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demand for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, is also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the
<italic>Brassicaceae</italic>
,
<italic>Fabaceae</italic>
,
<italic>Musaceae</italic>
,
<italic>Poaceae</italic>
,
<italic>Salicaceae</italic>
,
<italic>Solanaceae</italic>
,
<italic>Rosaceae</italic>
and
<italic>Vitaceae</italic>
families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at
<ext-link ext-link-type="uri" xlink:href="https://megante.dna.affrc.go.jp/">https://megante.dna.affrc.go.jp/</ext-link>
.</p>
</abstract>
<kwd-group>
<kwd>Gene prediction</kwd>
<kwd>Plant genome annotation</kwd>
<kwd>Web service</kwd>
</kwd-group>
<counts>
<page-count count="8"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>With the advent of high-throughput sequencing technologies, plant genome sequencing has accelerated, and the data are being utilized for crop improvement (
<xref ref-type="bibr" rid="pct157-B4">Bevan and Uauy 2013</xref>
). The accumulation of large amounts of plant genome sequence data has led to the construction of comparative genomics databases (
<xref ref-type="bibr" rid="pct157-B26">Mihara et al. 2010</xref>
,
<xref ref-type="bibr" rid="pct157-B27">Nagamura et al. 2011</xref>
,
<xref ref-type="bibr" rid="pct157-B34">Rouard et al. 2011</xref>
,
<xref ref-type="bibr" rid="pct157-B14">Goodstein et al. 2012</xref>
) and the development of a plant-specific controlled vocabulary for effective data integration (
<xref ref-type="bibr" rid="pct157-B10">Cooper et al. 2013</xref>
). However, the costs of data management and analysis are increasing because of the need for high-performance computers, large amounts of data storage and expertise in both computer science and molecular biology. Among these analyses, genome annotation is one of the most fundamental and indispensable steps (
<xref ref-type="bibr" rid="pct157-B43">Yandell and Ence 2012</xref>
), directly affecting further studies such as molecular evolutionary analyses, transposon tagging and microarray experiments. Annotation requires a high level of bioinformatics skill, because several analysis programs must be run and their results integrated to predict gene structures and assign gene functions. Thus, an easy-to-use annotation platform that requires no expertise in bioinformatics is essential for researchers to perform genome annotation and to visualize and interpret the results in a graphical viewer.</p>
<p>Currently, several types of analysis tools are available online for plant genome annotation. For example, online versions of ab initio gene prediction programs, such as AUGUSTUS (
<xref ref-type="bibr" rid="pct157-B39">Stanke and Waack 2003</xref>
), Fgenesh (
<xref ref-type="bibr" rid="pct157-B37">Salamov and Solovyev 2000</xref>
) and GeneMark.hmm (
<xref ref-type="bibr" rid="pct157-B23">Lukashin and Borodovsky 1998</xref>
), can be used to find open reading frames (ORFs) from genomic sequences. FPGP (
<xref ref-type="bibr" rid="pct157-B2">Amano et al. 2010</xref>
) aligns full-length cDNA (FLcDNA) sequences of dicot and monocot plants to a query sequence. Gramene (
<xref ref-type="bibr" rid="pct157-B45">Youens-Clark et al. 2011</xref>
) and PlantGDB (
<xref ref-type="bibr" rid="pct157-B12">Duvick et al. 2008</xref>
) provide a web service for a similarity search against plant nucleotide or protein databases. For graphical representation of the analysis results, WebGBrowse (
<xref ref-type="bibr" rid="pct157-B31">Podicheti et al. 2009</xref>
) is a good candidate. Although such web services are useful for genome annotation, it is time-consuming for researchers to access multiple web sites and interpret their results one by one. Moreover, it is difficult for non-bioinformaticians to select the appropriate tools and parameter sets for their input sequences. Therefore, an integrated tool that executes a series of analysis programs automatically is required to support genome analyses such as positional cloning of plant genes (
<xref ref-type="bibr" rid="pct157-B8">Chen et al. 2009</xref>
,
<xref ref-type="bibr" rid="pct157-B42">Xu et al. 2011</xref>
).</p>
<p>Several web-based annotation pipelines are available for plant genome sequences. Some of them are designed for specific plant genome annotation; RiceGAAS (
<xref ref-type="bibr" rid="pct157-B36">Sakata et al
<italic>.</italic>
2002</xref>
) is for rice and TriAnnot (
<xref ref-type="bibr" rid="pct157-B22">Leroy et al. 2012</xref>
) is for wheat. There are also more versatile genome annotation tools that can be adapted not only for plants but also for other species. DNA Subway (
<xref ref-type="bibr" rid="pct157-B13">Goff et al. 2011</xref>
) provides parameter sets for both animals and plants. MAKER (
<xref ref-type="bibr" rid="pct157-B7">Cantarel et al. 2008</xref>
) has a highly configurable web interface for selecting reference databases and parameters for analysis programs. However, the existing annotation pipelines support only a few plant species.</p>
<p>Here we describe a new plant genome annotation web service called MEGANTE that runs several analysis programs against query sequences, integrates the results and visualizes the annotation information on a genome browser. Compared with the existing tools, one of the notable features of MEGANTE is its simple interface, which is easy to use, even for non-experts. In addition, the service targets a wide variety of plant species and is able to accept large query sequences up to 10 Mb in length.</p>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>Features of the MEGANTE web service</title>
<p>At first use, MEGANTE requires an e-mail address and password to create an account. MEGANTE stores all data, including the query sequences submitted by users and the analysis results, on the server side and retains them until users explicitly remove them via the web interface (
<xref ref-type="fig" rid="pct157-F1">Fig. 1</xref>
A). The system runs several analysis programs automatically, and users do not need to specify any parameters or reference databases for the annotation process. Users can start annotation simply by copying and pasting a genomic sequence and then selecting the species of the query from a drop-down list (
<xref ref-type="fig" rid="pct157-F1">Fig. 1</xref>
B). Currently, the service supports 24 species from the eight plant families shown in
<xref ref-type="table" rid="pct157-T1">Table 1</xref>
. Multiple sequences in FASTA format are accepted. The length of each query sequence is limited to 10 Mb, and users can save up to 100 sequences on the server. These limits reflect the hardware resources currently available for this service.
<fig id="pct157-F1" position="float">
<label>Fig. 1</label>
<caption>
<p>Screenshots of the MEGANTE web interface. (A) Uploaded queries are listed. The list allows users to see the statuses of annotation jobs, download analysis results and jump to an annotation viewer. Clicking the sequence ID shows or hides detailed information about the query sequence. (B) Users can submit query sequences through this interface. (C) Annotation viewer with GBrowse. Users can select which data tracks to show or hide on the annotation map with the ‘Select Tracks’ tab. (D) Detailed annotation information of the predicted genes linked from the data tracks in GBrowse. ORF and amino acid sequences are also shown on this page.</p>
</caption>
<graphic xlink:href="pct157f1p"></graphic>
</fig>
<table-wrap id="pct157-T1" position="float">
<label>Table 1</label>
<caption>
<p>Species supported in MEGANTE</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Families</th>
<th rowspan="1" colspan="1">Species</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="4" colspan="1">
<italic>Brassicaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Arabidopsis thaliana</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Brassica napus</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Brassica rapa</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Raphanus sativus</italic>
</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>Fabaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Glycine max</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Lotus japonicus</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Medicago truncatula</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Vigna unguiculata</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Musaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Musa acuminata</italic>
</td>
</tr>
<tr>
<td rowspan="7" colspan="1">
<italic>Poaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Brachypodium distachyon</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Hordeum vulgare</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Oryza sativa</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Phyllostachys edulis</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Sorghum bicolor</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Triticum aestivum</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Zea mays</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Salicaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Populus trichocarpa</italic>
</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>Solanaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Nicotiana tabacum</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Solanum lycopersicum</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Solanum melongena</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Solanum tuberosum</italic>
</td>
</tr>
<tr>
<td rowspan="2" colspan="1">
<italic>Rosaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Malus × domestica</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Prunus persica</italic>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<italic>Vitaceae</italic>
</td>
<td rowspan="1" colspan="1">
<italic>Vitis vinifera</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
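<p>The submission limits described above can be illustrated with a minimal Python sketch that checks pasted FASTA text before it is queued. This is not the MEGANTE implementation: the function names, the reading of 10 Mb as 10,000,000 bases and the rejection of sequence data that lacks a header line are all assumptions made for illustration.</p>
<preformat>
```python
MAX_SEQ_LENGTH = 10_000_000   # assumed reading of "10 Mb" as 10 million bases
MAX_STORED_SEQUENCES = 100    # per-user storage limit stated in the text


def parse_fasta(text):
    """Return a list of (sequence_id, sequence) pairs from FASTA text."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def check_submission(text, already_stored=0):
    """Return a list of problems; an empty list means the submission fits the limits."""
    records = parse_fasta(text)
    problems = []
    if already_stored + len(records) > MAX_STORED_SEQUENCES:
        problems.append("submission would exceed the limit of 100 stored sequences")
    for seq_id, seq in records:
        if len(seq) > MAX_SEQ_LENGTH:
            problems.append(f"{seq_id}: sequence is longer than 10 Mb")
    return problems
```
</preformat>
<p>Returning a list of problems rather than raising on the first one lets a web form report every violation to the user at once.</p>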
<p>The uploaded sequences are first queued, and then the queries are processed on an application server in a round-robin fashion to schedule the processes fairly. In the current system, five annotation processes can run in parallel. The whole annotation process can be completed within approximately 150 min for a 1 Mb sequence and approximately 15 h for a 10 Mb sequence. After finishing the annotation process, the following results are reported: repeat elements; alignments of transcript and protein sequences; predicted gene structures; similarities to known proteins; functional domains; and Gene Ontology (GO) terms (
<xref ref-type="bibr" rid="pct157-B3">Ashburner et al. 2000</xref>
). All the results are visualized with a widely used genome browser, GBrowse (
<xref ref-type="bibr" rid="pct157-B40">Stein et al. 2002</xref>
), which is integrated into the system. Furthermore, the system archives the annotation results in a single ZIP file for download. The file contains the annotation information in both Microsoft Excel and GFF3 (
<ext-link ext-link-type="uri" xlink:href="http://www.sequenceontology.org/gff3.html">http://www.sequenceontology.org/gff3.html</ext-link>
) formats. If users select an option for e-mail notification when submitting a query, an e-mail is sent to notify the users upon completion of the annotation. The data transfer between web browsers and the server is protected by SSL encryption.</p>
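<p>The queueing behaviour described above can be sketched in Python as follows. The text states only that queries are processed in a round-robin fashion and that five annotation processes can run in parallel; the assumption that the rotation is over users, and every class and method name below, are hypothetical.</p>
<preformat>
```python
from collections import OrderedDict, deque

MAX_PARALLEL_JOBS = 5  # the text states that five annotation processes run in parallel


class RoundRobinQueue:
    """Hypothetical per-user job queue in which users are served in turn."""

    def __init__(self):
        self._queues = OrderedDict()  # user -> deque of pending job ids

    def submit(self, user, job_id):
        self._queues.setdefault(user, deque()).append(job_id)

    def next_job(self):
        """Pop one job from the user at the front, then move that user to the back."""
        if not self._queues:
            return None
        user, jobs = next(iter(self._queues.items()))
        job_id = jobs.popleft()
        del self._queues[user]
        if jobs:
            self._queues[user] = jobs  # re-insert at the end of the rotation
        return user, job_id

    def fill_slots(self, running):
        """Start jobs until MAX_PARALLEL_JOBS are running or nothing is pending."""
        started = []
        while MAX_PARALLEL_JOBS > running + len(started):
            job = self.next_job()
            if job is None:
                break
            started.append(job)
        return started
```
</preformat>
<p>With this rotation, a user who submits many sequences at once cannot occupy all five slots while other users have jobs waiting.</p>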
</sec>
<sec>
<title>Reference databases used in the system and pre-processing for annotation</title>
<p>MEGANTE uses several reference databases for the genome annotation, which include FLcDNAs and expressed sequence tags (ESTs) obtained from INSDC (
<xref ref-type="bibr" rid="pct157-B28">Nakamura et al. 2013</xref>
), protein sequences from Swiss-Prot and the TrEMBL plant division of UniProtKB (
<xref ref-type="bibr" rid="pct157-B24">Magrane and UniProt Consortium 2011</xref>
), and a protein family and domain database, InterPro (Hunter et al. 2011). We update the databases on a regular basis. Up-to-date details of the databases, such as the number of sequences, are described on the MEGANTE web site. After retrieving FLcDNAs and ESTs for each species listed in
<xref ref-type="table" rid="pct157-T1">Table 1</xref>
, we run a SeqClean script (
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/seqclean/">http://sourceforge.net/projects/seqclean/</ext-link>
) to remove poly(A) tails, vector sequences, low-complexity regions and short sequences from the transcripts.</p>
</sec>
<sec>
<title>Annotation workflow</title>
<p>The overall annotation workflow is shown in
<xref ref-type="fig" rid="pct157-F2">Fig. 2</xref>
. The annotation process begins with filtering out repeat elements detected by RepeatMasker (
<ext-link ext-link-type="uri" xlink:href="http://repeatmasker.org">http://repeatmasker.org</ext-link>
) with the MIPS Repeat Element Database (
<xref ref-type="bibr" rid="pct157-B29">Nussbaumer et al. 2013</xref>
). Next, to predict the exon–intron structures, the system aligns intraspecies FLcDNAs to a query sequence using BLAT (
<xref ref-type="bibr" rid="pct157-B19">Kent 2002</xref>
) with a cut-off of ≥98% identity and coverage. Although intraspecies FLcDNAs are effective for accurate gene prediction, in many cases the number of available sequences is not sufficient to cover all genes. For this reason, we also use AUGUSTUS (
<xref ref-type="bibr" rid="pct157-B39">Stanke and Waack 2003</xref>
), GeneZilla (
<xref ref-type="bibr" rid="pct157-B1">Allen et al. 2006</xref>
), GlimmerHMM (
<xref ref-type="bibr" rid="pct157-B1">Allen et al. 2006</xref>
) and SNAP (
<xref ref-type="bibr" rid="pct157-B20">Korf 2004</xref>
) for ab initio gene prediction, ProSplign (
<xref ref-type="bibr" rid="pct157-B38">Sayers et al. 2012</xref>
) for protein alignment against Swiss-Prot and TrEMBL, and sim4db (
<xref ref-type="bibr" rid="pct157-B41">Walenz and Florea 2011</xref>
) for interspecies FLcDNA alignment. In the sim4db alignment, only interspecies FLcDNAs from the same class (monocot or dicot) of plants are used to reduce the calculation time. The cut-off identity and coverage of the protein alignment are set to 90%. For gene prediction of the
<italic>Musaceae</italic>
and
<italic>Rosaceae</italic>
families, which have a relatively small number of sequences in protein databases, the values are relaxed to 80% to increase the number of proteins that could be mapped to the queried sequences. It was confirmed that the relaxed condition did not decrease the gene prediction accuracy (data not shown). The system runs sim4db with an identity and coverage cut-off of 50% and then finds the longest ORFs in each locus for a downstream analysis. All the results, which contain genes predicted by the four ab initio gene finders, protein alignments and ORFs generated from the sim4db alignments, are merged to create consensus gene structures using JIGSAW (
<xref ref-type="bibr" rid="pct157-B1">Allen et al. 2006</xref>
). Simultaneously, PASA (
<xref ref-type="bibr" rid="pct157-B15">Haas et al. 2003</xref>
) generates EST assemblies by mapping intraspecies ESTs to the query sequence. To achieve a more accurate prediction, the system runs PASA again to incorporate the EST assemblies into the consensus gene structures generated by JIGSAW. Consequently, predicted genes are classified into three categories: (i) genes inferred from intraspecies FLcDNAs; (ii) genes into which EST assemblies were successfully incorporated; and (iii) genes that either did not overlap with any EST assembly or could not be merged with EST assemblies because their exon–intron structures were inconsistent. ORFs of class (i) genes are determined by selecting the longest ORFs. This step is not required for classes (ii) and (iii) because they already contain ORF information. If no ORF of ≥100 bp is found, the sequence is treated as a non-protein-coding gene. Finally, sequence similarity against Swiss-Prot and TrEMBL proteins is examined with blastp in BLAST+ (
<xref ref-type="bibr" rid="pct157-B6">Camacho et al. 2009</xref>
), and InterProScan (
<xref ref-type="bibr" rid="pct157-B32">Quevillon et al. 2005</xref>
) is run to identify functional domains and assign GO terms to the ORFs.
<fig id="pct157-F2" position="float">
<label>Fig. 2</label>
<caption>
<p>Overview of the genome annotation workflow in MEGANTE.</p>
</caption>
<graphic xlink:href="pct157f2p"></graphic>
</fig>
</p>
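<p>The ORF selection step for class (i) genes can be sketched in Python as follows. The text specifies only that the longest ORF is selected and that sequences without an ORF of ≥100 bp are treated as non-protein-coding. The definition of an ORF used here (an ATG followed by the first in-frame stop codon, on the given strand only, with the stop codon counted in the length) and all function names are assumptions.</p>
<preformat>
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_LENGTH = 100  # bp, the threshold given in the text


def longest_orf(transcript):
    """Return (start, end) of the longest ATG-to-stop ORF, or None.

    Coordinates are 0-based and end-exclusive. Only the three forward
    frames of the given sequence are scanned (an assumption).
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def classify_transcript(transcript):
    """Label a transcript and return its ORF coordinates if it is coding."""
    orf = longest_orf(transcript)
    if orf is None or MIN_ORF_LENGTH > orf[1] - orf[0]:
        return "non-protein-coding", None
    return "protein-coding", orf
```
</preformat>
<p>When two ORFs have the same length, this sketch keeps the one found first; the text does not say how such ties are resolved.</p>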
</sec>
<sec sec-type="results">
<title>Visualization of annotation results</title>
<p>After all the annotation procedures are completed on an application server, the results are returned to a web server for graphical representation with GBrowse. In addition to the three classes of predicted genes described above, gene loci and repeat regions are displayed on the annotation map in GBrowse (
<xref ref-type="fig" rid="pct157-F1">Fig. 1</xref>
C). MEGANTE also provides the following data tracks: (i) alignments of proteins from Swiss-Prot and the TrEMBL plant division generated by ProSplign; (ii) intraspecies FLcDNAs aligned by BLAT; (iii) EST assemblies generated by PASA; (iv) interspecies FLcDNAs aligned by sim4db; and (v) repetitive elements detected by RepeatMasker. Users can select the tracks with the ‘Select Tracks’ tab in the window. Details of gene attributes and functional annotation, such as the top 10 BLAST hits to Swiss-Prot and TrEMBL, InterPro domains and GO terms, are linked from the predicted gene tracks (
<xref ref-type="fig" rid="pct157-F1">Fig. 1</xref>
D). ORF and protein sequences can also be retrieved from the same page. For secure data management, we added authentication and authorization mechanisms to GBrowse so that user data are not disclosed to other users.</p>
</sec>
<sec>
<title>Application to plant genome sequences</title>
<p>To show the efficiency of MEGANTE, we applied this web service to genomic sequences from
<italic>Arabidopsis thaliana</italic>
,
<italic>Glycine max</italic>
,
<italic>Musa acuminata</italic>
,
<italic>Oryza sativa</italic>
,
<italic>Populus trichocarpa</italic>
,
<italic>Solanum lycopersicum</italic>
,
<italic>Malus × domestica</italic>
and
<italic>Vitis vinifera</italic>
. Each sequence consisted of 1,000 fragments of a genome sequence, and each fragment contained one transcript sequence. Details of the data sets are described in the Materials and Methods. To evaluate the performance of MEGANTE, we measured the predictive accuracy for coding sequence (CDS) coordinates using sensitivity (Sn) and specificity (Sp), which are commonly used for the evaluation of gene prediction programs (
<xref ref-type="bibr" rid="pct157-B30">Pavy et al. 1999</xref>
,
<xref ref-type="bibr" rid="pct157-B33">Rogic et al. 2001</xref>
,
<xref ref-type="bibr" rid="pct157-B44">Yao et al. 2005</xref>
). Sn is defined as the proportion of actual positives that are correctly predicted, and Sp is defined as the proportion of predicted positives that are true positives. We calculated Sn and Sp at both the exon and gene levels. At the exon level, the start and end positions of the CDS are checked for each exon. At the gene level, a prediction is counted as correct only if all CDS coordinates in a transcript are identified correctly. All of the results are summarized in
<xref ref-type="table" rid="pct157-T2">Table 2</xref>
. For comparison, the results of the individual ab initio gene prediction programs employed in the system are also shown in the table. Predictive accuracy differed among the species. For instance, approximately 80% or more of the CDSs for
<italic>A. thaliana</italic>
and
<italic>O. sativa</italic>
were correctly identified, whereas the gene-level Sn values for
<italic>M. acuminata</italic>
and
<italic>M.×domestica</italic>
were much lower, approximately 10–20%. However, MEGANTE exhibited higher Sn and Sp in almost all categories in comparison with the ab initio gene finders.
<table-wrap id="pct157-T2" position="float">
<label>Table 2</label>
<caption>
<p>Predictive accuracies of MEGANTE and individual gene prediction programs used in the system</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Test gene sets</th>
<th colspan="2" align="center" rowspan="1">Evaluation categories</th>
<th rowspan="1" colspan="1">MEGANTE</th>
<th rowspan="1" colspan="1">AUGUSTUS</th>
<th rowspan="1" colspan="1">GeneZilla</th>
<th rowspan="1" colspan="1">GlimmerHMM</th>
<th rowspan="1" colspan="1">SNAP</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="4" colspan="1">
<italic>A. thaliana</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">95.2</td>
<td rowspan="1" colspan="1">84.4</td>
<td rowspan="1" colspan="1">71.8</td>
<td rowspan="1" colspan="1">80.9</td>
<td rowspan="1" colspan="1">75.5</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">87.9</td>
<td rowspan="1" colspan="1">75.6</td>
<td rowspan="1" colspan="1">66.5</td>
<td rowspan="1" colspan="1">72.7</td>
<td rowspan="1" colspan="1">59.2</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">84.2</td>
<td rowspan="1" colspan="1">58.4</td>
<td rowspan="1" colspan="1">43.6</td>
<td rowspan="1" colspan="1">49.1</td>
<td rowspan="1" colspan="1">38.7</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">58.8</td>
<td rowspan="1" colspan="1">43.4</td>
<td rowspan="1" colspan="1">28.0</td>
<td rowspan="1" colspan="1">35.2</td>
<td rowspan="1" colspan="1">22.4</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>G. max</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">78.8</td>
<td rowspan="1" colspan="1">75.3</td>
<td rowspan="1" colspan="1">57.5</td>
<td rowspan="1" colspan="1">56.3</td>
<td rowspan="1" colspan="1">61.8</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">86.7</td>
<td rowspan="1" colspan="1">71.1</td>
<td rowspan="1" colspan="1">54.2</td>
<td rowspan="1" colspan="1">59.1</td>
<td rowspan="1" colspan="1">54.3</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">51.4</td>
<td rowspan="1" colspan="1">35.9</td>
<td rowspan="1" colspan="1">22.1</td>
<td rowspan="1" colspan="1">28.2</td>
<td rowspan="1" colspan="1">18.2</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">48.8</td>
<td rowspan="1" colspan="1">32.5</td>
<td rowspan="1" colspan="1">13.6</td>
<td rowspan="1" colspan="1">16.0</td>
<td rowspan="1" colspan="1">12.1</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>M. acuminata</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">46.1</td>
<td rowspan="1" colspan="1">48.3</td>
<td rowspan="1" colspan="1">19.9</td>
<td rowspan="1" colspan="1">34.7</td>
<td rowspan="1" colspan="1">30.5</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">62.3</td>
<td rowspan="1" colspan="1">54.4</td>
<td rowspan="1" colspan="1">28.7</td>
<td rowspan="1" colspan="1">31.5</td>
<td rowspan="1" colspan="1">36.4</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">12.8</td>
<td rowspan="1" colspan="1">12.2</td>
<td rowspan="1" colspan="1">5.9</td>
<td rowspan="1" colspan="1">7.6</td>
<td rowspan="1" colspan="1">5.8</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">12.7</td>
<td rowspan="1" colspan="1">11.2</td>
<td rowspan="1" colspan="1">4.1</td>
<td rowspan="1" colspan="1">3.4</td>
<td rowspan="1" colspan="1">4.0</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>O. sativa</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">91.9</td>
<td rowspan="1" colspan="1">52.9</td>
<td rowspan="1" colspan="1">57.0</td>
<td rowspan="1" colspan="1">74.0</td>
<td rowspan="1" colspan="1">45.4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">86.7</td>
<td rowspan="1" colspan="1">67.4</td>
<td rowspan="1" colspan="1">46.4</td>
<td rowspan="1" colspan="1">59.0</td>
<td rowspan="1" colspan="1">50.6</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">78.0</td>
<td rowspan="1" colspan="1">29.5</td>
<td rowspan="1" colspan="1">21.9</td>
<td rowspan="1" colspan="1">37.8</td>
<td rowspan="1" colspan="1">19.7</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">57.1</td>
<td rowspan="1" colspan="1">29.3</td>
<td rowspan="1" colspan="1">12.2</td>
<td rowspan="1" colspan="1">21.6</td>
<td rowspan="1" colspan="1">15.6</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>P. trichocarpa</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">76.7</td>
<td rowspan="1" colspan="1">73.7</td>
<td rowspan="1" colspan="1">60.8</td>
<td rowspan="1" colspan="1">55.0</td>
<td rowspan="1" colspan="1">63.4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">81.1</td>
<td rowspan="1" colspan="1">71.0</td>
<td rowspan="1" colspan="1">57.8</td>
<td rowspan="1" colspan="1">59.3</td>
<td rowspan="1" colspan="1">57.4</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">32.3</td>
<td rowspan="1" colspan="1">26.8</td>
<td rowspan="1" colspan="1">19.6</td>
<td rowspan="1" colspan="1">21.4</td>
<td rowspan="1" colspan="1">14.4</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">32.0</td>
<td rowspan="1" colspan="1">25.8</td>
<td rowspan="1" colspan="1">13.4</td>
<td rowspan="1" colspan="1">13.2</td>
<td rowspan="1" colspan="1">10.7</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>S. lycopersicum</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">85.7</td>
<td rowspan="1" colspan="1">69.4</td>
<td rowspan="1" colspan="1">49.0</td>
<td rowspan="1" colspan="1">49.3</td>
<td rowspan="1" colspan="1">57.0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">91.1</td>
<td rowspan="1" colspan="1">74.4</td>
<td rowspan="1" colspan="1">49.1</td>
<td rowspan="1" colspan="1">52.4</td>
<td rowspan="1" colspan="1">49.3</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">62.6</td>
<td rowspan="1" colspan="1">29.5</td>
<td rowspan="1" colspan="1">22.0</td>
<td rowspan="1" colspan="1">26.8</td>
<td rowspan="1" colspan="1">19.3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">60.0</td>
<td rowspan="1" colspan="1">34.0</td>
<td rowspan="1" colspan="1">13.0</td>
<td rowspan="1" colspan="1">14.5</td>
<td rowspan="1" colspan="1">12.3</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>M.×domestica</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">59.0</td>
<td rowspan="1" colspan="1">59.1</td>
<td rowspan="1" colspan="1">47.1</td>
<td rowspan="1" colspan="1">49.8</td>
<td rowspan="1" colspan="1">42.3</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">68.5</td>
<td rowspan="1" colspan="1">62.5</td>
<td rowspan="1" colspan="1">47.3</td>
<td rowspan="1" colspan="1">50.1</td>
<td rowspan="1" colspan="1">46.1</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">22.1</td>
<td rowspan="1" colspan="1">22.5</td>
<td rowspan="1" colspan="1">13.2</td>
<td rowspan="1" colspan="1">19.1</td>
<td rowspan="1" colspan="1">11.2</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">19.0</td>
<td rowspan="1" colspan="1">18.4</td>
<td rowspan="1" colspan="1">7.7</td>
<td rowspan="1" colspan="1">9.7</td>
<td rowspan="1" colspan="1">7.8</td>
</tr>
<tr>
<td rowspan="4" colspan="1">
<italic>V. vinifera</italic>
</td>
<td rowspan="2" colspan="1">Exon level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">61.0</td>
<td rowspan="1" colspan="1">51.4</td>
<td rowspan="1" colspan="1">46.8</td>
<td rowspan="1" colspan="1">35.8</td>
<td rowspan="1" colspan="1">41.7</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">83.7</td>
<td rowspan="1" colspan="1">51.2</td>
<td rowspan="1" colspan="1">38.5</td>
<td rowspan="1" colspan="1">31.0</td>
<td rowspan="1" colspan="1">36.0</td>
</tr>
<tr>
<td rowspan="2" colspan="1">Gene level</td>
<td rowspan="1" colspan="1">Sn (%)</td>
<td rowspan="1" colspan="1">22.7</td>
<td rowspan="1" colspan="1">10.5</td>
<td rowspan="1" colspan="1">7.1</td>
<td rowspan="1" colspan="1">6.6</td>
<td rowspan="1" colspan="1">5.0</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Sp (%)</td>
<td rowspan="1" colspan="1">27.3</td>
<td rowspan="1" colspan="1">7.9</td>
<td rowspan="1" colspan="1">3.5</td>
<td rowspan="1" colspan="1">2.6</td>
<td rowspan="1" colspan="1">2.6</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="pct157-TF1">
<p>Gene prediction parameters we used for each target species are described in the Materials and Methods.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
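<p>The Sn and Sp definitions given above can be written out as a short Python sketch. It assumes that each gene is represented as a list of (start, end) CDS exon coordinates on a single sequence and strand; this representation and the function names are illustrative and are not taken from the evaluation scripts used in this study.</p>
<preformat>
```python
def sn_sp(actual, predicted):
    """Return (Sn, Sp) in percent for two sets of hashable items.

    Sn = true positives / actual positives
    Sp = true positives / predicted positives
    """
    true_positives = len(actual.intersection(predicted))
    sn = 100.0 * true_positives / len(actual) if actual else 0.0
    sp = 100.0 * true_positives / len(predicted) if predicted else 0.0
    return sn, sp


def exon_level(actual_genes, predicted_genes):
    """An exon is correct when both its start and end coordinates match."""
    actual = {exon for gene in actual_genes for exon in gene}
    predicted = {exon for gene in predicted_genes for exon in gene}
    return sn_sp(actual, predicted)


def gene_level(actual_genes, predicted_genes):
    """A gene is correct only when every CDS exon coordinate matches."""
    actual = {tuple(sorted(gene)) for gene in actual_genes}
    predicted = {tuple(sorted(gene)) for gene in predicted_genes}
    return sn_sp(actual, predicted)


# Toy example: two annotated genes and three predicted genes.
annotated = [[(100, 200), (300, 400)], [(1000, 1200)]]
predicted = [[(100, 200), (300, 450)], [(1000, 1200)], [(2000, 2100)]]
exon_sn, exon_sp = exon_level(annotated, predicted)
gene_sn, gene_sp = gene_level(annotated, predicted)
```
</preformat>
<p>In the toy example, two of the three annotated exons are recovered among four predicted exons, giving an exon-level Sn of about 66.7% and Sp of 50.0%. Only one of the two annotated genes is reproduced exactly among three predicted genes, giving a gene-level Sn of 50.0% and Sp of about 33.3%, which shows why gene-level values are generally lower than exon-level values.</p>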
<p>Furthermore, we ran MEGANTE against 13 genome contigs from wheat chromosome 3B (
<xref ref-type="bibr" rid="pct157-B9">Choulet et al. 2010</xref>
), which were used in the evaluation of a wheat genome annotation pipeline, TriAnnot (
<xref ref-type="bibr" rid="pct157-B22">Leroy et al. 2012</xref>
). The overall size of the contigs is approximately 18 Mb, and they contain 172 CDSs. We used the same evaluation method as described above. Sn and Sp were 77.6% and 88.2% at the exon level and 64.5% and 63.5% at the gene level, respectively. These results cannot be directly compared with the values reported in the TriAnnot study because the numbers of contigs and genes used for the evaluation differed between the two studies. Nevertheless, both the Sn and Sp of MEGANTE were comparable with those of TriAnnot.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<p>In this article, we introduced MEGANTE, a web service for integrated plant genome annotation. The interface of MEGANTE is designed mainly for non-bioinformatics researchers. Complex configurations for annotation procedures are not required; therefore, users can perform genome annotation simply by copying and pasting genomic sequences and selecting the species they want to query. Graphical representation is important for quickly interpreting the analysis results. We utilized GBrowse (
<xref ref-type="bibr" rid="pct157-B40">Stein et al. 2002</xref>
) for data visualization because this viewer is widely used in several plant genome databases (
<xref ref-type="bibr" rid="pct157-B14">Goodstein et al. 2012</xref>
,
<xref ref-type="bibr" rid="pct157-B21">Lamesch et al
<italic>.</italic>
2012</xref>
,
<xref ref-type="bibr" rid="pct157-B35">Sakai et al
<italic>.</italic>
2013</xref>
) and many users are therefore likely to be familiar with its interface.</p>
<p>MEGANTE has features that are not found in similar services. For instance, the service accepts query sequences of up to 10 Mb in length, which is longer than other services allow. Another prominent feature of MEGANTE is that it targets a wide variety of plant species: 24 species from eight families. This was made possible by applying a common parameter set for gene structure prediction to all species in the same family. For example, the system creates consensus gene structures for the
<italic>Poaceae</italic>
family by using JIGSAW (
<xref ref-type="bibr" rid="pct157-B1">Allen et al. 2006</xref>
) with a parameter matrix generated from a reference gene set from
<italic>O. sativa</italic>
. Although it is generally preferable to optimize gene prediction parameters for each species, for many species the number of reference genes with high-quality annotation is not large enough for parameter optimization, and parameters derived from the richer annotation of a closely related species perform much better. In fact, our evaluation of wheat genome sequence annotation using MEGANTE revealed that the prediction parameters for
<italic>O. sativa</italic>
were sufficient for wheat in our annotation workflow.</p>
<p>One important consideration is keeping the data in the system up to date, including the reference databases and the parameter files for gene prediction. We plan to update the transcript and protein databases at least twice a year, and to regenerate gene prediction parameters when new reference annotation data are released. Most of the methods used to generate parameter files for gene prediction are automated in the system, and thus the overall procedure for one species can be completed within a week. Furthermore, it is possible to adapt MEGANTE to any other species by collecting transcript sequences or optimizing gene prediction parameters with reference annotation data for the species of interest.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Compiling reference gene sets for gene prediction</title>
<p>To optimize gene prediction parameters for each species, annotated genes are required as reference gene sets. We use the term ‘training’ to refer to this optimization. We collected
<italic>A. thaliana</italic>
gene sets from the
<italic>Brassicaceae</italic>
family,
<italic>G. max</italic>
from
<italic>Fabaceae</italic>
,
<italic>M. acuminata</italic>
from
<italic>Musaceae</italic>
,
<italic>O. sativa</italic>
from
<italic>Poaceae</italic>
,
<italic>P. trichocarpa</italic>
from
<italic>Salicaceae</italic>
,
<italic>S. lycopersicum</italic>
from
<italic>Solanaceae</italic>
,
<italic>M.×domestica</italic>
from
<italic>Rosaceae</italic>
, and
<italic>V. vinifera</italic>
from
<italic>Vitaceae</italic>
. The system uses the same parameter set for gene prediction in all species of a family, whereas the transcript sequences used for alignment are specific to each species. The
<italic>A. thaliana</italic>
gene set was retrieved from representative genes in TAIR10 (
<xref ref-type="bibr" rid="pct157-B21">Lamesch et al. 2012</xref>
);
<italic>G. max</italic>
and
<italic>P. trichocarpa</italic>
from Phytozome v9.0 (
<xref ref-type="bibr" rid="pct157-B14">Goodstein et al. 2012</xref>
);
<italic>M. acuminata</italic>
from The Banana Genome Hub version 1 (
<xref ref-type="bibr" rid="pct157-B11">Droc et al. 2013</xref>
);
<italic>O. sativa</italic>
from representative genes in RAP-DB IRGSP 1.0 (
<xref ref-type="bibr" rid="pct157-B35">Sakai et al. 2013</xref>
);
<italic>S. lycopersicum</italic>
from ITAG2.3 in SGN (
<xref ref-type="bibr" rid="pct157-B5">Bombarely et al. 2011</xref>
);
<italic>M.×domestica</italic>
from v1.0p assembly and annotation in GDR (
<xref ref-type="bibr" rid="pct157-B18">Jung et al. 2008</xref>
); and
<italic>V. vinifera</italic>
from the 12X version of genome assembly and annotation in Grape Genome Browser (
<xref ref-type="bibr" rid="pct157-B17">Jaillon et al. 2007</xref>
). First, we excluded from each gene set any genes that did not begin with an initiation codon or did not end with a stop codon. Then, we randomly selected 1,000 genes for training the ab initio gene prediction programs, 10,000 for training JIGSAW and 1,000 for evaluating the overall performance of MEGANTE. The training and evaluation data sets did not overlap, so that the evaluation was made on an independent set. All of the gene sequences extracted from the genome assemblies contained the CDS and 1 kb each of upstream and downstream sequence.</p>
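<p>The selection procedure described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names, the use of ATG as the only initiation codon, the stop codons TAA, TAG and TGA, and the fixed random seed are all assumptions made for the example.</p>
<preformat preformat-type="code">
```python
import random

# Assumptions of this sketch: standard nuclear genetic code.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_complete_cds(cds):
    """Return True if the CDS begins with ATG and ends with a stop codon."""
    seq = cds.upper()
    return len(seq) >= 6 and seq.startswith(START_CODON) and seq[-3:] in STOP_CODONS


def split_reference_genes(cds_by_gene, sizes, seed=0):
    """Filter complete CDSs, then draw disjoint random subsets.

    cds_by_gene: dict mapping a gene ID to its CDS sequence.
    sizes: dict mapping a subset name to the number of genes to draw.
    """
    # Sort the IDs so that the result depends only on the seed.
    complete = sorted(g for g, s in cds_by_gene.items() if has_complete_cds(s))
    needed = sum(sizes.values())
    if needed > len(complete):
        raise ValueError("not enough complete genes for the requested subsets")
    rng = random.Random(seed)
    # Sampling without replacement guarantees that the subsets are disjoint.
    picked = rng.sample(complete, needed)
    subsets, start = {}, 0
    for name, n in sizes.items():
        subsets[name] = picked[start:start + n]
        start += n
    return subsets
```
</preformat>
<p>With the subset sizes given in the text, the call would take the form split_reference_genes(cds_by_gene, {"ab_initio": 1000, "jigsaw": 10000, "evaluation": 1000}), where the subset names are hypothetical labels.</p>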
</sec>
<sec>
<title>Training of gene prediction programs</title>
<p>We initially trained ab initio gene finders, AUGUSTUS (
<xref ref-type="bibr" rid="pct157-B39">Stanke and Waack 2003</xref>
), GeneZilla (
<xref ref-type="bibr" rid="pct157-B1">Allen et al. 2006</xref>
), GlimmerHMM (
<xref ref-type="bibr" rid="pct157-B1">Allen et al. 2006</xref>
) and SNAP (
<xref ref-type="bibr" rid="pct157-B20">Korf 2004</xref>
), with each gene set. However, some gene finders are distributed with pre-trained parameter files for certain plant species, and we did not retrain the programs for those species. The pre-trained parameters we used were those of AUGUSTUS for Arabidopsis, maize and tomato; of GlimmerHMM for Arabidopsis and rice; and of SNAP for Arabidopsis and rice. The AUGUSTUS parameters for maize are used for the
<italic>Poaceae</italic>
family. For GeneZilla, we employed an automatic training program, GRAPE (
<xref ref-type="bibr" rid="pct157-B25">Majoros and Salzberg 2004</xref>
). To train the other programs, we followed the instructions provided by the software developers. Subsequently, JIGSAW was trained using 10,000 CDSs: all of the ab initio gene finders were run on these sequences, and interspecies FLcDNA alignment with sim4db and protein alignment with ProSplign were conducted against the same data set, following the procedures described for the annotation workflow. All of the results were fed to JIGSAW as sources of evidence. JIGSAW was trained with the train_jigsaw.pl script in the package, using default options.</p>
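<p>The choice between pre-trained and newly trained parameters can be summarized as a small lookup table. The sketch below is illustrative only: the dictionary layout and the function name are hypothetical, and the assignment of each pre-trained species to a family (Arabidopsis to Brassicaceae, maize and rice to Poaceae, tomato to Solanaceae) is an assumption of this example.</p>
<preformat preformat-type="code">
```python
# Pre-trained parameter sets named in the text, keyed by program and family.
# The family assigned to each species is an assumption of this sketch.
PRETRAINED = {
    "AUGUSTUS": {"Brassicaceae": "Arabidopsis", "Poaceae": "maize", "Solanaceae": "tomato"},
    "GlimmerHMM": {"Brassicaceae": "Arabidopsis", "Poaceae": "rice"},
    "SNAP": {"Brassicaceae": "Arabidopsis", "Poaceae": "rice"},
    "GeneZilla": {},  # no pre-trained set is mentioned; trained with GRAPE
}

# The eight families for which reference gene sets were compiled.
FAMILIES = [
    "Brassicaceae", "Fabaceae", "Musaceae", "Poaceae",
    "Salicaceae", "Solanaceae", "Rosaceae", "Vitaceae",
]


def training_plan():
    """List the (program, family) pairs that lack pre-trained parameters."""
    return [
        (program, family)
        for program, pretrained in PRETRAINED.items()
        for family in FAMILIES
        if family not in pretrained
    ]
```
</preformat>
<p>Under these assumptions, GeneZilla is trained for every family, whereas AUGUSTUS needs training only for the five families without a pre-trained parameter set.</p>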
</sec>
<sec>
<title>Availability and implementation of the system</title>
<p>MEGANTE was implemented in a web application framework, Catalyst, with MySQL as the backend database. The frontend web interface was built with HTML5, CSS3 and JavaScript, and was tested on the following web browsers: Safari 6, Chrome 28, Firefox 23 and Internet Explorer 9 and 10. This service is available for free at
<ext-link ext-link-type="uri" xlink:href="https://megante.dna.affrc.go.jp/">https://megante.dna.affrc.go.jp/</ext-link>
.</p>
</sec>
</sec>
<sec>
<title>Funding</title>
<p>This work was supported by the
<funding-source>Ministry of Agriculture, Forestry and Fisheries of Japan</funding-source>
[
<funding-source>Genomics for Agricultural Innovation</funding-source>
, grant
<award-id>GIR1001</award-id>
; Development of Genome Information Database System for Innovation of Crop and Livestock Production].</p>
</sec>
<sec>
<title>Disclosures</title>
<p>The authors have no conflicts of interest to declare.</p>
</sec>
</body>
<back>
<glossary>
<def-list>
<title>Abbreviations</title>
<def-item>
<term id="G1">CDS</term>
<def>
<p>coding sequence</p>
</def>
</def-item>
<def-item>
<term id="G2">EST</term>
<def>
<p>expressed sequence tag</p>
</def>
</def-item>
<def-item>
<term id="G3">FLcDNA</term>
<def>
<p>full-length cDNA</p>
</def>
</def-item>
<def-item>
<term id="G4">GO</term>
<def>
<p>gene ontology</p>
</def>
</def-item>
<def-item>
<term id="G5">ORF</term>
<def>
<p>open reading frame</p>
</def>
</def-item>
<def-item>
<term id="G6">Sn</term>
<def>
<p>sensitivity</p>
</def>
</def-item>
<def-item>
<term id="G7">Sp</term>
<def>
<p>specificity</p>
</def>
</def-item>
</def-list>
</glossary>
<ref-list>
<title>References</title>
<ref id="pct157-B1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Allen</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Majoros</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Pertea</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions</article-title>
<source>Genome Biol.</source>
<year>2006</year>
<volume>7</volume>
<issue>Suppl. 1</issue>
<fpage>S9.1</fpage>
<lpage>S9.13</lpage>
<pub-id pub-id-type="pmid">16925843</pub-id>
</element-citation>
</ref>
<ref id="pct157-B2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amano</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tanaka</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Numa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Sakai</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Itoh</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Efficient plant gene identification based on interspecies mapping of full-length cDNAs</article-title>
<source>DNA Res.</source>
<year>2010</year>
<volume>17</volume>
<fpage>271</fpage>
<lpage>279</lpage>
<pub-id pub-id-type="pmid">20668003</pub-id>
</element-citation>
</ref>
<ref id="pct157-B3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Botstein</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Butler</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cherry</surname>
<given-names>JM</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium</article-title>
<source>Nat. Genet.</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
</element-citation>
</ref>
<ref id="pct157-B4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bevan</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Uauy</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Genomics reveals new landscapes for crop improvement</article-title>
<source>Genome Biol.</source>
<year>2013</year>
<volume>14</volume>
<fpage>206</fpage>
<pub-id pub-id-type="pmid">23796126</pub-id>
</element-citation>
</ref>
<ref id="pct157-B5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bombarely</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Menda</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tecle</surname>
<given-names>IY</given-names>
</name>
<name>
<surname>Buels</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Strickler</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fischer-York</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Sol Genomics Network (solgenomics.net): growing tomatoes using Perl</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D1149</fpage>
<lpage>D1155</lpage>
<pub-id pub-id-type="pmid">20935049</pub-id>
</element-citation>
</ref>
<ref id="pct157-B6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camacho</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Coulouris</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Avagyan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Papadopoulos</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bealer</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>BLAST+: architecture and applications</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>421</fpage>
<pub-id pub-id-type="pmid">20003500</pub-id>
</element-citation>
</ref>
<ref id="pct157-B7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cantarel</surname>
<given-names>BL</given-names>
</name>
<name>
<surname>Korf</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Robb</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Parra</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ross</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes</article-title>
<source>Genome Res.</source>
<year>2008</year>
<volume>18</volume>
<fpage>188</fpage>
<lpage>196</lpage>
<pub-id pub-id-type="pmid">18025269</pub-id>
</element-citation>
</ref>
<ref id="pct157-B8">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Fine mapping and candidate gene analysis of a green-revertible albino gene gra(t) in rice</article-title>
<source>J. Genet. Genomics</source>
<year>2009</year>
<volume>36</volume>
<fpage>117</fpage>
<lpage>123</lpage>
<pub-id pub-id-type="pmid">19232310</pub-id>
</element-citation>
</ref>
<ref id="pct157-B9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Choulet</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wicker</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rustenholz</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Paux</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Salse</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Leroy</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces</article-title>
<source>Plant Cell</source>
<year>2010</year>
<volume>22</volume>
<fpage>1686</fpage>
<lpage>1701</lpage>
<pub-id pub-id-type="pmid">20581307</pub-id>
</element-citation>
</ref>
<ref id="pct157-B10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cooper</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Walls</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Elser</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gandolfo</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Stevenson</surname>
<given-names>DW</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The plant ontology as a tool for comparative plant anatomy and genomic analyses</article-title>
<source>Plant Cell Physiol.</source>
<year>2013</year>
<volume>54</volume>
<fpage>e1</fpage>
<pub-id pub-id-type="pmid">23220694</pub-id>
</element-citation>
</ref>
<ref id="pct157-B11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Droc</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Larivière</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Guignon</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Yahiaoui</surname>
<given-names>N</given-names>
</name>
<name>
<surname>This</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Garsmeur</surname>
<given-names>O</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The banana genome hub</article-title>
<source>Database (Oxford)</source>
<year>2013</year>
<volume>2013</volume>
<fpage>bat035</fpage>
<pub-id pub-id-type="pmid">23707967</pub-id>
</element-citation>
</ref>
<ref id="pct157-B12">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duvick</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Muppirala</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Sabharwal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wilkerson</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>CJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>PlantGDB: a resource for comparative plant genomics</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>D959</fpage>
<lpage>D965</lpage>
<pub-id pub-id-type="pmid">18063570</pub-id>
</element-citation>
</ref>
<ref id="pct157-B13">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goff</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Vaughn</surname>
<given-names>M</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stapleton</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Gessler</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The iPlant Collaborative: cyberinfrastructure for plant biology</article-title>
<source>Front. Plant Sci.</source>
<year>2011</year>
<volume>2</volume>
<fpage>34</fpage>
<pub-id pub-id-type="pmid">22645531</pub-id>
</element-citation>
</ref>
<ref id="pct157-B14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goodstein</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Shu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Howson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Neupane</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Fazo</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Phytozome: a comparative platform for green plant genomics</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>D1178</fpage>
<lpage>D1186</lpage>
<pub-id pub-id-type="pmid">22110026</pub-id>
</element-citation>
</ref>
<ref id="pct157-B15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haas</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Mount</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Wortman</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Hannick</surname>
<given-names>LI</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>5654</fpage>
<lpage>5666</lpage>
<pub-id pub-id-type="pmid">14500829</pub-id>
</element-citation>
</ref>
<ref id="pct157-B16">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hunter</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mitchell</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Attwood</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Bateman</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>InterPro in 2011: new developments in the family and domain prediction database</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>D306</fpage>
<lpage>D312</lpage>
<pub-id pub-id-type="pmid">22096229</pub-id>
</element-citation>
</ref>
<ref id="pct157-B17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jaillon</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Aury</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Noel</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Policriti</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Clepet</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Casagrande</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla</article-title>
<source>Nature</source>
<year>2007</year>
<volume>449</volume>
<fpage>463</fpage>
<lpage>467</lpage>
<pub-id pub-id-type="pmid">17721507</pub-id>
</element-citation>
</ref>
<ref id="pct157-B18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jung</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Staton</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Blenda</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Svancara</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Abbott</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>D1034</fpage>
<lpage>D1040</lpage>
<pub-id pub-id-type="pmid">17932055</pub-id>
</element-citation>
</ref>
<ref id="pct157-B19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kent</surname>
<given-names>WJ</given-names>
</name>
</person-group>
<article-title>BLAT—the BLAST-like alignment tool</article-title>
<source>Genome Res.</source>
<year>2002</year>
<volume>12</volume>
<fpage>656</fpage>
<lpage>664</lpage>
<pub-id pub-id-type="pmid">11932250</pub-id>
</element-citation>
</ref>
<ref id="pct157-B20">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Korf</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>Gene finding in novel genomes</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>59</fpage>
<pub-id pub-id-type="pmid">15144565</pub-id>
</element-citation>
</ref>
<ref id="pct157-B21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lamesch</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Berardini</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Swarbreck</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wilks</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sasidharan</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>D1202</fpage>
<lpage>D1210</lpage>
<pub-id pub-id-type="pmid">22140109</pub-id>
</element-citation>
</ref>
<ref id="pct157-B22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leroy</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Guilhot</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sakai</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bernard</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Choulet</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Theil</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes</article-title>
<source>Front. Plant Sci.</source>
<year>2012</year>
<volume>3</volume>
<fpage>5</fpage>
<pub-id pub-id-type="pmid">22645565</pub-id>
</element-citation>
</ref>
<ref id="pct157-B23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lukashin</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Borodovsky</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>GeneMark.hmm: new solutions for gene finding</article-title>
<source>Nucleic Acids Res.</source>
<year>1998</year>
<volume>26</volume>
<fpage>1107</fpage>
<lpage>1115</lpage>
<pub-id pub-id-type="pmid">9461475</pub-id>
</element-citation>
</ref>
<ref id="pct157-B24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magrane</surname>
<given-names>M</given-names>
</name>
<collab>UniProt Consortium</collab>
</person-group>
<article-title>UniProt Knowledgebase: a hub of integrated protein data</article-title>
<source>Database (Oxford)</source>
<year>2011</year>
<volume>2011</volume>
<fpage>bar009</fpage>
<pub-id pub-id-type="pmid">21447597</pub-id>
</element-citation>
</ref>
<ref id="pct157-B25">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Majoros</surname>
<given-names>WH</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>An empirical analysis of training protocols for probabilistic gene finders</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>206</fpage>
<pub-id pub-id-type="pmid">15613242</pub-id>
</element-citation>
</ref>
<ref id="pct157-B26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mihara</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Itoh</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Izawa</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>SALAD database: a motif-based database of protein annotations for plant comparative genomics</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>D835</fpage>
<lpage>D842</lpage>
<pub-id pub-id-type="pmid">19854933</pub-id>
</element-citation>
</ref>
<ref id="pct157-B27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nagamura</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Antonio</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Miyao</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Namiki</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Yonemaru</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Rice TOGO Browser: a platform to retrieve integrated information on rice functional and applied genomics</article-title>
<source>Plant Cell Physiol.</source>
<year>2011</year>
<volume>52</volume>
<fpage>230</fpage>
<lpage>237</lpage>
<pub-id pub-id-type="pmid">21216747</pub-id>
</element-citation>
</ref>
<ref id="pct157-B28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nakamura</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Cochrane</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Karsch-Mizrachi</surname>
<given-names>I</given-names>
</name>
<collab>The International Nucleotide Sequence Database Collaboration</collab>
</person-group>
<article-title>The International Nucleotide Sequence Database Collaboration</article-title>
<source>Nucleic Acids Res.</source>
<year>2013</year>
<volume>41</volume>
<fpage>D21</fpage>
<lpage>D24</lpage>
<pub-id pub-id-type="pmid">23180798</pub-id>
</element-citation>
</ref>
<ref id="pct157-B29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nussbaumer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Martis</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Roessner</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Pfeifer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MIPS PlantsDB: a database framework for comparative plant genome research</article-title>
<source>Nucleic Acids Res.</source>
<year>2013</year>
<volume>41</volume>
<fpage>D1144</fpage>
<lpage>D1151</lpage>
<pub-id pub-id-type="pmid">23203886</pub-id>
</element-citation>
</ref>
<ref id="pct157-B30">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pavy</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rombauts</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Déhais</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mathé</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ramana</surname>
<given-names>DV</given-names>
</name>
<name>
<surname>Leroy</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences</article-title>
<source>Bioinformatics</source>
<year>1999</year>
<volume>15</volume>
<fpage>887</fpage>
<lpage>899</lpage>
<pub-id pub-id-type="pmid">10743555</pub-id>
</element-citation>
</ref>
<ref id="pct157-B31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Podicheti</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gollapudi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Dong</surname>
<given-names>Q</given-names>
</name>
</person-group>
<article-title>WebGBrowse—a web server for GBrowse</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1550</fpage>
<lpage>1551</lpage>
<pub-id pub-id-type="pmid">19357095</pub-id>
</element-citation>
</ref>
<ref id="pct157-B32">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quevillon</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Silventoinen</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Pillai</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Harte</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Mulder</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>InterProScan: protein domains identifier</article-title>
<source>Nucleic Acids Res.</source>
<year>2005</year>
<volume>33</volume>
<fpage>W116</fpage>
<lpage>W120</lpage>
<pub-id pub-id-type="pmid">15980438</pub-id>
</element-citation>
</ref>
<ref id="pct157-B33">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rogic</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mackworth</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Ouellette</surname>
<given-names>FB</given-names>
</name>
</person-group>
<article-title>Evaluation of gene-finding programs on mammalian sequences</article-title>
<source>Genome Res.</source>
<year>2001</year>
<volume>11</volume>
<fpage>817</fpage>
<lpage>832</lpage>
<pub-id pub-id-type="pmid">11337477</pub-id>
</element-citation>
</ref>
<ref id="pct157-B34">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rouard</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Guignon</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Aluome</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Laporte</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Droc</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Walde</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<article-title>GreenPhylDB v2.0: comparative and functional genomics in plants</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D1095</fpage>
<lpage>D1102</lpage>
<pub-id pub-id-type="pmid">20864446</pub-id>
</element-citation>
</ref>
<ref id="pct157-B35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sakai</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Tanaka</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Numa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kawahara</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics</article-title>
<source>Plant Cell Physiol.</source>
<year>2013</year>
<volume>54</volume>
<fpage>e6</fpage>
<pub-id pub-id-type="pmid">23299411</pub-id>
</element-citation>
</ref>
<ref id="pct157-B36">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sakata</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Nagamura</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Numa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Antonio</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Nagasaki</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Idonuma</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>RiceGAAS: an automated annotation system and database for rice genome sequence</article-title>
<source>Nucleic Acids Res.</source>
<year>2002</year>
<volume>30</volume>
<fpage>98</fpage>
<lpage>102</lpage>
<pub-id pub-id-type="pmid">11752265</pub-id>
</element-citation>
</ref>
<ref id="pct157-B37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salamov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Solovyev</surname>
<given-names>VV</given-names>
</name>
</person-group>
<article-title>Ab initio gene finding in <italic>Drosophila</italic> genomic DNA</article-title>
<source>Genome Res.</source>
<year>2000</year>
<volume>10</volume>
<fpage>516</fpage>
<lpage>522</lpage>
<pub-id pub-id-type="pmid">10779491</pub-id>
</element-citation>
</ref>
<ref id="pct157-B38">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sayers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Barrett</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Benson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Bolton</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Canese</surname>
<given-names>K</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Database resources of the National Center for Biotechnology Information</article-title>
<source>Nucleic Acids Res.</source>
<year>2012</year>
<volume>40</volume>
<fpage>D13</fpage>
<lpage>D25</lpage>
<pub-id pub-id-type="pmid">22140104</pub-id>
</element-citation>
</ref>
<ref id="pct157-B39">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stanke</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Waack</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Gene prediction with a hidden Markov model and a new intron submodel</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<issue>Suppl. 2</issue>
<fpage>ii215</fpage>
<lpage>ii225</lpage>
<pub-id pub-id-type="pmid">14534192</pub-id>
</element-citation>
</ref>
<ref id="pct157-B40">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>LD</given-names>
</name>
<name>
<surname>Mungall</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Caudy</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mangone</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The generic genome browser: a building block for a model organism system database</article-title>
<source>Genome Res.</source>
<year>2002</year>
<volume>12</volume>
<fpage>1599</fpage>
<lpage>1610</lpage>
<pub-id pub-id-type="pmid">12368253</pub-id>
</element-citation>
</ref>
<ref id="pct157-B41">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walenz</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Florea</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Sim4db and Leaff: utilities for fast batch spliced alignment and sequence indexing</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>1869</fpage>
<lpage>1870</lpage>
<pub-id pub-id-type="pmid">21551146</pub-id>
</element-citation>
</ref>
<ref id="pct157-B42">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Fine mapping and candidate gene analysis of ptgms2-1, the photoperiod-thermo-sensitive genic male sterile gene in rice (Oryza sativa L.)</article-title>
<source>Theor. Appl. Genet.</source>
<year>2011</year>
<volume>122</volume>
<fpage>365</fpage>
<lpage>372</lpage>
<pub-id pub-id-type="pmid">20938764</pub-id>
</element-citation>
</ref>
<ref id="pct157-B43">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yandell</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ence</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>A beginner’s guide to eukaryotic genome annotation</article-title>
<source>Nat. Rev. Genet.</source>
<year>2012</year>
<volume>13</volume>
<fpage>329</fpage>
<lpage>342</lpage>
<pub-id pub-id-type="pmid">22510764</pub-id>
</element-citation>
</ref>
<ref id="pct157-B44">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yao</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Borsuk</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Wen</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Skibbe</surname>
<given-names>DS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evaluation of five ab initio gene prediction programs for the discovery of maize genes</article-title>
<source>Plant Mol. Biol.</source>
<year>2005</year>
<volume>57</volume>
<fpage>445</fpage>
<lpage>460</lpage>
<pub-id pub-id-type="pmid">15830133</pub-id>
</element-citation>
</ref>
<ref id="pct157-B45">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Youens-Clark</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Buckler</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Casstevens</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Declerck</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Derwent</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gramene database in 2010: updates and extensions</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D1085</fpage>
<lpage>D1094</lpage>
<pub-id pub-id-type="pmid">21076153</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000647 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000647 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3894707
   |texte=   MEGANTE: A Web-Based System for Integrated Plant Genome Annotation
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24253915" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024