Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

DroSpeGe: rapid access database for new Drosophila species genomes

Internal identifier: 000571 (Pmc/Corpus)

Authors: Donald G. Gilbert

Source: Nucleic Acids Research, 2007, 35 (Database issue): D480–D485

RBID: PMC:1899099

Abstract

The Drosophila species comparative genome database DroSpeGe (http://insects.eugenes.org/DroSpeGe/) has provided genome researchers with rapid, usable access to 12 new and old Drosophila genomes since its inception in 2004. Scientists with minimal computing expertise can use the wealth of new genome information to develop new insights into insect evolution. New genome assemblies provided by several sequencing centers have been annotated with known model organism gene homologies and gene predictions to provide basic comparative data. TeraGrid supplies the shared cyberinfrastructure for the primary computations. This genome database includes homologies to Drosophila melanogaster and eight other eukaryote model genomes, and gene predictions from several groups. BLAST searches of the newest assemblies are integrated with genome maps. GBrowse maps provide detailed views of cross-species aligned genomes. BioMart supports data mining of annotations and sequences. Common chromosome maps identify major synteny among species. Gene Ontology groupings of genes from the new species suggest potential gene gains and losses. Summaries of essential genome statistics include genome sizes, genes found and predicted, homology among genomes, phylogenetic trees of species and comparisons of several gene predictors for sensitivity and specificity in finding new and known genes.
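The abstract mentions comparing gene predictions by sensitivity and specificity. The sketch below shows how the two measures reduce to set arithmetic once gene-level matching has been done. It assumes each prediction is represented by the identifier of the known gene it matches, or by its own identifier if it matches none. It also assumes the common gene-finding definitions Sn = TP/(TP+FN) and Sp = TP/(TP+FP). The database's actual matching criteria are not described here, and the gene identifiers are toy values.

```python
"""Gene-level sensitivity (Sn) and specificity (Sp) of a gene predictor.

Assumption: each prediction is already represented by the ID of the known
gene it matches, or by its own ID if it matches none. Sn = TP / (TP + FN);
Sp = TP / (TP + FP), the usual gene-finding meaning of "specificity".
"""


def sensitivity_specificity(predicted, reference):
    """Return (Sn, Sp) for sets of predicted and reference gene IDs."""
    true_pos = len(predicted.intersection(reference))  # known genes recovered
    false_neg = len(reference.difference(predicted))   # known genes missed
    false_pos = len(predicted.difference(reference))   # unmatched predictions
    sn = true_pos / (true_pos + false_neg) if reference else 0.0
    sp = true_pos / (true_pos + false_pos) if predicted else 0.0
    return sn, sp


if __name__ == "__main__":
    known = {"geneA", "geneB", "geneC", "geneD"}   # toy reference set
    found = {"geneA", "geneB", "geneC", "novel1"}  # toy prediction set
    sn, sp = sensitivity_specificity(found, known)
    print(f"Sn={sn:.2f} Sp={sp:.2f}")  # Sn=0.75 Sp=0.75
```

A predictor that reports many extra genes can keep a high Sn while its Sp falls, which is why the two are reported together.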


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1899099
DOI: 10.1093/nar/gkl997
PubMed: 17202166
PubMed Central: 1899099

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">DroSpeGe: rapid access database for new
<italic>Drosophila</italic>
species genomes</title>
<author>
<name sortKey="Gilbert, Donald G" sort="Gilbert, Donald G" uniqKey="Gilbert D" first="Donald G." last="Gilbert">Donald G. Gilbert</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17202166</idno>
<idno type="pmc">1899099</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1899099</idno>
<idno type="RBID">PMC:1899099</idno>
<idno type="doi">10.1093/nar/gkl997</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000571</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">DroSpeGe: rapid access database for new
<italic>Drosophila</italic>
species genomes</title>
<author>
<name sortKey="Gilbert, Donald G" sort="Gilbert, Donald G" uniqKey="Gilbert D" first="Donald G." last="Gilbert">Donald G. Gilbert</name>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The
<italic>Drosophila</italic>
species comparative genome database DroSpeGe (
<ext-link ext-link-type="uri" xlink:href="http://insects.eugenes.org/DroSpeGe/"></ext-link>
) has provided genome researchers with rapid, usable access to 12 new and old
<italic>Drosophila</italic>
genomes since its inception in 2004. Scientists with minimal computing expertise can use the wealth of new genome information to develop new insights into insect evolution. New genome assemblies provided by several sequencing centers have been annotated with known model organism gene homologies and gene predictions to provide basic comparative data. TeraGrid supplies the shared cyberinfrastructure for the primary computations. This genome database includes homologies to
<italic>Drosophila melanogaster</italic>
and eight other eukaryote model genomes, and gene predictions from several groups. BLAST searches of the newest assemblies are integrated with genome maps. GBrowse maps provide detailed views of cross-species aligned genomes. BioMart supports data mining of annotations and sequences. Common chromosome maps identify major synteny among species. Gene Ontology groupings of genes from the new species suggest potential gene gains and losses. Summaries of essential genome statistics include genome sizes, genes found and predicted, homology among genomes, phylogenetic trees of species and comparisons of several gene predictors for sensitivity and specificity in finding new and known genes.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Benson, D" uniqKey="Benson D">D. Benson</name>
</author>
<author>
<name sortKey="Wheeler, D" uniqKey="Wheeler D">D. Wheeler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Siepel, A" uniqKey="Siepel A">A. Siepel</name>
</author>
<author>
<name sortKey="Bejerano, G" uniqKey="Bejerano G">G. Bejerano</name>
</author>
<author>
<name sortKey="Pedersen, J S" uniqKey="Pedersen J">J.S. Pedersen</name>
</author>
<author>
<name sortKey="Hinrichs, A S" uniqKey="Hinrichs A">A.S. Hinrichs</name>
</author>
<author>
<name sortKey="Hou, M" uniqKey="Hou M">M. Hou</name>
</author>
<author>
<name sortKey="Rosenbloom, K" uniqKey="Rosenbloom K">K. Rosenbloom</name>
</author>
<author>
<name sortKey="Clawson, H" uniqKey="Clawson H">H. Clawson</name>
</author>
<author>
<name sortKey="Spieth, J" uniqKey="Spieth J">J. Spieth</name>
</author>
<author>
<name sortKey="Hillier, L W" uniqKey="Hillier L">L.W. Hillier</name>
</author>
<author>
<name sortKey="Richards, S" uniqKey="Richards S">S. Richards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, L D" uniqKey="Stein L">L.D. Stein</name>
</author>
<author>
<name sortKey="Mungall, C" uniqKey="Mungall C">C. Mungall</name>
</author>
<author>
<name sortKey="Shu, S" uniqKey="Shu S">S. Shu</name>
</author>
<author>
<name sortKey="Caudy, M" uniqKey="Caudy M">M. Caudy</name>
</author>
<author>
<name sortKey="Mangone, M" uniqKey="Mangone M">M. Mangone</name>
</author>
<author>
<name sortKey="Day, A" uniqKey="Day A">A. Day</name>
</author>
<author>
<name sortKey="Nickerson, E" uniqKey="Nickerson E">E. Nickerson</name>
</author>
<author>
<name sortKey="Stajich, J E" uniqKey="Stajich J">J.E. Stajich</name>
</author>
<author>
<name sortKey="Harris, T W" uniqKey="Harris T">T.W. Harris</name>
</author>
<author>
<name sortKey="Arva, A" uniqKey="Arva A">A. Arva</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kasprzyk, A" uniqKey="Kasprzyk A">A. Kasprzyk</name>
</author>
<author>
<name sortKey="Keefe, D" uniqKey="Keefe D">D. Keefe</name>
</author>
<author>
<name sortKey="Smedley, D" uniqKey="Smedley D">D. Smedley</name>
</author>
<author>
<name sortKey="London, D" uniqKey="London D">D. London</name>
</author>
<author>
<name sortKey="Spooner, W" uniqKey="Spooner W">W. Spooner</name>
</author>
<author>
<name sortKey="Melsopp, C" uniqKey="Melsopp C">C. Melsopp</name>
</author>
<author>
<name sortKey="Hammond, M" uniqKey="Hammond M">M. Hammond</name>
</author>
<author>
<name sortKey="Rocca Serra, P" uniqKey="Rocca Serra P">P. Rocca-Serra</name>
</author>
<author>
<name sortKey="Cox, T" uniqKey="Cox T">T. Cox</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E. Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, S F" uniqKey="Altschul S">S.F. Altschul</name>
</author>
<author>
<name sortKey="Madden, T L" uniqKey="Madden T">T.L. Madden</name>
</author>
<author>
<name sortKey="Schaffer, A A" uniqKey="Schaffer A">A.A. Schaffer</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J. Zhang</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z. Zhang</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W. Miller</name>
</author>
<author>
<name sortKey="Lipman, D J" uniqKey="Lipman D">D.J. Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gross, S S" uniqKey="Gross S">S.S. Gross</name>
</author>
<author>
<name sortKey="Do, C B" uniqKey="Do C">C.B. Do</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S. Batzoglou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, J Q" uniqKey="Wu J">J.Q. Wu</name>
</author>
<author>
<name sortKey="Shteynberg, D" uniqKey="Shteynberg D">D. Shteynberg</name>
</author>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M. Arumugam</name>
</author>
<author>
<name sortKey="Gibbs, R A" uniqKey="Gibbs R">R.A. Gibbs</name>
</author>
<author>
<name sortKey="Brent, M R" uniqKey="Brent M">M.R. Brent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Korf, I" uniqKey="Korf I">I. Korf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E. Birney</name>
</author>
<author>
<name sortKey="Clamp, M" uniqKey="Clamp M">M. Clamp</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R. Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chatterji, S" uniqKey="Chatterji S">S. Chatterji</name>
</author>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L. Pachter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Slater, G S" uniqKey="Slater G">G.S. Slater</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E. Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parra, G" uniqKey="Parra G">G. Parra</name>
</author>
<author>
<name sortKey="Blanco, E" uniqKey="Blanco E">E. Blanco</name>
</author>
<author>
<name sortKey="Guig, R" uniqKey="Guig R">R. Guigó</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Souvorov, A" uniqKey="Souvorov A">A. Souvorov</name>
</author>
<author>
<name sortKey="Hlavina, W" uniqKey="Hlavina W">W. Hlavina</name>
</author>
<author>
<name sortKey="Kapustin, Y" uniqKey="Kapustin Y">Y. Kapustin</name>
</author>
<author>
<name sortKey="Kiryutin, B" uniqKey="Kiryutin B">B. Kiryutin</name>
</author>
<author>
<name sortKey="Kitts, P" uniqKey="Kitts P">P. Kitts</name>
</author>
<author>
<name sortKey="Pruitt, K" uniqKey="Pruitt K">K. Pruitt</name>
</author>
<author>
<name sortKey="Sapojnikov, V" uniqKey="Sapojnikov V">V. Sapojnikov</name>
</author>
<author>
<name sortKey="Ostell, J" uniqKey="Ostell J">J. Ostell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sturgill, D" uniqKey="Sturgill D">D. Sturgill</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y. Zhang</name>
</author>
<author>
<name sortKey="Parisi, M" uniqKey="Parisi M">M. Parisi</name>
</author>
<author>
<name sortKey="Oliver, B" uniqKey="Oliver B">B. Oliver</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heger, A" uniqKey="Heger A">A. Heger</name>
</author>
<author>
<name sortKey="Ponting, C" uniqKey="Ponting C">C. Ponting</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, L" uniqKey="Stein L">L. Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stajich, J E" uniqKey="Stajich J">J.E. Stajich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Colbourne, J K" uniqKey="Colbourne J">J.K. Colbourne</name>
</author>
<author>
<name sortKey="Singan, V R" uniqKey="Singan V">V.R. Singan</name>
</author>
<author>
<name sortKey="Gilbert, D G" uniqKey="Gilbert D">D.G. Gilbert</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="pmc">nar</journal-id>
<journal-id journal-id-type="publisher-id">Nucleic Acids Research</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17202166</article-id>
<article-id pub-id-type="pmc">1899099</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkl997</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>DroSpeGe: rapid access database for new
<italic>Drosophila</italic>
species genomes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Gilbert</surname>
<given-names>Donald G.</given-names>
</name>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<aff>
<institution>Department of Biology, Indiana University</institution>
<addr-line>Bloomington, IN 47405, USA</addr-line>
</aff>
</contrib-group>
<author-notes>
<corresp id="cor1">
<sup>*</sup>
Tel: +1 812 333 5616; Fax: +1 812 855 6705; Email:
<email>gilbertd@indiana.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>1</month>
<year>2007</year>
</pub-date>
<volume>35</volume>
<issue>Database issue</issue>
<fpage>D480</fpage>
<lpage>D485</lpage>
<history>
<date date-type="received">
<day>15</day>
<month>9</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>17</day>
<month>10</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>10</month>
<year>2006</year>
</date>
</history>
<permissions>
<copyright-statement>© 2006 The Author(s)</copyright-statement>
<copyright-year>2006</copyright-year>
<license license-type="openaccess">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/">http://creativecommons.org/licenses/by-nc/2.0/uk/</ext-link>
) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>The
<italic>Drosophila</italic>
species comparative genome database DroSpeGe (
<ext-link ext-link-type="uri" xlink:href="http://insects.eugenes.org/DroSpeGe/"></ext-link>
) has provided genome researchers with rapid, usable access to 12 new and old
<italic>Drosophila</italic>
genomes since its inception in 2004. Scientists with minimal computing expertise can use the wealth of new genome information to develop new insights into insect evolution. New genome assemblies provided by several sequencing centers have been annotated with known model organism gene homologies and gene predictions to provide basic comparative data. TeraGrid supplies the shared cyberinfrastructure for the primary computations. This genome database includes homologies to
<italic>Drosophila melanogaster</italic>
and eight other eukaryote model genomes, and gene predictions from several groups. BLAST searches of the newest assemblies are integrated with genome maps. GBrowse maps provide detailed views of cross-species aligned genomes. BioMart supports data mining of annotations and sequences. Common chromosome maps identify major synteny among species. Gene Ontology groupings of genes from the new species suggest potential gene gains and losses. Summaries of essential genome statistics include genome sizes, genes found and predicted, homology among genomes, phylogenetic trees of species and comparisons of several gene predictors for sensitivity and specificity in finding new and known genes.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>INTRODUCTION</title>
<p>Many new genomes are becoming available this decade. Current contents of public genome archives exceed 1 billion sequence traces from >1000 organisms [
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/Web/Newsltr/V15N1/trace.html"></ext-link>
; (
<xref ref-type="bibr" rid="b1">1</xref>
)]. This number will grow rapidly as costs drop and scientific uses for comparing many genomes increase (
<xref ref-type="bibr" rid="b2">2</xref>
). To make sense of these new genomes, biologists need rapid access to them, together with basic annotations from well-studied model organisms and predictions that locate potential new genes. Genome annotation and database management can now be streamlined using generic tools, shared computing resources and common genome database techniques, giving biologists useful access in weeks instead of several months.</p>
<p>New genome sequencing projects and communities face large informatics tasks in incorporating, curating, annotating and disseminating sequence and annotation data. Effective genome studies need an informatics infrastructure that moves beyond individual organism projects to a cost-effective use of common tools, and expertise from existing genome projects should be leveraged to build such tools. The Generic Model Organism Database [GMOD (
<xref ref-type="bibr" rid="b3">3</xref>
)] project has this goal: to develop and extend a genome database tool set to the level of quality needed to create and maintain new genome databases. GMOD and related genome database tools now support some of the basic tasks involved. Two needs still in development for GMOD are the creation of new databases for emerging model organisms, and tools for comparative genome databases that integrate data from many sources.</p>
<p>A common, ongoing task in research that uses genome databases is to compare an organism's genome and proteome with those of related organisms and with other sequence datasets (ESTs, SNPs, transposable elements). This task requires significant computational infrastructure, in which reusable tools, protocols and resources are valuable because they reduce duplicated infrastructure and maintenance effort. Software tools to fully assemble, analyze and compare these genomes are available to bioscientists, but the ability to apply them to genome datasets is limited to those with extensive computational resources and engineering talent. Effective use of shared cyberinfrastructure in bioinformatics remains a problem today. Cluster and Grid computing in bioinformatics have followed other disciplines in parallelizing applications, but this approach is costly and limited to a subset of bioinformatics applications. DroSpeGe gives bioscientists usable access to new genomes shortly after sequencing centers make them available, facilitating new scientific discoveries and a better understanding of the evolution, comparative biology and genomics of these model organisms.</p>
</sec>
<sec>
<title>GENOME INFORMATICS METHODS</title>
<sec>
<title>Common components</title>
<p>DroSpeGe has been built with common GMOD database components and open source software shared with other genome databases. Use of common components facilitates rapid construction and interoperability. The GMOD ARGOS replicable genome database template (
<ext-link ext-link-type="uri" xlink:href="www.gmod.org/argos/"></ext-link>
) provides a tested set of integrated components. The genome access tools of GMOD GBrowse [
<ext-link ext-link-type="uri" xlink:href="www.gmod.org/gbrowse/"></ext-link>
;(
<xref ref-type="bibr" rid="b3">3</xref>
)], BioMart [
<ext-link ext-link-type="uri" xlink:href="www.biomart.org"></ext-link>
; (
<xref ref-type="bibr" rid="b4">4</xref>
)] and BLAST (
<xref ref-type="bibr" rid="b5">5</xref>
) are available for the
<italic>Drosophila</italic>
species genomes. The GMOD Chado relational database schema (
<ext-link ext-link-type="uri" xlink:href="www.gmod.org/chado/"></ext-link>
) is used to manage an extensible range of genome information. Middleware written in Perl and Java brings together BLAST, BioMart, sequence reports, searches and other bioinformatics programs for public access. Another aid to integrating and mining these data is GMOD Lucegene (
<ext-link ext-link-type="uri" xlink:href="www.gmod.org/lucegene/"></ext-link>
), which forms a core component for rapid data retrieval by attributes, GBrowse data retrieval and databank partitioning for Grid analyses. DroSpeGe operates on several Unix computers; the primary server is a SunFire V20z from Sun Microsystems. Genome maps include
<italic>Drosophila melanogaster</italic>
DNA and protein homology, homologies to nine eukaryote proteomes, marker gene locations and gene predictions from 15 methods produced by several contributing groups. The assemblies and predicted genes can be searched with BLAST, with links to genome maps. BioMart provides searches of the full genome annotation sets, allowing selection of genome regions with and without specific features.</p>
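Selecting genome regions with and without specific features can be pictured as an interval query over a Chado-style database. The sketch below is hypothetical. It uses Python's built-in sqlite3 module and a reduced table layout loosely modeled on Chado's feature and featureloc tables. Real Chado types features through cvterm foreign keys and has many more columns. The feature type names gene_prediction and dmel_homology are invented for illustration. The query lists gene predictions that overlap no D. melanogaster homology on the same scaffold, which is one plausible starting point for finding candidate new genes.

```python
"""Find predicted genes with no overlapping D. melanogaster homology.

Hypothetical sketch loosely modeled on the GMOD Chado 'feature' and
'featureloc' tables. Real Chado stores feature types as cvterm foreign
keys (type_id); the plain-text 'ftype' column here is a simplification.
"""
import sqlite3

SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    ftype      TEXT NOT NULL       -- simplified stand-in for Chado type_id
);
CREATE TABLE featureloc (
    feature_id    INTEGER REFERENCES feature (feature_id),
    srcfeature_id INTEGER REFERENCES feature (feature_id),  -- e.g. a scaffold
    fmin INTEGER NOT NULL,         -- interbase (0-based) start
    fmax INTEGER NOT NULL          -- interbase end
);
"""

# Gene predictions sharing no overlapping interval with any homology hit on
# the same source sequence. Two half-open intervals overlap when each one's
# end lies beyond the other's start.
QUERY = """
SELECT g.uniquename
FROM feature AS g
JOIN featureloc AS gl ON gl.feature_id = g.feature_id
WHERE g.ftype = 'gene_prediction'
  AND NOT EXISTS (
      SELECT 1
      FROM feature AS h
      JOIN featureloc AS hl ON hl.feature_id = h.feature_id
      WHERE h.ftype = 'dmel_homology'
        AND hl.srcfeature_id = gl.srcfeature_id
        AND hl.fmax > gl.fmin
        AND gl.fmax > hl.fmin
  )
ORDER BY g.uniquename
"""


def predictions_without_homology(features, locations):
    """features: (feature_id, uniquename, ftype) tuples;
    locations: (feature_id, srcfeature_id, fmin, fmax) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", features)
        conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?)", locations)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    # Toy data: one scaffold (id 1), two predictions, one homology hit.
    feats = [
        (1, "scaffold_1", "scaffold"),
        (2, "pred_A", "gene_prediction"),
        (3, "pred_B", "gene_prediction"),
        (4, "hit_1", "dmel_homology"),
    ]
    locs = [
        (2, 1, 100, 900),    # pred_A overlaps hit_1
        (3, 1, 5000, 6200),  # pred_B has no homology support
        (4, 1, 400, 1200),
    ]
    print(predictions_without_homology(feats, locs))  # ['pred_B']
```

Because fmin and fmax are interbase coordinates, intervals that only touch (one ends where the other starts) are not counted as overlapping.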
</sec>
<sec>
<title>New species genomes</title>
<p>Twelve
<italic>Drosophila</italic>
genomes, 10 of them recently sequenced, together contain over 2 billion nt, with sizes ranging from 133 Mb in
<italic>D.melanogaster</italic>
to >230 Mb in
<italic>Drosophila willistoni</italic>
(
<xref ref-type="table" rid="tbl1">Table 1</xref>
). The model organism
<italic>D.melanogaster</italic>
is approaching its fifth major assembly release and continues to see significant improvements in genes and genome features. It has a known complement of ∼14 000 protein-coding genes with mapped locations. One main impetus for sequencing 11 additional related species is to improve, through comparative analyses, knowledge of this major research organism.
<italic>Drosophila pseudoobscura</italic>
, the second species sequenced, is in its second major release. The additional species are at their first major assembly stage, requiring automated annotation, quality assessment and cross-species comparisons. Four of these new genomes (Dsim, Dsec, Dyak and Dere) are close relatives of the model in the melanogaster subgroup. The remainder span five other taxonomic groups, diverging from Dmel by up to an estimated 40 million years; the cactus breeder
<italic>Drosophila mojavensis</italic>
, widely distributed
<italic>Drosophila virilis</italic>
and Hawaiian picture-wing
<italic>Drosophila grimshawi</italic>
are the most distant from Dmel. Assembly sequences of the
<italic>Drosophila</italic>
species comparative annotation freeze 1 (CAF1) are distributed at
<ext-link ext-link-type="uri" xlink:href="http://rana.lbl.gov/drosophila/caf1.html"></ext-link>
, as listed in
<xref ref-type="table" rid="tbl1">Table 1</xref>
. These form the primary source data for this database. Over the course of two years, this database has provided rapid access to several assembly releases per species, including annotation, searching and viewing services for each release.</p>
<table-wrap id="tbl1" position="float">
<label>Table 1</label>
<caption>
<p>
<italic>Drosophila</italic>
species genomes: abbreviations, genome sizes and sequencing centers of the CAF1 assemblies used at DroSpeGe</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Abbreviation</th>
<th align="left" rowspan="1" colspan="1">Species</th>
<th align="left" rowspan="1" colspan="1">Size (Mb)</th>
<th align="left" rowspan="1" colspan="1">Sequencing center</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Dmel</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila melanogaster</italic>
</td>
<td align="left" rowspan="1" colspan="1">133</td>
<td align="left" rowspan="1" colspan="1">Berkeley Drosophila Genome Project/Celera</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dsim</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila simulans</italic>
</td>
<td align="left" rowspan="1" colspan="1">142</td>
<td align="left" rowspan="1" colspan="1">Genome Sequencing Center, Washington University</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dsec</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila sechellia</italic>
</td>
<td align="left" rowspan="1" colspan="1">167</td>
<td align="left" rowspan="1" colspan="1">Broad Institute</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dyak</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila yakuba</italic>
</td>
<td align="left" rowspan="1" colspan="1">160</td>
<td align="left" rowspan="1" colspan="1">Genome Sequencing Center, Washington University</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dere</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila erecta</italic>
</td>
<td align="left" rowspan="1" colspan="1">153</td>
<td align="left" rowspan="1" colspan="1">Agencourt Bioscience Corporation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dana</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila ananassae</italic>
</td>
<td align="left" rowspan="1" colspan="1">231</td>
<td align="left" rowspan="1" colspan="1">Agencourt Bioscience Corporation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dper</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila persimilis</italic>
</td>
<td align="left" rowspan="1" colspan="1">188</td>
<td align="left" rowspan="1" colspan="1">Broad Institute</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dpse</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila pseudoobscura</italic>
</td>
<td align="left" rowspan="1" colspan="1">153</td>
<td align="left" rowspan="1" colspan="1">Human Genome Sequencing Center, Baylor College of Medicine</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dwil</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila willistoni</italic>
</td>
<td align="left" rowspan="1" colspan="1">237</td>
<td align="left" rowspan="1" colspan="1">J. Craig Venter Institute</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dmoj</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila mojavensis</italic>
</td>
<td align="left" rowspan="1" colspan="1">194</td>
<td align="left" rowspan="1" colspan="1">Agencourt Bioscience Corporation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dvir</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila virilis</italic>
</td>
<td align="left" rowspan="1" colspan="1">206</td>
<td align="left" rowspan="1" colspan="1">Agencourt Bioscience Corporation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dgri</td>
<td align="left" rowspan="1" colspan="1">
<italic>Drosophila grimshawi</italic>
</td>
<td align="left" rowspan="1" colspan="1">200</td>
<td align="left" rowspan="1" colspan="1">Agencourt Bioscience Corporation</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Annotations produced collaboratively by several groups are provided for map viewing and data mining. Protein-coding gene predictions viewable at this resource include contributions listed in
<xref ref-type="table" rid="tbl2">Table 2</xref>
.</p>
</fn>
</table-wrap-foot>
</table-wrap>
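Table 1 supports the statement that the twelve assemblies together contain over 2 billion nt. The short Python check below sums the table's sizes. Because the table gives sizes rounded to whole megabases, the total is accurate only to within a few Mb.

```python
"""Check the combined size of the 12 CAF1 assemblies listed in Table 1."""

# Assembly sizes in megabases, copied from Table 1 (rounded values).
CAF1_SIZES_MB = {
    "Dmel": 133, "Dsim": 142, "Dsec": 167, "Dyak": 160,
    "Dere": 153, "Dana": 231, "Dper": 188, "Dpse": 153,
    "Dwil": 237, "Dmoj": 194, "Dvir": 206, "Dgri": 200,
}


def total_nucleotides(sizes_mb):
    """Sum assembly sizes and convert Mb to nucleotides (1 Mb = 1e6 nt)."""
    return sum(sizes_mb.values()) * 1_000_000


if __name__ == "__main__":
    total = total_nucleotides(CAF1_SIZES_MB)
    smallest = min(CAF1_SIZES_MB, key=CAF1_SIZES_MB.get)
    largest = max(CAF1_SIZES_MB, key=CAF1_SIZES_MB.get)
    print(f"{total:,} nt")    # 2,164,000,000 nt
    print(smallest, largest)  # Dmel Dwil
```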
<table-wrap id="tbl2" position="float">
<label>Table 2</label>
<caption>
<p>
<italic>Drosophila</italic>
species genome annotations (partial list) included at DroSpeGe, contributed at
<ext-link ext-link-type="uri" xlink:href="http://rana.lbl.gov/drosophila/wiki/index.php/Annotation_Submission"></ext-link>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Contributor</th>
<th align="left" rowspan="1" colspan="1">Annotation description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">S. Batzoglou Lab, Stanford</td>
<td align="left" rowspan="1" colspan="1">Contrast [
<ext-link ext-link-type="uri" xlink:href="http://contra.stanford.edu/contrast/"></ext-link>
; (
<xref ref-type="bibr" rid="b6">6</xref>
)] predictions</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">M. Brent Lab, Washington University, St Louis</td>
<td align="left" rowspan="1" colspan="1">N-SCAN [
<ext-link ext-link-type="uri" xlink:href="http://mblab.wustl.edu/"></ext-link>
; (
<xref ref-type="bibr" rid="b7">7</xref>
)] predictions with melanogaster alignments</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">D. Gilbert Lab, Indiana University</td>
<td align="left" rowspan="1" colspan="1">SNAP [
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/5/59"></ext-link>
; (
<xref ref-type="bibr" rid="b8">8</xref>
)] predictions, model organism gene homologies</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">M. Eisen Lab, UC Berkeley/LBNL</td>
<td align="left" rowspan="1" colspan="1">GeneWise (
<xref ref-type="bibr" rid="b9">9</xref>
), GeneMapper [
<ext-link ext-link-type="uri" xlink:href="http://bio.math.berkeley.edu/genemapper/"></ext-link>
; (
<xref ref-type="bibr" rid="b10">10</xref>
)], Exonerate (
<xref ref-type="bibr" rid="b11">11</xref>
) annotations</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">R. Guigó Genome Bioinformatics Lab, Barcelona</td>
<td align="left" rowspan="1" colspan="1">Geneid [
<ext-link ext-link-type="uri" xlink:href="http://genome.imim.es/software/geneid/index.html"></ext-link>
; (
<xref ref-type="bibr" rid="b12">12</xref>
)] predictions</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">NCBI, Bethesda</td>
<td align="left" rowspan="1" colspan="1">Gnomon [
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Drosophila_melanogaster/special_requests/CAF1/"></ext-link>
; (
<xref ref-type="bibr" rid="b13">13</xref>
)] predictions</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">B. Oliver Lab, LCDB, NIDDK, NIH</td>
<td align="left" rowspan="1" colspan="1">Gene expression evidence from microarray [
<ext-link ext-link-type="uri" xlink:href="http://intramural.niddk.nih.gov/research/nimble/nimblefly.htm"></ext-link>
; (
<xref ref-type="bibr" rid="b14">14</xref>
)]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">L. Pachter Lab, UC Berkeley</td>
<td align="left" rowspan="1" colspan="1">GeneMapper (
<xref ref-type="bibr" rid="b10">10</xref>
) annotations</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">C. Ponting Lab, MRC FGU Oxford</td>
<td align="left" rowspan="1" colspan="1">Gene prediction pipeline [
<ext-link ext-link-type="uri" xlink:href="http://wwwfgu.anat.ox.ac.uk:8080/flies/documentation.html"></ext-link>
; (
<xref ref-type="bibr" rid="b15">15</xref>
)] with Exonerate</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>TeraGrid genome analyses</title>
<p>The TeraGrid project (
<ext-link ext-link-type="uri" xlink:href="www.teragrid.org"></ext-link>
) is part of a shared cyberinfrastructure for the sciences, funded primarily by NSF. TeraGrid provides collaborative, cost-effective scientific computing infrastructure in much the same way that the GMOD initiative is building common tools for genome databases. The TeraGrid system is particularly suitable for genome assembly, annotation, gene finding and phylogenetic analyses. TeraGrid computers have been employed to analyze the 12
<italic>Drosophila</italic>
genomes, providing the major contents of the DroSpeGe database. This has enabled rapid analyses without the expense of obtaining and maintaining a local computer cluster. This experience offers a basis for other genome projects to use TeraGrid. Scripts used for this analysis are available at the GMOD repository (
<ext-link ext-link-type="uri" xlink:href="http://gmod.cvs.sourceforge.net/gmod/genogrid/"></ext-link>
). Genome database tools from GMOD project are used to organize the computations for public access. Results include
<italic>D.melanogaster</italic>
genome homology, homologies to nine eukaryote proteomes, gene predictions, marker gene locations and
<italic>Drosophila</italic>
microsatellites. For each of 12
<italic>Drosophila</italic>
genomes, a comparison is made to a set of nine proteomes, with 217 000 proteins, drawn from source genome databases, Ensembl and NCBI. The reference proteomes are human, mouse, zebrafish, fruitfly (Dmel), mosquito, bee, worm (
<italic>Caenorhabditis elegans</italic>
), mustard weed (
<italic>Arabidopsis thaliana</italic>
) and yeast. Sizes of the new genomes are in the 150–250 Mb range. Protein–genome DNA alignment is done using tBLASTn, with a Grid-aware version of NCBI software. The TeraGrid run for each genome took 12–18 h using 64 processors. Whole genome DNA–DNA alignments were performed for a subset of new genomes. Gene predictions with SNAP (
<xref ref-type="bibr" rid="b8">8</xref>
) have been generated. Over the course of 6 months, with 2–3 genome assembly updates each per species, and error corrections, the total TeraGrid 64 cpu usage per genome has been ∼4 days, excluding queue-waiting times.</p>
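<p>The compute figures above can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not the author's GMOD genogrid scripts: it assumes the 217 000 reference proteins are split into near-equal query chunks, one per processor, and converts the reported wall-clock times into processor-hours.</p>

```python
# Sketch only: assumes the 217 000 reference proteins were split into
# equal query chunks, one per processor; the paper does not state this.

def split_evenly(n_items: int, n_workers: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) index ranges dividing n_items among
    n_workers; chunk sizes differ by at most one."""
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` chunks each take one leftover item.
        size = base + (1 if extra > worker else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def cpu_hours(wall_hours: float, processors: int) -> float:
    """Processor-hours consumed by a run of the given wall-clock length."""
    return wall_hours * processors


chunks = split_evenly(217_000, 64)
print(len(chunks), chunks[0], chunks[-1])
print(cpu_hours(12, 64), cpu_hours(18, 64))  # one tBLASTn run per genome
print(cpu_hours(4 * 24, 64))                 # ~4 days on 64 CPUs per genome
```

<p>Under these assumptions each processor receives 3390 or 3391 proteins, a 12–18 h run corresponds to 768–1152 processor-hours, and ∼4 days on 64 CPUs is about 6100 processor-hours per genome.</p>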
</sec>
</sec>
<sec>
<title>DATABASE USES</title>
<p>DroSpeGe provides a resource for biologists interested in comparing species differences and similarities, including novel and known genes, genome structure and evolution, and gene–function associations. Known genes from model organisms are found in the new genomes at the expected rates, allowing for variation due to assembly quality. The most divergent species (Dmoj, Dvir and Dgri), with 40 million years of divergence from Dmel, match ∼90% of the model species genes. These known genes give the many researchers interested in locating a particular gene or gene family a useful entry point into the new genomes, and the known-gene matches also allow gene content to be searched and catalogued by known function.
<xref ref-type="fig" rid="fig1">Figure 1</xref>
shows the sizes of these genomes and their similarity to the Dmel model. The genome annotations and analyses produced for this database are available at DroSpeGe/data/ and in bulk form at
<ext-link ext-link-type="ftp" xlink:href="ftp://eugenes.org/eugenes/genomes/"></ext-link>
, including annotations of the CAF1 and prior assembly releases. Related genome projects also provide
<italic>Drosophila</italic>
genome data and complementary services (see Related Work).</p>
<fig id="fig1" position="float">
<label>Figure 1</label>
<caption>
<p>
<italic>Drosophila</italic>
species assemblies, showing assembly sizes and coverage of these by
<italic>D.melanogaster</italic>
genome DNA (top and middle lines, in megabases, left ordinate), and counts of chromosome segments inverted relative to Dmel (bottom line, right ordinate). Species on the abscissa are ordered taxonomically, with Dgri the most distant from Dmel. This is summarized from
<ext-link ext-link-type="uri" xlink:href="DroSpeGe/news/genome-summaries/dnacoverage.html"></ext-link>
.</p>
</caption>
<graphic xlink:href="gkl997f1"></graphic>
</fig>
<sec>
<title>Genome data mining</title>
<p>An emerging trend among bioscientists and bioinformaticians is to mine large subsets of genome data, often focusing on summary information for a range of common attributes, which is then analyzed in spreadsheets, simple databases or other tools. Genomics web databases often lack methods for effectively mining large subsets of genomes, or limit the questions one can pose of the underlying complex data (
<xref ref-type="bibr" rid="b16">16</xref>
). The Ensembl project and its offshoot BioMart (
<xref ref-type="bibr" rid="b4">4</xref>
) are an example of integrated software and data that bridge the gap in biology data access between bulk files and web portals. A tool for creating BioMart-compliant transaction databases,
<italic>gff2biomart</italic>
, is a recent addition by the author to the GMOD tools collection (
<ext-link ext-link-type="uri" xlink:href="http://gmod.cvs.sourceforge.net/gmod/schema/GMODTools/bin/"></ext-link>
). It has been used for DroSpeGe and other genome datasets. BioMart, loaded with annotations of the 12
<italic>Drosophila</italic>
genomes, has given many bioscientists unique data-mining access to these new genomes.</p>
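<p>gff2biomart itself is not reproduced here. As a hypothetical illustration of the kind of flattening such a tool performs, the sketch below turns GFF3 feature lines into fixed-column table rows of the sort a mart table could be loaded from. The column choice, the function names and the sample features are invented for this example.</p>

```python
from typing import Optional
from urllib.parse import unquote

# The eight fixed GFF3 columns; column 9 holds key=value attributes.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")


def parse_gff3_line(line: str) -> Optional[dict]:
    """Turn one GFF3 feature line into a flat dict; return None for
    comments, directives and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    row = dict(zip(GFF_COLUMNS, fields[:8]))
    row["start"], row["end"] = int(row["start"]), int(row["end"])
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            row[key.strip()] = unquote(value.strip())
    return row


def gff_to_rows(text: str, columns: list[str]) -> list[list[str]]:
    """Flatten GFF3 text into table rows with a fixed column order;
    attributes missing from a feature become empty strings."""
    rows = []
    for line in text.splitlines():
        feature = parse_gff3_line(line)
        if feature is not None:
            rows.append([str(feature.get(c, "")) for c in columns])
    return rows


# Hypothetical features: one gene prediction and one protein homology match.
gff = (
    "##gff-version 3\n"
    "scaffold_1\tSNAP\tmRNA\t1200\t3400\t.\t+\t.\tID=snap.1;Name=snap.1\n"
    "scaffold_1\ttblastn\tmatch\t1300\t1800\t85\t+\t.\tID=hit.7;Target=prot_42 1 160\n"
)
for row in gff_to_rows(gff, ["seqid", "type", "start", "end", "ID", "Target"]):
    print("\t".join(row))
```

<p>Fixing the column order up front is what makes the rows loadable as a plain table; attributes a feature lacks are simply left blank.</p>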
<p>With BioMart, one can select genome regions that carry chosen annotations, exclude regions that carry others, and download tables or sequences for the selected set. For instance, one can select regions with mosquito gene homologs but no
<italic>D.melanogaster</italic>
homologs, or regions with gene predictions but no known homology. A major reason for sequencing the genomes of 12
<italic>Drosophila</italic>
species is to improve genome knowledge of the widely used
<italic>D.melanogaster</italic>
model organism. A significant application of BioMart has been to identify gene predictions in
<italic>D.melanogaster</italic>
that do not match known genes. Further phylogenetic analysis of these new gene predictions has identified a subset with cross-species homology and high synonymous substitution rates, validating these as likely new genes and coding exons with phylogenetic evidence. Another application of BioMart has been to compare the quality of gene prediction methods by identifying predicted exons that coincide with known-gene homology and with gene expression data, which measures each method's sensitivity at predicting new and known genes.</p>
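<p>The include/exclude logic of such queries can be sketched in a few lines. This is not BioMart's API; BioMart queries are built through its own interface. The hypothetical <italic>Region</italic> record, its fields and the sample data below only mimic the two example selections described above.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical annotated genome region, e.g. one row of a mart table."""
    seqid: str
    start: int
    end: int
    homologs: set[str] = field(default_factory=set)  # proteome sources with a match, e.g. "dmel"
    predicted: bool = False                          # True if any gene predictor called a gene here


def select_regions(regions, include=(), exclude=(), need_prediction=False, no_homology=False):
    """Keep regions that match every source in `include`, none in `exclude`,
    optionally carry a gene prediction, and optionally have no homology at all."""
    kept = []
    for r in regions:
        if not set(include).issubset(r.homologs):
            continue
        if r.homologs.intersection(exclude):
            continue
        if need_prediction and not r.predicted:
            continue
        if no_homology and r.homologs:
            continue
        kept.append(r)
    return kept


regions = [
    Region("scaffold_1", 100, 900, {"mosquito"}, True),
    Region("scaffold_1", 2000, 2600, {"dmel", "mosquito"}, True),
    Region("scaffold_2", 50, 400, set(), True),
    Region("scaffold_2", 800, 1200, {"dmel"}, False),
]

# Regions with mosquito homologs but no D. melanogaster homolog.
print(select_regions(regions, include={"mosquito"}, exclude={"dmel"}))
# Regions with gene predictions but no known homology.
print(select_regions(regions, need_prediction=True, no_homology=True))
```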
</sec>
<sec>
<title>Genome maps</title>
<p>Together with BLAST searches, maps of the 12 genomes form the core of the discovery tools for bioscientists. Maps including all available annotations from several groups are provided using GBrowse (
<xref ref-type="bibr" rid="b3">3</xref>
). Each alignment match in the BLAST result reports is hyperlinked to the respective genome map, as well as to sequence and GFF annotation results. As species comparisons are of much interest, BLAST results also link to a comparative map display of the matches. A recent addition to the genome maps is an aligned comparative map set for any group of the 12 species. As seen in
<xref ref-type="fig" rid="fig2">Figure 2</xref>
, this allows one to view phylogenetic evidence of common gene predictions and features in homologous regions. In this example, genes that are predicted in the model Dmel but were not previously located are found at orthologous locations across eight species (Dmel through Dmoj). This capability of full comparative annotation maps may be unique to DroSpeGe: other genome maps offer either a single-species map with tracks that summarize homology, or a syntenic view of two species. An overview of all species' chromosome maps is provided in the DroSpeGe/maps/ section; these overviews link to detailed genome maps with known gene homology locations, gene expression evidence and predictions.</p>
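<p>Linking a BLAST hit to its map amounts to building a URL for the hit's region. The sketch below assumes the map accepts a <italic>name=seqid:start..end</italic> landmark query, as GBrowse installations commonly do; the base URL, the function name and the 1000-base flank are hypothetical choices, not DroSpeGe's actual link format.</p>

```python
from urllib.parse import urlencode


def gbrowse_link(base_url: str, seqid: str, hit_start: int, hit_end: int,
                 flank: int = 1000) -> str:
    """Build a map link centred on a BLAST hit, padded by `flank` bases.
    Minus-strand hits may report start greater than end, so sort first."""
    lo, hi = sorted((hit_start, hit_end))
    start = max(1, lo - flank)  # do not run off the start of the sequence
    end = hi + flank
    return base_url + "?" + urlencode({"name": f"{seqid}:{start}..{end}"})


# Hypothetical server and scaffold; a minus-strand hit from 5200 down to 4800.
print(gbrowse_link("http://example.org/cgi-bin/gbrowse/dmoj/", "scaffold_6500", 5200, 4800))
```

<p>Sorting the coordinates first means the same function serves plus- and minus-strand hits, and clamping at 1 keeps hits near a scaffold end from producing a negative start.</p>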
<fig id="fig2" position="float">
<label>Figure 2</label>
<caption>
<p>Aligned-genome view of new
<italic>D.melanogaster</italic>
gene locations on the X chromosome in
<italic>D.melanogaster, Drosophila simulans</italic>
and
<italic>Drosophila yakuba</italic>
, identified by cross-species comparison of coding exons (from DroSpeGe/data/dmel-dspp/newgenes). Several gene predictors match these shared coding exons, and additional evidence from ESTs, protein HSP matches and gene expression data corroborates the new genes. Other genomes with orthologous gene predictions, not shown here but viewable in DroSpeGe maps, include Dsec, Dere, Dana, Dpse and Dmoj.</p>
</caption>
<graphic xlink:href="gkl997f2"></graphic>
</fig>
</sec>
<sec>
<title>Common chromosomes in
<italic>Drosophila</italic>
</title>
<p>A series of maps shows large-scale synteny between genome assembly units (scaffolds or chromosomes), determined from genome × genome DNA BLAST matches and identified as common Muller elements. These are found in DroSpeGe/maps/muller-elements. Muller elements A, B, C, D, E and F, named by Hermann J. Muller, are six chromosome arms common among
<italic>Drosophila</italic>
species. Chromosome names and centromeric joins differ among the species; Muller elements identify the common units. These synteny maps give scientists quick access to common genome regions among the 12 species. The melanogaster group species (Dmel, Dsim, Dsec, Dyak and Dere) match closely, with large-scale inversions evident in these maps. Among the more distantly related species, the new genome assembly of
<italic>D.mojavensis</italic>
has proved the most complete: four Muller elements, the autosomes B to E, are nearly fully assembled, and the sex chromosome is assembled into four major scaffolds.</p>
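<p>The paper does not spell out how scaffolds were assigned to elements. One simple rule, assumed here purely for illustration, is a majority vote: sum the aligned bases from each scaffold to each <italic>D.melanogaster</italic> arm, translate arms to Muller letters, and pick the element with the most support. The arm-to-element table is the standard <italic>D.melanogaster</italic> assignment; the scaffold names and hit data are invented.</p>

```python
from collections import defaultdict

# Muller element names of the D. melanogaster chromosome arms.
DMEL_ARM_TO_MULLER = {"X": "A", "2L": "B", "2R": "C", "3L": "D", "3R": "E", "4": "F"}


def assign_muller(hits):
    """hits: iterable of (scaffold, dmel_arm, aligned_bp) tuples from a
    genome-by-genome DNA comparison against D. melanogaster.
    Returns {scaffold: Muller element}, choosing the element with the
    most aligned bases for each scaffold."""
    totals = defaultdict(lambda: defaultdict(int))
    for scaffold, arm, aligned_bp in hits:
        totals[scaffold][DMEL_ARM_TO_MULLER[arm]] += aligned_bp
    return {scaf: max(per_elem, key=per_elem.get) for scaf, per_elem in totals.items()}


# Invented hits for three scaffolds of a hypothetical new assembly.
hits = [
    ("scaf_a", "2L", 900_000), ("scaf_a", "2R", 40_000),
    ("scaf_b", "X", 1_200_000), ("scaf_b", "3R", 10_000),
    ("scaf_c", "3R", 700_000),
]
print(assign_muller(hits))
```

<p>Voting on aligned bases rather than hit counts keeps many short, scattered matches from outweighing long syntenic blocks; small cross-element totals, such as those produced by inversions or translocations, are simply outvoted.</p>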
</sec>
<sec>
<title>Gene variation by gene ontology group</title>
<p>To assess possible gene gain and loss among
<italic>Drosophila</italic>
species, gene matches to Gene Ontology categories were tabulated by species and are provided in the DroSpeGe/news/genome-summaries/gene-GO-function-association section. These may indicate species differences in functional categories, and statistically significant deviations are indicated. While low counts, which suggest missing genes, may be due to gene divergence, extra gene matches more strongly suggest categories in which species differ. Among the interesting differences, transport genes (GO:0006810) may show a phylogenetic cline, with more in the non-melanogaster group (Dana to Dgri); protein binding genes (GO:0005515) may be more common in the Dmel–Dsim–Dsec siblings; and protein biosynthesis genes (GO:0006412) are more numerous in the Dpse–Dper sibling species. Individual species peaks, such as catalytic activity genes (GO:0003824) in Dwil or signal transduction genes (GO:0007165) in Dgri, suggest species-specific adaptations. The gene matches are high-scoring segment pair (HSP) groupings and include various events: gene duplications, alternatively spliced exons within genes, new genes that appear to be composed of exons from other genes, and computational artifacts. Detailed evidence pages link to GBrowse genome map views showing all secondary HSPs. The proteome sources in this analysis are the organisms with extensive GO annotations: Dmel fruitfly, mouse,
<italic>C.elegans</italic>
worm and yeast. GO-slim groupings (125 categories) are used for Biological Process, Molecular Function and Cellular Component. A table gives the correspondence between MOD gene ID, GO primary ID and GO-slim grouping. Chris Mungall's GO map2slim software is used for this, along with current GO gene associations.</p>
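<p>The summary pages flag statistically significant deviations, but the test used is not named here. As one plausible choice, assumed purely for illustration, a two-proportion z-test compares the fraction of one species' GO-assigned matches that fall in a GO-slim category with the corresponding <italic>D.melanogaster</italic> fraction. The counts below are invented.</p>

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Compare the fraction x1/n1 (one species) with x2/n2 (reference).
    Returns (z, two-sided p) under the pooled normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Invented counts: matches in one GO-slim category / all GO-assigned matches,
# for a new species (first pair) and for Dmel (second pair).
z, p = two_proportion_z(420, 10_000, 350, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

<p>With 125 categories tested per species, some form of multiple-testing correction would be needed before calling any single deviation significant.</p>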
</sec>
</sec>
<sec>
<title>RELATED WORK</title>
<p>
<italic>Drosophila</italic>
species assemblies, analyses and annotations have been coordinated at Michael Eisen's community Wiki (
<ext-link ext-link-type="uri" xlink:href="rana.lbl.gov/drosophila/wiki/"></ext-link>
), in an open way that serves as a model for future genome collaborations. There, contributors have submitted data, genome analyses, summaries and discussion for the benefit of the research community. In conjunction with this, the Eisen Lab provides annotations, analyses and GBrowse maps of
<italic>Drosophila</italic>
species. The FlyBase project (
<ext-link ext-link-type="uri" xlink:href="www.flybase.org"></ext-link>
) has benefited from these community efforts, recently adding a subset of annotations for the 12 species to its map and search services. The effort most comparable to DroSpeGe in approach, if not in species, is the Fungal Comparative Genomics resource [
<ext-link ext-link-type="uri" xlink:href="fungal.genome.duke.edu"></ext-link>
;
<ext-link ext-link-type="uri" xlink:href="http://www.duke.edu/~jes12/thesis/"></ext-link>
; (
<xref ref-type="bibr" rid="b17">17</xref>
)] that catalogs 56 genomes including the model
<italic>Saccharomyces cerevisiae</italic>
and related yeasts and fungi. Fungal Genomics offers GBrowse maps, BLAST, gene predictions and phylogenetic comparisons. Comprehensive genome resources that include
<italic>Drosophila</italic>
include UCSC Genome Bioinformatics (
<ext-link ext-link-type="uri" xlink:href="genome.ucsc.edu"></ext-link>
), Ensembl (
<ext-link ext-link-type="uri" xlink:href="www.ensembl.org"></ext-link>
) and Entrez Genomes (
<ext-link ext-link-type="uri" xlink:href="www.ncbi.nih.gov"></ext-link>
). The latter two currently show only
<italic>D.melanogaster</italic>
. UCSC Genomes includes most of the insect genomes but, as of October 2006, had yet to be updated to the current
<italic>Drosophila</italic>
assemblies and annotations. UCSC provides a very useful set of comparative homology analyses for each species genome.</p>
<p>The DroSpeGe comparative genome database has provided bioscientists with rapid access to new genomes in a usable way, with new annotations, browsing, search and summary services not available elsewhere. Future plans for this database focus on enhancing genome comparison functions, with improvements to category overviews for gene functions, pathways and orthology evidence. Additional insect genomes and the arthropod
<italic>Daphnia pulex</italic>
[
<ext-link ext-link-type="uri" xlink:href="wfleabase.org"></ext-link>
; (
<xref ref-type="bibr" rid="b18">18</xref>
)] may be integrated to extend the comparative range.</p>
</sec>
</body>
<back>
<ack>
<p>The author thanks the reviewers who suggested improvements to this paper. This project is supported in part by grants to D.G. from the National Human Genome Research Institute of the National Institutes of Health and from Sun Microsystems. A National Science Foundation TeraGrid access grant provided computing resources. The Open Access publication charges for this article were waived by Oxford University Press.</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="b1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="editor">
<name>
<surname>Benson</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wheeler</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>Trace Archives at 1 Billion</article-title>
<source>NCBI News</source>
<year>2006</year>
<comment>15(1), NIH Publication No. 06-3272</comment>
</element-citation>
</ref>
<ref id="b2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Siepel</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bejerano</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>J.S.</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>A.S.</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rosenbloom</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Clawson</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Spieth</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hillier</surname>
<given-names>L.W.</given-names>
</name>
<name>
<surname>Richards</surname>
<given-names>S.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes</article-title>
<source>Genome Res.</source>
<year>2005</year>
<volume>15</volume>
<fpage>1034</fpage>
<lpage>1050</lpage>
<pub-id pub-id-type="pmid">16024819</pub-id>
</element-citation>
</ref>
<ref id="b3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>L.D.</given-names>
</name>
<name>
<surname>Mungall</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Shu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Caudy</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mangone</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nickerson</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Stajich</surname>
<given-names>J.E.</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>T.W.</given-names>
</name>
<name>
<surname>Arva</surname>
<given-names>A.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The generic genome browser: a building block for a model organism system database</article-title>
<source>Genome Res.</source>
<year>2002</year>
<volume>12</volume>
<fpage>1599</fpage>
<lpage>1610</lpage>
<pub-id pub-id-type="pmid">12368253</pub-id>
</element-citation>
</ref>
<ref id="b4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kasprzyk</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Keefe</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Smedley</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>London</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Spooner</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Melsopp</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hammond</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Rocca-Serra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E.</given-names>
</name>
</person-group>
<article-title>EnsMart: a generic system for fast and flexible access to biological data</article-title>
<source>Genome Res.</source>
<year>2004</year>
<volume>14</volume>
<fpage>160</fpage>
<lpage>169</lpage>
<pub-id pub-id-type="pmid">14707178</pub-id>
</element-citation>
</ref>
<ref id="b5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>S.F.</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>T.L.</given-names>
</name>
<name>
<surname>Schaffer</surname>
<given-names>A.A.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>D.J.</given-names>
</name>
</person-group>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>
<source>Nucleic Acids Res.</source>
<year>1997</year>
<volume>25</volume>
<fpage>3389</fpage>
<lpage>3402</lpage>
<pub-id pub-id-type="pmid">9254694</pub-id>
</element-citation>
</ref>
<ref id="b6">
<label>6</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Gross</surname>
<given-names>S.S.</given-names>
</name>
<name>
<surname>Do</surname>
<given-names>C.B.</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>CONTRAST:
<italic>de novo</italic>
gene prediction using a semi-Markov conditional random field</article-title>
<year>2005</year>
<conf-name>Biomedical Computation at Stanford Symposium Proceedings (BCATS)</conf-name>
<conf-loc>Stanford, CA</conf-loc>
<fpage>82</fpage>
</element-citation>
</ref>
<ref id="b7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>J.Q.</given-names>
</name>
<name>
<surname>Shteynberg</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Arumugam</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gibbs</surname>
<given-names>R.A.</given-names>
</name>
<name>
<surname>Brent</surname>
<given-names>M.R.</given-names>
</name>
</person-group>
<article-title>Identification of rat genes by TWINSCAN gene prediction, RT–PCR, and direct sequencing</article-title>
<source>Genome Res.</source>
<year>2004</year>
<volume>14</volume>
<fpage>665</fpage>
<lpage>671</lpage>
<pub-id pub-id-type="pmid">15060008</pub-id>
</element-citation>
</ref>
<ref id="b8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Korf</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>Gene finding in novel genomes</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>59</fpage>
<pub-id pub-id-type="pmid">15144565</pub-id>
</element-citation>
</ref>
<ref id="b9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Birney</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Clamp</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>GeneWise and genomewise</article-title>
<source>Genome Res.</source>
<year>2004</year>
<volume>14</volume>
<fpage>988</fpage>
<lpage>995</lpage>
<pub-id pub-id-type="pmid">15123596</pub-id>
</element-citation>
</ref>
<ref id="b10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chatterji</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pachter</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Reference based annotation with GeneMapper</article-title>
<source>Genome Biol.</source>
<year>2006</year>
<volume>7</volume>
<fpage>R29</fpage>
<pub-id pub-id-type="pmid">16600017</pub-id>
</element-citation>
</ref>
<ref id="b11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Slater</surname>
<given-names>G.S.</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E.</given-names>
</name>
</person-group>
<article-title>Automated generation of heuristics for biological sequence comparison</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>31</fpage>
<pub-id pub-id-type="pmid">15713233</pub-id>
</element-citation>
</ref>
<ref id="b12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parra</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Blanco</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Guigó</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Geneid in Drosophila</article-title>
<source>Genome Res.</source>
<year>2000</year>
<volume>10</volume>
<fpage>511</fpage>
<lpage>515</lpage>
<pub-id pub-id-type="pmid">10779490</pub-id>
</element-citation>
</ref>
<ref id="b13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Souvorov</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hlavina</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Kapustin</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Kiryutin</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Kitts</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Pruitt</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Sapojnikov</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Ostell</surname>
<given-names>J.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Benson</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wheeler</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>Gnomon annotation of
<italic>Drosophila</italic>
species genomes</article-title>
<source>New Genome Builds and Annotations at NCBI. NCBI News Fall/Winter</source>
<year>2006</year>
<comment>NIH Publication No. 04-3272</comment>
</element-citation>
</ref>
<ref id="b14">
<label>14</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Sturgill</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Parisi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Oliver</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>
<italic>Drosophila</italic>
species expression arrays, preliminary results</article-title>
<year>2006</year>
<publisher-loc>NIDDK, NIH</publisher-loc>
<publisher-name>Laboratory of Cellular and Developmental Biology</publisher-name>
</element-citation>
</ref>
<ref id="b15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heger</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ponting</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>
<italic>Drosophila</italic>
gene prediction pipeline with Exonerate</article-title>
<year>2006</year>
<comment>Manuscript in preparation</comment>
</element-citation>
</ref>
<ref id="b16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stein</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Integrating Biological Databases</article-title>
<source>Nature Rev. Genet.</source>
<year>2003</year>
<volume>4</volume>
<fpage>337</fpage>
<lpage>345</lpage>
<pub-id pub-id-type="pmid">12728276</pub-id>
</element-citation>
</ref>
<ref id="b17">
<label>17</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Stajich</surname>
<given-names>J.E.</given-names>
</name>
</person-group>
<article-title>A comparative genomic investigation of fungal genome evolution</article-title>
<year>2006</year>
<comment>PhD Dissertation, Graduate School of Duke University</comment>
</element-citation>
</ref>
<ref id="b18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Colbourne</surname>
<given-names>J.K.</given-names>
</name>
<name>
<surname>Singan</surname>
<given-names>V.R.</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>D.G.</given-names>
</name>
</person-group>
<article-title>wFleaBase: the Daphnia genome database</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>45</fpage>
<pub-id pub-id-type="pmid">15752432</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024