MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000559 (Pmc/Corpus); previous: 0005589; next: 0005600


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Intron gain and loss in segmentally duplicated genes in rice</title>
<author>
<name sortKey="Lin, Haining" sort="Lin, Haining" uniqKey="Lin H" first="Haining" last="Lin">Haining Lin</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Wei" sort="Zhu, Wei" uniqKey="Zhu W" first="Wei" last="Zhu">Wei Zhu</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Joana C" sort="Silva, Joana C" uniqKey="Silva J" first="Joana C" last="Silva">Joana C. Silva</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gu, Xun" sort="Gu, Xun" uniqKey="Gu X" first="Xun" last="Gu">Xun Gu</name>
<affiliation>
<nlm:aff id="I2">Department of Genetics, Development, and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Buell, C Robin" sort="Buell, C Robin" uniqKey="Buell C" first="C Robin" last="Buell">C Robin Buell</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">16719932</idno>
<idno type="pmc">1779517</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779517</idno>
<idno type="RBID">PMC:1779517</idno>
<idno type="doi">10.1186/gb-2006-7-5-r41</idno>
<date when="2006">2006</date>
<idno type="wicri:Area/Pmc/Corpus">000559</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000559</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Intron gain and loss in segmentally duplicated genes in rice</title>
<author>
<name sortKey="Lin, Haining" sort="Lin, Haining" uniqKey="Lin H" first="Haining" last="Lin">Haining Lin</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Wei" sort="Zhu, Wei" uniqKey="Zhu W" first="Wei" last="Zhu">Wei Zhu</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Joana C" sort="Silva, Joana C" uniqKey="Silva J" first="Joana C" last="Silva">Joana C. Silva</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gu, Xun" sort="Gu, Xun" uniqKey="Gu X" first="Xun" last="Gu">Xun Gu</name>
<affiliation>
<nlm:aff id="I2">Department of Genetics, Development, and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Buell, C Robin" sort="Buell, C Robin" uniqKey="Buell C" first="C Robin" last="Buell">C Robin Buell</name>
<affiliation>
<nlm:aff id="I1">The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Genome Biology</title>
<idno type="ISSN">1465-6906</idno>
<idno type="eISSN">1465-6914</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Analysis of over 3,000 co-linear paired genes in rice shows more intron loss than intron gain following segmental duplication.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Genome Biol</journal-id>
<journal-title>Genome Biology</journal-title>
<issn pub-type="ppub">1465-6906</issn>
<issn pub-type="epub">1465-6914</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">16719932</article-id>
<article-id pub-id-type="pmc">1779517</article-id>
<article-id pub-id-type="publisher-id">gb-2006-7-5-r41</article-id>
<article-id pub-id-type="doi">10.1186/gb-2006-7-5-r41</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Intron gain and loss in segmentally duplicated genes in rice</article-title>
</title-group>
<contrib-group>
<contrib id="A1" contrib-type="author">
<name>
<surname>Lin</surname>
<given-names>Haining</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>hlin@tigr.org</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Zhu</surname>
<given-names>Wei</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>wzhu@tigr.org</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Silva</surname>
<given-names>Joana C</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>jsilva@tigr.org</email>
</contrib>
<contrib id="A4" contrib-type="author">
<name>
<surname>Gu</surname>
<given-names>Xun</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>xgu@iastate.edu</email>
</contrib>
<contrib id="A5" corresp="yes" contrib-type="author">
<name>
<surname>Buell</surname>
<given-names>C Robin</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>rbuell@tigr.org</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
The Institute for Genomic Research, Medical Center Drive, Rockville, MD 20850, USA</aff>
<aff id="I2">
<label>2</label>
Department of Genetics, Development, and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA</aff>
<pub-date pub-type="ppub">
<year>2006</year>
</pub-date>
<pub-date pub-type="epub">
<day>23</day>
<month>5</month>
<year>2006</year>
</pub-date>
<volume>7</volume>
<issue>5</issue>
<fpage>R41</fpage>
<lpage>R41</lpage>
<ext-link ext-link-type="uri" xlink:href="http://genomebiology.com/2006/7/5/R41"></ext-link>
<history>
<date date-type="received">
<day>30</day>
<month>1</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>21</day>
<month>3</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>4</month>
<year>2006</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2006 Lin et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2006</copyright-year>
<copyright-holder>Lin et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an open access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment> Lin Haining hlin@tigr.org Intron gain and loss in segmentally duplicated genes in rice 2006Genome Biology 7(5): R41-. (2006)1465-6906(2006)7:5urn:ISSN:1465-6906</pmc-comment>
</license>
</permissions>
<abstract abstract-type="short">
<p>Analysis of over 3,000 co-linear paired genes in rice shows more intron loss than intron gain following segmental duplication.</p>
</abstract>
<abstract>
<sec>
<title>Background</title>
<p>Introns are under less selection pressure than exons, and consequently, intronic sequences have a higher rate of gain and loss than exons. In a number of plant species, a large portion of the genome has been segmentally duplicated, giving rise to a large set of duplicated genes. The recent completion of the rice genome sequence, in which segmental duplication has been documented, has allowed us to investigate intron evolution within rice, a diploid monocotyledonous species.</p>
</sec>
<sec>
<title>Results</title>
<p>Analysis of segmental duplication in rice revealed that 159 Mb of the 371 Mb genome and 21,570 of the 43,719 non-transposable element-related genes were contained within duplicated regions. These duplicated regions contained 3,101 collinear gene pairs. Using this set of segmentally duplicated genes, we investigated intron evolution in full-length cDNA-supported, non-transposable element-related rice gene models. Using gene pairs that have an ortholog in the dicotyledonous model species
<italic>Arabidopsis thaliana</italic>
, we identified more intron loss (49 introns within 35 gene pairs) than intron gain (5 introns within 5 gene pairs) following segmental duplication. We were unable to demonstrate preferential intron loss at the 3' end of genes, as previously reported in mammalian genomes. However, we did find that the four exonic nucleotides flanking lost introns more often formed rarely used 4-mers.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We observed that intron evolution within rice following segmental duplication is largely dominated by intron loss. In two of the five cases of intron gain within segmentally duplicated genes, the gained sequences were similar to transposable elements.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>Introns are under less selection pressure than exons and, consequently, their sequences diverge faster than exonic sequences. However, intron positions relative to the protein sequence are relatively conserved: conservation of intron position has been observed between distinct eukaryotic lineages separated by about 1.5 billion years of evolution, such as between animal and fungal genes [
<xref ref-type="bibr" rid="B1">1</xref>
] and between the malaria parasite
<italic>Plasmodium falciparum </italic>
and other eukaryotes [
<xref ref-type="bibr" rid="B2">2</xref>
]. With respect to intron position within genes, introns in intron-sparse species, as well as in single-intron genes, are preferentially located near the 5' end of the gene [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
], suggesting a biased distribution of intron positions. Consistent with this, recent studies of 684 groups of orthologous genes from eight eukaryotic species of animals, plants, fungi, and protists showed preferential intron loss [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B6">6</xref>
] and intron gain [
<xref ref-type="bibr" rid="B6">6</xref>
] at the 3' end of genes. In contrast, an analysis of fungal species found no positional bias in intron loss [
<xref ref-type="bibr" rid="B7">7</xref>
].</p>
<p>Introns can be classified into three categories based on their position relative to the codon. Introns that lie between codons are termed phase 0; phase 1 introns lie between the first and second bases of a codon, and phase 2 introns lie between the second and third bases of a codon. Eukaryotic genes have been reported to contain more phase 0 introns than phase 1 or phase 2 introns; on average, a 5:3:2 ratio of phase 0:phase 1:phase 2 introns is observed, although the exact ratio appears to be species-specific [
<xref ref-type="bibr" rid="B8">8</xref>
-
<xref ref-type="bibr" rid="B10">10</xref>
]. Several explanations have been proposed for this phase bias, including a legacy of gene formation under the introns-early theory [
<xref ref-type="bibr" rid="B11">11</xref>
,
<xref ref-type="bibr" rid="B12">12</xref>
], phase bias of intron insertion [
<xref ref-type="bibr" rid="B13">13</xref>
], and phase bias of intron loss or selection [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
].</p>
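<p>Under the convention above, an intron's phase depends only on how many coding nucleotides precede it in the spliced CDS. The following Python sketch is illustrative only: it assumes each gene is described by the coding length of each exon in 5' to 3' order (a hypothetical input format, not necessarily the one used in this study) and returns the phase of every CDS intron.</p>

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """Phase of an intron given the number of coding bases upstream of it.

    0: the intron lies between two codons.
    1: between the first and second base of a codon.
    2: between the second and third base of a codon.
    """
    return coding_bases_before_intron % 3


def phases_for_gene(exon_cds_lengths: list[int]) -> list[int]:
    """Phases of all CDS introns, given the coding length of each exon in order.

    Hypothetical input format: one integer per coding exon, 5' to 3'.
    """
    phases = []
    cumulative = 0
    # Every coding exon except the last one is followed by an intron.
    for length in exon_cds_lengths[:-1]:
        cumulative += length
        phases.append(intron_phase(cumulative))
    return phases


if __name__ == "__main__":
    # Hypothetical gene with coding exons of 120, 31, 34 and 200 bp.
    print(phases_for_gene([120, 31, 34, 200]))  # [0, 1, 2]
```

<p>Tallying these phases over a gene set gives the kind of phase 0:phase 1:phase 2 distribution discussed above.</p>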
<p>The discovery of both intron loss and intron gain suggests that these two processes may be ongoing in evolution. The rates of intron gain and loss seem to differ greatly among species [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
,
<xref ref-type="bibr" rid="B14">14</xref>
-
<xref ref-type="bibr" rid="B16">16</xref>
] and the underlying mechanism(s) driving intron loss and gain remain unknown. With respect to plants, large-scale computational analyses of intron loss and gain have focused on
<italic>Arabidopsis thaliana</italic>
, a model dicotyledonous plant [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
-
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
-
<xref ref-type="bibr" rid="B20">20</xref>
]. With the availability of the near-complete, high-quality rice genome sequence [
<xref ref-type="bibr" rid="B21">21</xref>
] and uniform, high-quality gene annotation for the genome [
<xref ref-type="bibr" rid="B22">22</xref>
], we can examine intron loss and gain in a second plant species that represents the other major clade of angiosperms, the monocotyledonous plants. Phylogenetic analyses indicate that
<italic>Arabidopsis </italic>
and rice diverged approximately 130 to 200 million years ago (MYA) [
<xref ref-type="bibr" rid="B23">23</xref>
-
<xref ref-type="bibr" rid="B25">25</xref>
]. The rice genome underwent a segmental duplication; depending on the completeness and quality of the genome dataset, as well as the methods and parameters employed, this duplication has been estimated to involve 15% to 62% of the genome [
<xref ref-type="bibr" rid="B25">25</xref>
-
<xref ref-type="bibr" rid="B29">29</xref>
]. It occurred approximately 70 MYA [
<xref ref-type="bibr" rid="B25">25</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
], with the exception of the top arms of chromosomes 11 and 12, which underwent a more recent duplication estimated at 5 MYA [
<xref ref-type="bibr" rid="B27">27</xref>
].</p>
<p>Segmental duplication in rice provides the opportunity to study intron gain and loss within a subset of genes that have recently diverged. In this study, we report on the evolution of introns within coding sequences (CDS) after segmental duplication in rice. By examining segmentally duplicated genes, we anticipated identifying more intron gain or loss events than in non-duplicated genes, because an accelerated rate of intron loss or gain in duplicated versus orthologous genes has previously been reported in two malaria parasites [
<xref ref-type="bibr" rid="B30">30</xref>
]. Other advantages of investigating segmentally duplicated genes are that the age of the duplication is approximately 70 MYA [
<xref ref-type="bibr" rid="B25">25</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
], which is within the divergence limit of approximately 100 million years for investigating recently gained introns [
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
], and that segmentally duplicated blocks are more reliable than individually duplicated genes for this type of analysis. Furthermore, we could use
<italic>A. thaliana</italic>
, a model dicotyledonous plant with a near-complete genome sequence, as the outgroup to classify 'intron loss' and 'intron gain' events between the two duplicated rice genes.</p>
</sec>
<sec>
<title>Results</title>
<sec>
<title>Rice segmentally duplicated blocks</title>
<p>Previous analyses of segmental duplication in rice used sequence datasets that contained a substantial portion of unfinished genome sequence and lacked refined structural and functional annotation of the genes [
<xref ref-type="bibr" rid="B25">25</xref>
-
<xref ref-type="bibr" rid="B29">29</xref>
]. Thus, we repeated the analysis of segmental duplication using a set of pseudomolecules (about 371 Mb total) that contained 98% finished sequence and had been annotated at both the structural and functional level [
<xref ref-type="bibr" rid="B22">22</xref>
]. Depending on the maximum distance permitted between collinear gene pairs, 25.9% to 53.4% of the rice genome could be identified as segmentally duplicated (Table
<xref ref-type="table" rid="T1">1</xref>
). Using a maximum distance of 200 kb between collinear gene pairs, a total of 149 segmentally duplicated blocks were identified (Additional data file 1). The largest block had 287 pairs of duplicated genes between chromosomes 11 and 12, consistent with the more recent duplication reported between the top arms of these two chromosomes [
<xref ref-type="bibr" rid="B27">27</xref>
]. These 149 blocks covered 159 Mb (42.8%) of the 371 Mb genome and contained 21,570 of the total 43,719 non-transposable element (TE) related genes (49.3%) in the rice genome. Of these 21,570 genes, 5,567 were retained within the blocks and corresponded to 3,101 pairs of segmentally duplicated genes distributed across all 12 chromosomes of rice (Additional data file 2), with chromosomes 1 and 5 having the largest number of duplicated gene pairs (656 pairs).</p>
<p>Genome coverage by duplicated regions increased when the maximum distance permitted between collinear gene pairs was expanded from 200 kb to 500 kb, 1 Mb, or 5 Mb, whereas a much smaller percentage of the genome was covered when the maximum distance was limited to 100 kb (Table
<xref ref-type="table" rid="T1">1</xref>
). Previous studies of segmental duplication in the rice genome reported that 15% to 62% of the rice genome had undergone segmental duplication [
<xref ref-type="bibr" rid="B25">25</xref>
-
<xref ref-type="bibr" rid="B29">29</xref>
], consistent with our analyses. Because there was little difference in the percentage of the genome identified as duplicated using maximum distances of 500 kb, 1 Mb, and 5 Mb, whereas much less of the genome was covered at 100 kb, we used the intermediate estimate of segmental duplication obtained with a maximum distance of 200 kb between collinear gene pairs to examine intron evolution within segmentally duplicated genes. All subsequent analyses therefore report on duplicated genes identified with this 200 kb cutoff.</p>
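<p>The chaining procedure behind these duplicated blocks is not detailed in this section. As a rough, hypothetical illustration of how the maximum distance permitted between collinear gene pairs controls block detection, the Python sketch below extends a block while consecutive gene pairs stay within the cutoff on both chromosomes. The GenePair record, the min_pairs parameter, and the example coordinates are assumptions for illustration only; a production pipeline would typically also score matches, treat inversions explicitly, and tolerate intervening unpaired genes.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenePair:
    """Hypothetical record for one homologous gene pair on two chromosomes."""
    chrom_a: str
    pos_a: int  # gene position on chromosome A (bp)
    chrom_b: str
    pos_b: int  # gene position on chromosome B (bp)


def chain_blocks(pairs, max_distance=200_000, min_pairs=2):
    """Group gene pairs into collinear blocks (simplified illustration).

    Pairs are sorted along chromosome A; a pair extends the current block when it
    lies on the same chromosome pair and its distance to the previous pair is at
    most max_distance on both chromosomes. Blocks with fewer than min_pairs
    pairs are discarded.
    """
    blocks = []
    current = []
    for pair in sorted(pairs, key=lambda p: (p.chrom_a, p.chrom_b, p.pos_a)):
        if current:
            last = current[-1]
            same_chroms = (pair.chrom_a, pair.chrom_b) == (last.chrom_a, last.chrom_b)
            gap_a = pair.pos_a - last.pos_a
            gap_b = abs(pair.pos_b - last.pos_b)  # abs() tolerates reversed orientation
            if not (same_chroms and max_distance >= max(gap_a, gap_b)):
                # Gap too large or different chromosomes: close the current block.
                if len(current) >= min_pairs:
                    blocks.append(current)
                current = []
        current.append(pair)
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Made-up coordinates chosen only to show the effect of the cutoff.
    pairs = [
        GenePair("chr11", 100_000, "chr12", 120_000),
        GenePair("chr11", 250_000, "chr12", 260_000),
        GenePair("chr11", 420_000, "chr12", 400_000),
        GenePair("chr11", 900_000, "chr12", 950_000),
    ]
    for cutoff in (100_000, 200_000, 1_000_000):
        sizes = [len(block) for block in chain_blocks(pairs, max_distance=cutoff)]
        print(cutoff, sizes)
```

<p>In this toy input, raising the cutoff merges more pairs into a single block, mirroring the trend in genome coverage described above.</p>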
</sec>
<sec>
<title>Conservation of exon-intron structure</title>
<p>The 43,719 non-transposable element-related gene models in rice contain 140,827 introns within the CDS, with an average length of 385 base pairs (bp; standard deviation (std) 470) and an average GC content of 37.5%. Of the 3,101 pairs of segmentally duplicated genes, 281 pairs contained at least one intron and passed manual review for full-length (fl)-cDNA support and the presence of a single isoform. In total, these 281 gene pairs contained 2,573 introns, with a length distribution (average 315 bp) and GC content (36.9%) similar to those found throughout the genome. We found that 197 of the 281 pairs (70%) had completely conserved exon-intron structure in the coding region (958 intron positions in the alignments); that is, the intron number, position, and phase were identical between the duplicated genes (Figure
<xref ref-type="fig" rid="F1">1</xref>
). The other 84 pairs (30%) had incongruent exon-intron structure. To rule out the possibility that the incongruence was due to an aberrant alignment, these alignments were checked manually. Only introns flanked by reliable alignments and only pairs with a putative orthologous gene from
<italic>Arabidopsis </italic>
were investigated further. In total, 48 alignments were excluded, and 36 gene pairs (137 intron positions within the alignments) that showed potential intron loss or intron gain were retained for further analysis.</p>
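<p>A minimal sketch of the conservation check described above, assuming (hypothetically) that each gene is represented by its row in a pairwise protein alignment plus a list of introns recorded as (residue index in the ungapped protein, phase). Introns are mapped to alignment columns, and a pair counts as conserved only when both genes carry the same set of (column, phase) sites, which captures identical intron number, position, and phase. The recording convention in the code is an assumption, not the study's documented procedure.</p>

```python
def to_alignment_sites(aligned_protein: str, introns: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Map introns recorded as (ungapped residue index, phase) to (alignment column, phase).

    Hypothetical convention: a phase 0 intron is recorded at the residue it
    follows; a phase 1 or 2 intron at the residue (codon) it interrupts.
    """
    columns = [col for col, residue in enumerate(aligned_protein) if residue != "-"]
    return {(columns[index], phase) for index, phase in introns}


def compare_structures(aligned_a, introns_a, aligned_b, introns_b):
    """Compare intron positions and phases of two aligned duplicated genes."""
    sites_a = to_alignment_sites(aligned_a, introns_a)
    sites_b = to_alignment_sites(aligned_b, introns_b)
    return {
        # Identical sets mean identical intron number, position and phase.
        "conserved": sites_a == sites_b,
        "aligned_intron_positions": len(sites_a | sites_b),
        "shared": sorted(sites_a.intersection(sites_b)),
        "only_a": sorted(sites_a - sites_b),
        "only_b": sorted(sites_b - sites_a),
    }


if __name__ == "__main__":
    # Toy alignment: not real rice sequences.
    result = compare_structures("MKT-LLVAGS", [(2, 0), (5, 1)],
                                "MKTQLL-AGS", [(2, 0), (7, 2)])
    print(result)
```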
</sec>
<sec>
<title>Abundance of intron loss after segmental duplication</title>
<p>To determine whether the incongruence was due to intron gain or loss, we used the putative orthologous gene from
<italic>Arabidopsis </italic>
for each gene pair as an outgroup. From our set of 36 gene pairs with validated alignments, we identified 31 gene pairs with one or more intron losses (43 intron losses in total), one gene pair with a single gained intron, and four gene pairs in which both intron loss and gain were observed (6 intron losses and 4 intron gains). An example of intron loss is shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
. In this example, the third intron of LOC_Os07g49150.1 was lost, as shown by comparison with the duplicated rice gene model LOC_Os03g18690.1 and the putative ortholog from
<italic>Arabidopsis </italic>
At4g29040.1. Alignments of all of the 36 gene pairs with their orthologs from
<italic>Arabidopsis </italic>
are displayed in Additional data file 3. The lost introns (226 bp on average, std 206) were shorter than the average rice intron (385 bp, std 470). The length distributions of the lost and gained introns, together with the length frequencies of the 33,011 fl-cDNA-supported (FLS) rice introns (see Materials and methods for details), are shown in Figure
<xref ref-type="fig" rid="F3">3</xref>
.</p>
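<p>The outgroup logic used to polarize each incongruent intron position can be written as a small parsimony rule. The Python sketch below assumes a single outgroup, treats presence or absence at each aligned position as already determined, and ignores more complex scenarios such as parallel losses or a loss in the outgroup lineage; the function name and return labels are illustrative.</p>

```python
def classify_site(in_a: bool, in_b: bool, in_outgroup: bool) -> str:
    """Polarize one aligned intron position by parsimony with a single outgroup.

    in_a, in_b: intron present in rice duplicate A / B.
    in_outgroup: intron present in the outgroup ortholog (e.g. Arabidopsis).
    """
    if in_a == in_b:
        # Both duplicates agree: nothing to polarize after the duplication.
        return "conserved" if in_a else "absent"
    if in_outgroup:
        # Ancestral presence is inferred, so the duplicate lacking it lost it.
        return "loss in B" if in_a else "loss in A"
    # Ancestral absence is inferred, so the duplicate carrying it gained it.
    return "gain in A" if in_a else "gain in B"


if __name__ == "__main__":
    # Figure 2 example: the third intron is absent in LOC_Os07g49150.1 (A) but
    # present in LOC_Os03g18690.1 (B) and in At4g29040.1 (outgroup).
    print(classify_site(in_a=False, in_b=True, in_outgroup=True))  # loss in A
```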
</sec>
<sec>
<title>Intron loss showed no preference at the 3' end of genes</title>
<p>A single intron loss, termed an independent intron loss, was observed in 31 gene pairs as determined by alignment with the putative
<italic>Arabidopsis </italic>
ortholog. Within these 31 gene pairs, 34 introns were lost in total, because in 3 gene pairs both rice genes underwent separate intron loss events. In these 31 gene pairs, we observed no bias of intron loss toward the 3' ends of genes (Figure
<xref ref-type="fig" rid="F4">4</xref>
). Nor was there a bias in the position of intron loss in our set of four gene pairs in which multiple intron losses were observed (data not shown). Interestingly, in one gene pair (LOC_Os05g02130.1 and LOC_Os01g74320.1), all seven introns were lost in LOC_Os01g74320.1, and in LOC_Os07g44140.1, multiple consecutive introns at the 3' end of the gene were lost (see Additional data file 3).</p>
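<p>One way to look for a 3' bias is to express each intron's location as a fraction of the CDS length and compare the binned positions of lost introns with those of all introns in the same genes, since introns themselves are not uniformly distributed along genes. The sketch below is an assumption about how such a comparison could be set up, not a description of the analysis behind Figure 4; the number of bins and the example positions are arbitrary.</p>

```python
def relative_position(coding_bases_before: int, cds_length: int) -> float:
    """Intron location as a fraction of CDS length (0 = 5' end, 1 = 3' end)."""
    return coding_bases_before / cds_length


def bin_counts(positions, n_bins=5):
    """Count relative positions in equal-width bins along the CDS."""
    counts = [0] * n_bins
    for x in positions:
        # min() keeps a position of exactly 1.0 in the last bin.
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    return counts


def expected_counts(background_positions, n_observed, n_bins=5):
    """Expected bin counts for n_observed introns if they followed the background.

    Using all introns of the same genes as background (an assumption) accounts
    for introns not being uniformly distributed along genes.
    """
    background = bin_counts(background_positions, n_bins)
    total = sum(background)
    return [n_observed * c / total for c in background]


if __name__ == "__main__":
    # Hypothetical relative positions, for illustration only.
    all_introns = [0.1, 0.15, 0.3, 0.45, 0.5, 0.7, 0.9]
    lost = [0.12, 0.88]
    print(bin_counts(lost), expected_counts(all_introns, len(lost)))
```

<p>An excess of observed over expected counts in the last bin would indicate a 3' bias; with the small numbers involved here, any such comparison needs an exact or randomization test.</p>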
</sec>
<sec>
<title>Intron loss rate at phase 0, 1, 2</title>
<p>Previous reports on intron loss suggested a phase bias [
<xref ref-type="bibr" rid="B5">5</xref>
]. To investigate phase bias in intron loss, we first examined the intron phase distribution within the rice genome using a set of 33,011 introns derived from the coding regions of 6,046 rice gene models that were supported by fl-cDNA evidence, had no alternative splicing isoform, and had at least one intron within the CDS. The coding introns were distributed as phase 0 (57.3%), phase 1 (21.5%), and phase 2 (21.2%), comparable to the distribution previously reported in plants (62:17:21) [
<xref ref-type="bibr" rid="B1">1</xref>
].</p>
<p>To examine whether there was a bias in the phase of intron loss in segmentally duplicated genes in rice, we examined the 34 independently lost introns, excluding genes with multiple intron losses. Intron loss was more frequent at phase 2 than at phases 0 and 1, although the difference was not statistically significant (Table
<xref ref-type="table" rid="T2">2</xref>
; χ<sup>2</sup> test <italic>P</italic> value = 0.155). Randomization tests suggested that intron loss was higher than expected at phase 2 (<italic>P</italic> value = 0.06) and lower than expected at phase 0 (<italic>P</italic> value = 0.08).</p>
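<p>For three phase classes the χ<sup>2</sup> goodness-of-fit test has 2 degrees of freedom, and in that case the χ<sup>2</sup> survival function has the closed form P = exp(-χ<sup>2</sup>/2), so no statistics library is needed. The Python sketch below uses the genome-wide phase proportions reported above as the null distribution and adds a one-sided Monte Carlo randomization test for a single phase. The per-phase counts of lost introns are given in Table 2 and are not reproduced here, so the sketch only prints the expected counts for 34 lost introns; the function names, seed, and number of iterations are illustrative assumptions rather than the study's settings.</p>

```python
import math
import random

# Genome-wide phase 0/1/2 proportions of the 33,011 FLS introns reported above.
GENOME_PHASE_FREQ = (0.573, 0.215, 0.212)


def chi_square_df2(observed, expected_freq=GENOME_PHASE_FREQ):
    """Chi-square goodness-of-fit test over three phase classes (2 d.f.).

    With 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    n = sum(observed)
    expected = [n * f for f in expected_freq]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.exp(-stat / 2)


def randomization_p(observed, phase, upper=True, n_iter=20_000, seed=1):
    """One-sided Monte Carlo P value for the number of losses at one phase.

    Each iteration draws sum(observed) phases from the genome-wide
    distribution and checks whether the simulated count at `phase` is at
    least as extreme (high if upper=True, low otherwise) as the observed one.
    """
    rng = random.Random(seed)
    n = sum(observed)
    hits = 0
    for _ in range(n_iter):
        count = rng.choices(range(3), weights=GENOME_PHASE_FREQ, k=n).count(phase)
        extreme = count >= observed[phase] if upper else observed[phase] >= count
        if extreme:
            hits += 1
    return hits / n_iter


if __name__ == "__main__":
    # Expected phase 0/1/2 counts for the 34 independently lost introns;
    # the observed counts to compare against are those in Table 2.
    print([round(34 * f, 2) for f in GENOME_PHASE_FREQ])
```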
</sec>
<sec>
<title>Rare 4-mers in the exonic sequence at the donor splice site of lost introns</title>
<p>Previous studies indicated sequence composition preferences surrounding splice sites [
<xref ref-type="bibr" rid="B13">13</xref>
,
<xref ref-type="bibr" rid="B33">33</xref>
]. As our sample size was small, we restricted our analysis of nucleotide composition surrounding the splice sites to the nearest four exonic nucleotides (4-mers); a total of 31 gene pairs with an independent intron loss (34 introns in total) were investigated to determine the exonic nucleotide composition flanking each pair of lost and retained introns (Figure
<xref ref-type="fig" rid="F5">5</xref>
). We observed that 4-mer usage flanking all rice introns depended on intron phase (Additional data files 4 and 5). For example, ACAA occurs on the exonic side of the donor splice site 70, 17 and 110 times at phase 0, phase 1 and phase 2, respectively. To determine whether intron loss is independent of the nucleotide composition of the exon sequence flanking introns, we compared the 4-mers flanking lost introns with those flanking the corresponding retained introns, as well as with the 4-mers flanking all rice introns. To this end, the exonic 4-mers flanking the donor and acceptor splice sites of the lost and retained introns were each assigned a rank according to their frequency among all rice introns, with rank 1 being the rarest (Tables
<xref ref-type="table" rid="T3">3</xref>
and
<xref ref-type="table" rid="T4">4</xref>
; see Materials and methods).</p>
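<p>The 4-mer extraction and ranking can be sketched as follows. This sketch assumes (hypothetically) that each gene is given as a list of coding exon sequences in transcript order, takes the last four exonic bases before an intron as the donor-side 4-mer and the first four exonic bases after it as the acceptor-side 4-mer, and ranks all 256 possible 4-mers by their frequency in a background sample, with rank 1 being the rarest. Because 4-mer usage depends on intron phase, the background list could be restricted to introns of the same phase. Tie handling is not stated in this section, so alphabetical tie-breaking is an assumption, and exons shorter than four bases would need separate treatment.</p>

```python
from collections import Counter
from itertools import product

# All 256 possible 4-mers, so that 4-mers absent from the background still get a rank.
ALL_4MERS = ["".join(bases) for bases in product("ACGT", repeat=4)]


def flanking_4mers(cds_exons: list[str], intron_index: int) -> tuple[str, str]:
    """Exonic 4-mers on either side of intron `intron_index`.

    cds_exons: coding exon sequences in transcript order (hypothetical input).
    Intron i lies between cds_exons[i] and cds_exons[i + 1]. Returns the last
    four bases before the donor site and the first four after the acceptor site.
    """
    return cds_exons[intron_index][-4:], cds_exons[intron_index + 1][:4]


def rank_4mers(background_4mers) -> dict[str, int]:
    """Rank every 4-mer by its frequency in the background; rank 1 is the rarest.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    counts = Counter(background_4mers)
    ordered = sorted(ALL_4MERS, key=lambda kmer: (counts[kmer], kmer))
    return {kmer: rank for rank, kmer in enumerate(ordered, start=1)}


if __name__ == "__main__":
    exons = ["ATGGCACAA", "GTTCCAGGT", "TTGA"]  # toy sequences, not rice data
    print(flanking_4mers(exons, 0))  # ('ACAA', 'GTTC')
    ranks = rank_4mers(["ACAA", "ACAA", "AGGT"])
    print(ranks["ACAA"], ranks["AGGT"], ranks["AAAA"])  # 256 255 1
```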
<p>The sum of the ranks (SoR) of the exonic 4-mers flanking the donor splice site of the lost introns (observed SoR = 6,737) was significantly lower than expected (expected SoR = 7,647; <italic>P</italic> ≈ 0.0007), whereas the SoR at the acceptor site of the lost introns was within the expected range (Table
<xref ref-type="table" rid="T5">5</xref>
). These results reveal a preponderance of rare 4-mers flanking the 5' end of lost introns. This observation is further supported by the fact that the ranks of 4-mers flanking the donor splice site of lost introns are significantly lower than those of the corresponding retained introns (<italic>P</italic> &lt; 0.013; Wilcoxon signed-rank test). The rank distributions of 4-mers flanking the acceptor splice site did not differ significantly between lost and retained introns (<italic>P</italic> ≈ 0.069).</p>
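<p>One plausible way to obtain an expected SoR and a one-sided P value is a Monte Carlo test on the ranks, sketched here under the assumption that the 4-mers flanking lost introns are compared against random draws from the background of all rice introns. The procedure actually used is described in Materials and methods, which is not part of this section, so the function below is illustrative only. For the paired comparison between lost and retained introns, a library routine such as SciPy's scipy.stats.wilcoxon could be applied to the paired rank lists.</p>

```python
import random


def sum_of_ranks_test(observed_ranks, background_ranks, n_iter=20_000, seed=1):
    """Lower-tail sum-of-ranks (SoR) test for rare flanking 4-mers.

    observed_ranks: ranks of the 4-mers flanking the lost introns.
    background_ranks: ranks of the 4-mers flanking every intron in the background
    sample (the null distribution, one entry per intron).
    Returns (observed SoR, expected SoR, one-sided Monte Carlo P value).
    """
    n = len(observed_ranks)
    observed = sum(observed_ranks)
    # Expected SoR: n times the mean rank of a randomly drawn background intron.
    expected = n * sum(background_ranks) / len(background_ranks)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        simulated = sum(rng.choices(background_ranks, k=n))
        if observed >= simulated:
            hits += 1
    return observed, expected, hits / n_iter


if __name__ == "__main__":
    # Toy ranks for illustration only; the real values are summarized in Table 5.
    print(sum_of_ranks_test([1, 2, 1], [1, 2, 3, 4, 4, 4]))
```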
</sec>
<sec>
<title>Source of gained introns</title>
<p>Two out of the five gained introns showed several matches to known rice transposon sequences. The intron of LOC_Os12g02840.1 had a significant hit to a putative Ty1-copia subclass retrotransposon protein (82% identity over the entire intron). A large portion of the other gained intron (LOC_Os12g37660.1; 430 bp out of 741 bp) was highly similar (92% identity) to
<italic>Oryza sativa </italic>
transposon Rim2-M341 (BK000935) [
<xref ref-type="bibr" rid="B34">34</xref>
]. To determine whether any of the five gained introns had also inserted into other regions of the genome, we searched them against our set of 12 pseudomolecules. Three of the gained introns matched no sequence in the rice genome other than themselves. For the gained intron in LOC_Os12g02840.1, three high-quality matches were detected: to the entire intron of LOC_Os11g03070 (98% identity; putative sodium/hydrogen exchanger family protein), which is another segmental duplicate of LOC_Os12g02840.1 from the 5 MYA duplication event; to the entire intron of LOC_Os10g05450 (82% identity; annotated as a hypothetical protein); and to the entire intron of LOC_Os06g36500 (82% identity; annotated as retrotransposon protein, putative, Ty1-copia e subclass). For the second gained intron (LOC_Os12g37660.1), a large portion (approximately 400 bp) matched numerous regions throughout the pseudomolecules. Of the 64 top alignments to this intron (approximately 95% identity, approximately 400 bp in length), 54 were in intergenic regions and 10 were within introns of genes, none of which had fl-cDNA support (3 hypothetical proteins, 3 expressed proteins, 2 transposable element-related proteins, and 2 known proteins).</p>
<p>We further examined these five cases of intron gain using homologous genes from other plant species. With one exception, the gained intron was clearly a straightforward insertion into one member of the rice gene pair (Additional data file 6). For LOC_Os3g16960.1, the gained intron was observed in the maize and sorghum homologs but was absent from the
<italic>Arabidopsis </italic>
and poplar homologs. Thus, the most parsimonious explanation for the data is a single insertion into one of the rice duplicates prior to the divergence of rice, sorghum, and maize (data not shown).</p>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>Intron loss and gain are two important evolutionary processes. We observed more genes with intron loss than with intron gain after segmental duplication in rice, and we estimated the rates of intron loss and gain following this duplication. Letting <italic>p</italic> be the proportion of non-conserved introns between duplicated genes, we have <italic>p</italic> = 54/(137 + 958) = 0.0493, where 54 is the number of non-conserved introns, 137 is the total number of aligned intron positions within the 36 gene pairs with intron loss and/or gain, and 958 is the total number of aligned intron positions within the 197 conserved gene pairs. Because intron loss and gain are rare events, the expected number of intron loss and gain events per intron position, D<sub>int</sub>, can be estimated under a simple Poisson model as:</p>
<p>$$D_{\mathrm{int}} = -\ln(1 - p) = 0.0506$$</p>
<p>Taking t = 70 MYA as the age of the rice genome duplication [
<xref ref-type="bibr" rid="B25">25</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
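], we estimate the combined rate of intron gain and loss as follows, where the factor of 2 accounts for the two duplicate lineages, each of which has evolved independently for time t:</p>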
<p>$$\mu = \frac{D_{\mathrm{int}}}{2t} = \frac{0.0506}{2 \times 70 \times 10^{6}} = 3.61 \times 10^{-10} \text{ per intron per year}$$</p>
<p>As a total of 49 lost introns and 5 gained introns were observed, we partitioned this rate in proportion to these counts to estimate the evolutionary rates of intron loss and intron gain after the genome duplication:</p>
<p>$$\mu_{\mathrm{loss}} = 3.61 \times 10^{-10} \times \frac{49}{49 + 5} = 3.28 \times 10^{-10} \text{ per intron per year}$$</p>
<p>μ<sub>gain</sub> = 3.61 × 10<sup>-10</sup> × 5/(49 + 5) = 3.34 × 10<sup>-11</sup> per intron per year</p>
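<p>The rate arithmetic above can be re-derived in a few lines. This sketch only re-evaluates the counts reported in the text under the same Poisson correction and the assumed 70 MYA age; the variable names are illustrative.</p>

```python
import math

# Counts reported in the text.
NON_CONSERVED = 54        # introns lost or gained between duplicates
ALIGNED_VARIABLE = 137    # aligned intron positions in the 36 pairs with loss/gain
ALIGNED_CONSERVED = 958   # aligned intron positions in the 197 conserved pairs
LOST, GAINED = 49, 5
T_YEARS = 70e6            # assumed age of the duplication (70 MYA)

p = NON_CONSERVED / (ALIGNED_VARIABLE + ALIGNED_CONSERVED)
d_int = -math.log(1.0 - p)           # Poisson-corrected events per intron position
mu = d_int / (2.0 * T_YEARS)         # two lineages diverge independently
mu_loss = mu * LOST / (LOST + GAINED)
mu_gain = mu * GAINED / (LOST + GAINED)

print(f"p = {p:.4f}, D_int = {d_int:.4f}")
# p = 0.0493, D_int = 0.0506
print(f"mu = {mu:.2e}, mu_loss = {mu_loss:.2e}, mu_gain = {mu_gain:.2e}")
# mu = 3.61e-10, mu_loss = 3.28e-10, mu_gain = 3.34e-11
```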
<p>A previous study involving 684 groups of orthologous genes reported an intron loss rate in
<italic>Arabidopsis </italic>
of 2 to 3 × 10
<sup>-10 </sup>
per year and an intron gain rate of 2.2 to 2.9 × 10
<sup>-12 </sup>
per year [
<xref ref-type="bibr" rid="B16">16</xref>
]. Our study of segmentally duplicated genes within rice revealed a similar intron loss rate but a more than tenfold higher intron gain rate, which may reflect relaxed selective constraint on duplicated genes. The detection of transposon-related sequences in two of the five gained introns suggests that transposable elements may play a role in intron evolution, and is consistent with the higher fraction of transposable elements in the rice genome compared with
<italic>Arabidopsis </italic>
[
<xref ref-type="bibr" rid="B21">21</xref>
].</p>
<p>The rate of intron loss and gain may vary within our set of segmentally duplicated genes, because the segmental duplication between the top arms of chromosomes 11 and 12 has been reported to be recent (within 5 MYA) compared with the bulk of the segmental duplication, estimated at 70 MYA [
<xref ref-type="bibr" rid="B25">25</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
]. Thus, we determined the
<italic>d</italic>
<sub>S </sub>
for the 233 gene pairs that had a single isoform, were supported by a fl-cDNA, and had been manually validated (197 gene pairs with congruent intron structure and 36 gene pairs with intron loss and/or intron gain). The
<italic>d</italic>
<sub>S </sub>
values ranged from 0.03 to 24.86, with a clear peak between 0.6 and 1.4 (data not shown). Rates of intron loss (1.41 × 10<sup>-10</sup> per intron per year) and intron gain (0.94 × 10<sup>-11</sup> per intron per year) of the same order of magnitude as those above were obtained from calculations using the subset of the 233 gene pairs in which the
<italic>d</italic>
<sub>S </sub>
between duplicates fell in the range 0.6 to 1.4 (117 pairs in total, four of which originated from the top arms of chromosomes 11 and 12).</p>
<p>A reverse transcriptase-mediated model in which a segment of the genomic copy of a gene can be replaced by a reverse-transcribed copy via homologous recombination was previously proposed to explain the pattern of intron loss [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B35">35</xref>
,
<xref ref-type="bibr" rid="B36">36</xref>
] and has been further supported by recent genomic analysis of several species [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B15">15</xref>
]. A 3' end bias of intron loss is important evidence for this model: reverse transcription starts at the 3' end of an mRNA and often terminates prematurely, so 5'-truncated cDNAs are generated at high frequency. Although we did not observe a 3' end preference of intron loss, we did find examples of multiple consecutive intron losses at the 3' end of genes and even loss of all introns, which is consistent with the reverse-transcriptase-mediated model. Limited statistical power due to the small sample size (34 lost introns) might be one explanation for the lack of evidence for a 3' bias of intron loss in rice. Another explanation may be the unusual intron distribution pattern of rice, which is similar to that of
<italic>Arabidopsis </italic>
(data not shown) in which there is no 5' bias in intron location within single-intron genes [
<xref ref-type="bibr" rid="B4">4</xref>
]. A third explanation is that the reverse-transcriptase-mediated model may not be the only mechanism of intron loss in rice, and that introns may also be lost via genomic deletion, as proposed by Cho
<italic>et al</italic>
. [
<xref ref-type="bibr" rid="B37">37</xref>
], who observed no intron loss bias at the 3' end of genes in
<italic>Caenorhabditis</italic>
. However, under the genomic deletion model we would expect some instances of imprecise intron deletion, which we did not observe in our sample. Therefore, if genomic deletion does operate in rice, an as-yet unknown recognition signal may allow the precise deletion of introns.</p>
<p>We did not observe any statistically significant differences in the frequency of intron loss among intron phases. Nor did examination of nucleotide composition in the exons flanking the splice sites reveal any apparent pattern in the exonic dinucleotide at the boundary beyond that of the canonical splice-site consensus sequence (AG|GT, in which '|' represents the intron position; data not shown). Yet conservation of the exonic nucleotides adjacent to the exon-intron boundary has been reported to play an important role in correct splicing [
<xref ref-type="bibr" rid="B38">38</xref>
-
<xref ref-type="bibr" rid="B40">40</xref>
]. At the donor splice site, the exonic 4-mers flanking lost introns were less frequent than those flanking the corresponding retained introns, and also less frequent relative to the sample of all approximately 33,000 introns. Thus, introns flanked by less common exonic sequence at the donor site may be spliced inaccurately or inefficiently, and loss of introns at these positions may consequently be strongly favored by selection. Alternatively, under the genomic deletion model, the less common 4-mers may mark exonic sequences that are more prone to direct intron loss. Because our sample for each intron phase was small, our data were insufficient to test for a correlation between the intron loss rate at each phase and the nucleotide composition of the flanking exonic sequence.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We documented intron loss and gain in segmentally duplicated rice genes, with rates of loss and gain similar to those observed within orthologous genes across a range of eukaryotes. While we did not observe preferential intron loss at the 3' end of genes, we did observe a nucleotide bias within the exonic sequence flanking the lost introns.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title>Identification of segmentally duplicated genes</title>
<p>A total of 43,719 non-transposable element-related rice protein sequences from release 3 of the TIGR Rice Genome Annotation [
<xref ref-type="bibr" rid="B22">22</xref>
] were used to identify segmental duplications in rice through an all-versus-all BLAST search (WU-BLASTP, parameters "V = 5 B = 5 E = 1e-10 -filter seg") [
<xref ref-type="bibr" rid="B41">41</xref>
]. Because alternative splicing occurs in rice and some genes have multiple splice forms, the longest peptide sequence was used whenever alternative isoforms existed. Repetitive matches were filtered using Perl scripts that removed low-scoring matches within multiple-alignment regions, defined by a high-scoring segment pair within 50 kb. Segmentally duplicated blocks were identified using DAGchainer [
<xref ref-type="bibr" rid="B42">42</xref>
] with parameters '-s -I -D 200000', which specify a self comparison, ignore tandem duplication alignments, and set the maximum distance allowed between two collinear gene pairs to 200 kb. A minimum of six collinear gene pairs was required to define a block.</p>
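<p>DAGchainer itself scores chains of collinear matches by dynamic programming over a directed acyclic graph. The sketch below does not reproduce that algorithm; it swaps in a much simpler greedy, gap-based grouping purely to show how the two thresholds quoted above interact (at most 200 kb between consecutive gene pairs and at least six pairs per block). The input tuple layout is hypothetical, and the sketch does not enforce a consistent orientation along the second chromosome.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical input: one tuple per homologous gene pair,
# (chrom_a, pos_a, chrom_b, pos_b), positions in base pairs.
Pair = Tuple[str, int, str, int]

MAX_GAP = 200_000   # mirrors the -D 200000 distance limit
MIN_PAIRS = 6       # minimum gene pairs per block


def greedy_blocks(pairs: List[Pair]) -> List[List[Pair]]:
    """Group pairs per chromosome pair, sort along chrom_a, and start a new
    run whenever either coordinate jumps by more than MAX_GAP."""
    by_chrom: Dict[Tuple[str, str], List[Pair]] = {}
    for pair in pairs:
        by_chrom.setdefault((pair[0], pair[2]), []).append(pair)

    blocks: List[List[Pair]] = []
    for group in by_chrom.values():
        group.sort(key=lambda p: (p[1], p[3]))
        run = [group[0]]
        for prev, cur in zip(group, group[1:]):
            gap_a = cur[1] - prev[1]
            gap_b = abs(cur[3] - prev[3])
            if gap_a > MAX_GAP or gap_b > MAX_GAP:
                blocks.append(run)
                run = []
            run.append(cur)
        blocks.append(run)
    # Keep only runs long enough to count as a duplicated block.
    return [b for b in blocks if len(b) >= MIN_PAIRS]
```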
</sec>
<sec>
<title>Identification of congruent and incongruent introns</title>
<p>Duplicated genes with at least one intron were checked to ensure that they were supported by a fl-cDNA and that no alternative isoforms existed. Intron positions and phases were retrieved from the TIGR Osa1 genome annotation database [
<xref ref-type="bibr" rid="B22">22</xref>
]. ClustalW [
<xref ref-type="bibr" rid="B43">43</xref>
,
<xref ref-type="bibr" rid="B44">44</xref>
] with default parameter settings was run for each pair to obtain a global alignment. Intron positions and phases were then inserted into the ClustalW alignment using Perl scripts. Alignments with incongruent exon-intron structure were manually checked to ensure that the introns were supported by reliable alignments: within the ten amino acids flanking the splice site (five on each side), we required at least three identical amino acids and approximately 60% similarity.</p>
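<p>Projecting intron positions onto a protein alignment and checking the flanking window can be sketched as follows. The sketch assumes, hypothetically, that each intron is given as a 0-based offset into the coding sequence, so that divmod(offset, 3) yields the residue index and the phase. It implements only the identity criterion (at least three identical residues among the ten flanking positions), because the similarity criterion depends on a substitution grouping that is not specified here.</p>

```python
from typing import List, Tuple

GAP = "-"


def residue_to_column(aligned: str) -> List[int]:
    """Map each ungapped residue index to its column in the gapped alignment."""
    return [col for col, ch in enumerate(aligned) if ch != GAP]


def project_intron(aligned: str, cds_offset: int) -> Tuple[int, int]:
    """Return (alignment column, phase) for an intron at a 0-based CDS offset.
    Phase 0 introns lie before that column; phases 1 and 2 split its residue."""
    residue, phase = divmod(cds_offset, 3)
    return residue_to_column(aligned)[residue], phase


def same_position(aln_a: str, off_a: int, aln_b: str, off_b: int) -> bool:
    """True when both introns map to the same column with the same phase."""
    return project_intron(aln_a, off_a) == project_intron(aln_b, off_b)


def flank_identity(aln_a: str, aln_b: str, column: int, width: int = 5) -> int:
    """Count identical, ungapped residues in `width` columns on each side."""
    left = range(max(0, column - width), column)
    right = range(column, min(len(aln_a), column + width))
    return sum(1 for i in list(left) + list(right)
               if aln_a[i] == aln_b[i] and aln_a[i] != GAP)


def reliable_flanks(aln_a: str, aln_b: str, column: int,
                    min_identical: int = 3) -> bool:
    return flank_identity(aln_a, aln_b, column) >= min_identical


if __name__ == "__main__":
    aln_a = "MKV-AEDLLRGY"   # toy gapped alignment (hypothetical)
    aln_b = "MKVSAEDLIRGY"
    print(same_position(aln_a, 12, aln_b, 15), flank_identity(aln_a, aln_b, 5))
    # True 8
```

<p>In the toy alignment both introns project to column 5 with phase 0, and eight of the ten flanking columns are identical, so the region passes the identity threshold.</p>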
</sec>
<sec>
<title>Identification of intron loss and intron gain</title>
<p>A simple phylogenetic analysis was used to determine whether an incongruent exon-intron structure was attributable to the loss or gain of an intron. We identified putative orthologous genes by searching the predicted
<italic>Arabidopsis </italic>
proteome (TIGR release 5, [
<xref ref-type="bibr" rid="B45">45</xref>
]) with the predicted rice proteome using blastp (E-value below 1e-10) and selecting the reciprocal best hit. When no ortholog was identified in
<italic>Arabidopsis </italic>
via the reciprocal top match method, we used the best
<italic>Arabidopsis </italic>
match. Using the
<italic>Arabidopsis </italic>
genes as the outgroup, we aligned the rice duplicated gene models to the orthologous
<italic>Arabidopsis </italic>
gene model. ClustalW with default parameter settings was run for each triplet (the two rice gene models and their putative
<italic>Arabidopsis </italic>
ortholog) and intron positions and phases were inserted into the ClustalW alignment (Additional data file 3). Only intron losses or gains after segmental duplication were examined further. An intron loss was inferred when an intron was present at the same position in only one of the two rice genes and in the putative
<italic>Arabidopsis </italic>
ortholog (the intron in that rice gene is referred to as a retained intron). An intron gain was inferred when an intron was present in a single rice gene but absent from both the other rice paralog and the putative
<italic>Arabidopsis </italic>
ortholog.</p>
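<p>The ortholog assignment and the loss/gain rules above can be sketched as two small functions. The best-hit dictionaries are hypothetical inputs (for example, parsed from BLASTP tabular output), and the category labels are illustrative names, not terms used in this study.</p>

```python
from typing import Dict, Optional, Tuple


def assign_ortholog(rice_gene: str,
                    rice_to_ath: Dict[str, str],
                    ath_to_rice: Dict[str, str]) -> Tuple[Optional[str], bool]:
    """Return (Arabidopsis gene, is_reciprocal). When the best hit is not
    reciprocal, the best Arabidopsis hit is still used as the fallback."""
    best = rice_to_ath.get(rice_gene)
    if best is None:
        return None, False
    return best, ath_to_rice.get(best) == rice_gene


def classify(in_rice_a: bool, in_rice_b: bool, in_ath: bool) -> str:
    """Classify one aligned intron position, using Arabidopsis as outgroup."""
    if in_rice_a and in_rice_b:
        return "conserved"
    if not (in_rice_a or in_rice_b):
        return "absent_in_rice"      # not a post-duplication event
    # Exactly one rice paralog carries the intron.
    return "loss" if in_ath else "gain"
```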
</sec>
<sec>
<title>Randomization test for intron loss rate at phases 0, 1 and 2</title>
<p>A total of 233 pairs of duplicated genes, of which 197 pairs have completely conserved introns and 36 pairs show putative intron loss or gain, were used in our randomization test. The total number of conserved aligned intron positions at each phase was counted (P0, 580; P1, 236; P2, 225), as was the total number of independently lost introns at each phase (P0, 15; P1, 7; P2, 12). A total of 10,000 iterations were simulated; in each iteration, 34 phases were drawn at random according to the frequencies of the conserved aligned intron positions at each phase in the 233 gene pairs. The observed number of lost introns at each phase was then compared with the simulated distributions.</p>
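<p>The randomization test can be sketched with Python's random module. The counts are those given above; the one-sided tail fractions returned by tail_probabilities are an assumed way of comparing observed and simulated counts (the text does not state the exact comparison), and the seed is arbitrary.</p>

```python
import random
from typing import Dict, List

# Counts reported in the text.
CONSERVED = {0: 580, 1: 236, 2: 225}  # conserved aligned intron positions per phase
LOST = {0: 15, 1: 7, 2: 12}           # independently lost introns per phase


def simulate(n_iter: int = 10_000, seed: int = 1) -> List[Dict[int, int]]:
    """Draw 34 phases per iteration in proportion to the conserved counts."""
    rng = random.Random(seed)
    phases = list(CONSERVED)
    weights = [CONSERVED[ph] for ph in phases]
    n_lost = sum(LOST.values())
    sims = []
    for _ in range(n_iter):
        draw = rng.choices(phases, weights=weights, k=n_lost)
        sims.append({ph: draw.count(ph) for ph in phases})
    return sims


def tail_probabilities(sims: List[Dict[int, int]]) -> Dict[int, Dict[str, float]]:
    """Fraction of simulations with at least (upper) or at most (lower)
    the observed number of lost introns at each phase."""
    out = {}
    for ph, obs in LOST.items():
        upper = sum(1 for s in sims if s[ph] >= obs) / len(sims)
        lower = sum(1 for s in sims if obs >= s[ph]) / len(sims)
        out[ph] = {"upper": upper, "lower": lower}
    return out


if __name__ == "__main__":
    print(tail_probabilities(simulate()))
```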
</sec>
<sec>
<title>Nucleotide composition of exonic sequences flanking lost introns, retained introns, and all introns</title>
<p>To determine whether lost introns in duplicated rice genes tend to be flanked by rare nucleotide combinations, we compared the frequency distribution of the four-nucleotide words (4-mers) in the exonic sequence flanking lost introns with that of the exonic 4-mers flanking the corresponding retained introns, as well as with the frequency distribution of the 4-mers flanking all introns in the genome. Comparisons were done independently for 4-mers flanking the donor and the acceptor ends of introns. The small number of lost introns, distributed over three intron phases (34 introns, of which 15, 7 and 12 were from phases 0, 1 and 2, respectively) relative to the total number of 4-mer classes (4
<sup>4 </sup>
= 256) precludes effective use of standard tests, such as the chi-square test, to compare the distributions. Instead, tests based on rank distributions were used as described below.</p>
<sec>
<title>Comparison of 4-mers flanking lost introns versus all introns</title>
<p>A total of 33,011 introns within the coding regions from 6,046 rice gene models that were supported with fl-cDNA, had no alternative splicing isoform, and had at least one intron within the CDS were used to determine the 4-mer distribution in exonic sequences that flank the introns. The four nucleotides that flank the donor and acceptor splice sites of each intron were extracted and their frequency calculated. For each intron phase, each 4-mer was given a rank between 1 and 256, to cover all of the 4
<sup>4 </sup>
nucleotide combinations, with the lowest frequency having the smallest rank (rank = 1). In this way, three rank distributions (one for each of intron phases 0, 1 and 2) and their associated frequency distributions were generated for each of the donor and acceptor flanking regions.</p>
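<p>The per-phase rank tables can be sketched as follows. The input, pairs of (intron phase, exonic 4-mer) for one splice-site side, is a hypothetical format, and ties between equally frequent 4-mers are broken alphabetically, an arbitrary choice because the text does not specify a tie rule. Unobserved 4-mers still receive a rank so that all 256 combinations are covered.</p>

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

# All 4**4 = 256 possible exonic 4-mers.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]


def rank_table(flanks: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, int]]:
    """For each intron phase, rank every 4-mer from 1 (rarest) to 256 (most
    frequent). Ties are broken alphabetically, an arbitrary illustrative rule."""
    counts: Dict[int, Counter] = {0: Counter(), 1: Counter(), 2: Counter()}
    for phase, kmer in flanks:
        counts[phase][kmer.upper()] += 1
    tables: Dict[int, Dict[str, int]] = {}
    for phase, counter in counts.items():
        ordered = sorted(ALL_4MERS, key=lambda k: (counter[k], k))
        tables[phase] = {kmer: rank for rank, kmer in enumerate(ordered, start=1)}
    return tables
```

<p>The rank of any intron is then looked up as tables[phase][kmer], separately for the donor and acceptor tables.</p>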
<p>We devised a statistic that we call 'sum of ranks', SoR, to determine whether the 4-mers flanking lost introns are less common than expected by chance. SoR is the sum of the ranks of all introns in a sample, each rank being determined by the intron's flanking 4-mer and phase. The test was conducted as follows: 10,000 pseudo-replicates were generated by randomly sampling the three rank distributions obtained for all introns according to their frequency distributions (that is, each rank was selected with probability equal to its frequency). Each pseudo-replicate consisted of 34 sampled introns, 15, 7 and 12 of which were sampled from the rank distributions of phase 0, 1 and 2 introns, respectively, to preserve the characteristics of the observed set of lost introns. A SoR value was calculated for each pseudo-replicate to generate the distribution of expected sums of ranks. The SoR for the 34 lost introns was compared against this distribution to determine the probability
<italic>P </italic>
of obtaining this value by chance;
<italic>P </italic>
is approximately equal to the fraction of pseudo-replicates with a smaller or equal SoR value.</p>
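<p>The 'sum of ranks' test can be sketched as a Monte Carlo procedure. This sketch assumes, hypothetically, that for each phase the rank distribution is supplied as a mapping from rank to frequency (the fraction of all introns of that phase whose flanking 4-mer has that rank); the default of 10,000 pseudo-replicates follows the text, while the seed is arbitrary.</p>

```python
import random
from typing import Dict, Sequence


def sor_p_value(observed_ranks: Sequence[int],
                rank_freqs: Dict[int, Dict[int, float]],
                n_per_phase: Dict[int, int],
                n_reps: int = 10_000,
                seed: int = 1) -> float:
    """Fraction of pseudo-replicates whose SoR is smaller than or equal to
    the observed SoR (small sums mean rarer flanking 4-mers)."""
    rng = random.Random(seed)
    observed = sum(observed_ranks)
    # Pre-extract ranks and their sampling weights for each phase.
    tables = {ph: (list(freqs), list(freqs.values()))
              for ph, freqs in rank_freqs.items()}
    hits = 0
    for _ in range(n_reps):
        replicate_sor = 0
        for phase, n in n_per_phase.items():
            ranks, weights = tables[phase]
            replicate_sor += sum(rng.choices(ranks, weights=weights, k=n))
        if observed >= replicate_sor:
            hits += 1
    return hits / n_reps
```

<p>For the lost introns, n_per_phase would be {0: 15, 1: 7, 2: 12}, matching the composition described above.</p>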
</sec>
<sec>
<title>Comparison of 4-mers flanking lost introns versus retained introns, in the corresponding duplicate gene</title>
<p>A rank was assigned to each lost intron, based on its flanking 4-mer and its intron phase, according to the rank distributions obtained for all 33,011 introns (see above), to obtain a distribution of ranks for the set of lost introns. A distribution of ranks for the set of retained introns was obtained in the same way. The two distributions were compared using a Wilcoxon signed-rank test. This procedure was done for both donor and acceptor flanking sequences.</p>
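<p>Because each lost intron is paired with the retained intron at the same position in its duplicate, the comparison is a paired one. A minimal sketch of the Wilcoxon signed-rank test with a normal approximation is shown below; it drops zero differences, assigns average ranks to ties, and omits continuity and tie-variance corrections, so for small samples or real analyses a full implementation such as scipy.stats.wilcoxon is preferable.</p>

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, two-sided P) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while n > i:
        # Extend j over a run of tied absolute differences.
        j = i
        while n > j + 1 and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return w_plus, p
```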
</sec>
</sec>
<sec>
<title>Identification of the source elements of gained introns</title>
<p>Sequences of the five gained introns were searched against the NCBI non-redundant database and were further searched against all 12 rice pseudomolecules [
<xref ref-type="bibr" rid="B22">22</xref>
]. Significant hits were manually checked. For each case of a gained intron, we examined homologous proteins from three plant species with substantial genome sequence: maize, sorghum, and poplar. Using the protein sequences of the ten rice genes in the five duplicate pairs with gained introns, we searched the TIGR Assembled Zea Mays (AZMs) sequences, which are assemblies of gene enrichment sequences [
<xref ref-type="bibr" rid="B46">46</xref>
,
<xref ref-type="bibr" rid="B47">47</xref>
], TIGR Assembled Sorghum Bicolor (ASBs) which are assemblies of gene enrichment reads from sorghum [
<xref ref-type="bibr" rid="B48">48</xref>
], and contigs from the poplar genome project [
<xref ref-type="bibr" rid="B49">49</xref>
]. All of the top hits from maize and sorghum had >70% similarity at the protein level with the rice proteins. Gene models were predicted by running the
<italic>ab initio </italic>
gene finder FGENESH [
<xref ref-type="bibr" rid="B50">50</xref>
] on the maize, sorghum and poplar genomic sequences. We used ClustalW with default parameter settings to align the six proteins (two rice proteins and the homologous proteins from
<italic>Arabidopsis</italic>
, maize, sorghum and poplar) and inserted the intron positions/phases into the ClustalW alignment.</p>
</sec>
<sec>
<title>Determination of substitutions per site</title>
<p>The number of synonymous substitutions per synonymous site (
<italic>d</italic>
<sub>S</sub>
) between each of the two rice duplicates was estimated by maximum likelihood, using the codon-based substitution model of Yang
<italic>et al</italic>
. [
<xref ref-type="bibr" rid="B51">51</xref>
] as implemented in codeml of PAML, version 3.15 [
<xref ref-type="bibr" rid="B51">51</xref>
,
<xref ref-type="bibr" rid="B52">52</xref>
]. Codeml was run in pairwise mode (runmode = -2), with codon equilibrium frequencies estimated from the average nucleotide frequencies at each codon position (codonFreq = 2). Given the estimated age of approximately 70 MYA for the polyploidization event in rice [
<xref ref-type="bibr" rid="B25">25</xref>
], and the estimated substitution rate in synonymous sites of approximately 6.5 × 10
<sup>-9</sup>
/site/year [
<xref ref-type="bibr" rid="B53">53</xref>
], rice paralogs resulting from this polyploidization event are expected to differ on average by approximately 0.9 synonymous substitutions per site.</p>
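<p>The expected divergence follows from a simple molecular clock: since duplication, synonymous changes accumulate along both paralogous lineages, so the expected pairwise d<sub>S</sub> is 2 × rate × age. This sketch assumes a constant clock and no saturation and is only an illustration of the arithmetic; the function names are hypothetical.</p>

```python
RATE = 6.5e-9   # synonymous substitutions per site per year (from the text)
AGE = 70e6      # estimated age of the polyploidization event, in years


def expected_ds(age_years: float, rate: float = RATE) -> float:
    """Expected pairwise dS: substitutions accumulate on both paralogs."""
    return 2.0 * rate * age_years


def age_from_ds(ds: float, rate: float = RATE) -> float:
    """Invert the clock to convert a pairwise dS into an age in years."""
    return ds / (2.0 * rate)


print(f"expected dS = {expected_ds(AGE):.2f}")  # expected dS = 0.91
```

<p>Applied to the bounds of the d<sub>S</sub> peak reported in the Discussion (0.6 and 1.4), age_from_ds gives roughly 46 and 108 MYA under the same simplifying assumptions.</p>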
</sec>
</sec>
<sec>
<title>Additional data files</title>
<p>The following additional data are available with the online version of this paper. Additional data file
<xref ref-type="supplementary-material" rid="S1">1</xref>
lists the segmentally duplicated blocks within the rice genome. Additional data file
<xref ref-type="supplementary-material" rid="S2">2</xref>
lists 3,101 pairs of segmentally duplicated genes along with their pairings and their sequences. Additional data file
<xref ref-type="supplementary-material" rid="S3">3</xref>
shows the ClustalW alignment of the two rice duplicated genes and their orthologous gene from
<italic>Arabidopsis</italic>
. Additional data file
<xref ref-type="supplementary-material" rid="S4">4</xref>
lists the occurrence of background exonic 4-mers at the donor splice sites of different intron phases. Additional data file
<xref ref-type="supplementary-material" rid="S5">5</xref>
lists the occurrence of background exonic 4-mers at the acceptor splice sites of different intron phases. Additional data file
<xref ref-type="supplementary-material" rid="S6">6</xref>
shows the ClustalW alignment of the two rice duplicated proteins with putative orthologous proteins from
<italic>Arabidopsis</italic>
, poplar, maize and sorghum.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional File 1</title>
<p>The segmentally duplicated blocks within the rice genome.</p>
</caption>
<media xlink:href="gb-2006-7-5-r41-S1.xls" mimetype="application" mime-subtype="vnd.ms-excel">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional File 2</title>
<p>The 3,101 pairs of segmentally duplicated genes along with their pairings and their sequences.</p>
</caption>
<media xlink:href="gb-2006-7-5-r41-S2.pdf" mimetype="application" mime-subtype="pdf">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional File 3</title>
<p>The ClustalW alignment of the two rice duplicated genes and their orthologous gene from
<italic>Arabidopsis</italic>
.</p>
</caption>
<media xlink:href="gb-2006-7-5-r41-S3.pdf" mimetype="application" mime-subtype="pdf">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional File 4</title>
<p>The occurrence of background exonic 4-mers at the donor splice sites of different intron phases.</p>
</caption>
<media xlink:href="gb-2006-7-5-r41-S4.xls" mimetype="application" mime-subtype="vnd.ms-excel">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional File 5</title>
<p>The occurrence of background exonic 4-mers at the acceptor splice sites of different intron phases.</p>
</caption>
<media xlink:href="gb-2006-7-5-r41-S5.xls" mimetype="application" mime-subtype="vnd.ms-excel">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional File 6</title>
<p>The ClustalW alignment of the two rice duplicated proteins with putative orthologous proteins from
<italic>Arabidopsis</italic>
, poplar, maize and sorghum.</p>
</caption>
<media xlink:href="gb-2006-7-5-r41-S6.pdf" mimetype="application" mime-subtype="pdf">
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>This work was supported by a National Science Foundation Plant Genome Research Program grant to C.R.B. (DBI-0321538).</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fedorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Merican</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Large-scale comparison of intron positions among animal, plant, and fungal genes.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2002</year>
<volume>99</volume>
<fpage>16128</fpage>
<lpage>16133</lpage>
<pub-id pub-id-type="pmid">12444254</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.242624899</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rogozin</surname>
<given-names>IB</given-names>
</name>
<name>
<surname>Wolf</surname>
<given-names>YI</given-names>
</name>
<name>
<surname>Sorokin</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Mirkin</surname>
<given-names>BG</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution.</article-title>
<source>Curr Biol</source>
<year>2003</year>
<volume>13</volume>
<fpage>1512</fpage>
<lpage>1517</lpage>
<pub-id pub-id-type="pmid">12956953</pub-id>
<pub-id pub-id-type="doi">10.1016/S0960-9822(03)00558-X</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fink</surname>
<given-names>GR</given-names>
</name>
</person-group>
<article-title>Pseudogenes in yeast?</article-title>
<source>Cell</source>
<year>1987</year>
<volume>49</volume>
<fpage>5</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="pmid">3549000</pub-id>
<pub-id pub-id-type="doi">10.1016/0092-8674(87)90746-X</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sakurai</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fujimori</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kochiwa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kitamura-Abe</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Washio</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Saito</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Carninci</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hayashizaki</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tomita</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>On biased distribution of introns in various eukaryotes.</article-title>
<source>Gene</source>
<year>2002</year>
<volume>300</volume>
<fpage>89</fpage>
<lpage>95</lpage>
<pub-id pub-id-type="pmid">12468090</pub-id>
<pub-id pub-id-type="doi">10.1016/S0378-1119(02)01035-1</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roy</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Complex early genes.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>1986</fpage>
<lpage>1991</lpage>
<pub-id pub-id-type="pmid">15687506</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0408355101</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sverdlov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Babenko</surname>
<given-names>VN</given-names>
</name>
<name>
<surname>Rogozin</surname>
<given-names>IB</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Preferential loss and gain of introns in 3' portions of genes suggests a reverse-transcription mechanism of intron insertion.</article-title>
<source>Gene</source>
<year>2004</year>
<volume>338</volume>
<fpage>85</fpage>
<lpage>91</lpage>
<pub-id pub-id-type="pmid">15302409</pub-id>
<pub-id pub-id-type="doi">10.1016/j.gene.2004.05.027</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nielsen</surname>
<given-names>CB</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Birren</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Burge</surname>
<given-names>CB</given-names>
</name>
<name>
<surname>Galagan</surname>
<given-names>JE</given-names>
</name>
</person-group>
<article-title>Patterns of intron gain and loss in fungi.</article-title>
<source>PLoS Biol</source>
<year>2004</year>
<volume>2</volume>
<fpage>e422</fpage>
<pub-id pub-id-type="pmid">15562318</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pbio.0020422</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fedorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Suboch</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bujakov</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fedorova</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Analysis of nonuniformity in intron phase distribution.</article-title>
<source>Nucleic Acids Res</source>
<year>1992</year>
<volume>20</volume>
<fpage>2553</fpage>
<lpage>2557</lpage>
<pub-id pub-id-type="pmid">1598214</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Long</surname>
<given-names>M</given-names>
</name>
<name>
<surname>de Souza</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Evolution of the intron-exon structure of eukaryotic genes.</article-title>
<source>Curr Opin Genet Dev</source>
<year>1995</year>
<volume>5</volume>
<fpage>774</fpage>
<lpage>778</lpage>
<pub-id pub-id-type="pmid">8745076</pub-id>
<pub-id pub-id-type="doi">10.1016/0959-437X(95)80010-3</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tomita</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shimizu</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Brutlag</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>Introns and reading frames: correlation between splicing sites and their codon positions.</article-title>
<source>Mol Biol Evol</source>
<year>1996</year>
<volume>13</volume>
<fpage>1219</fpage>
<lpage>1223</lpage>
<pub-id pub-id-type="pmid">8896374</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>The exon theory of genes.</article-title>
<source>Cold Spring Harb Symp Quant Biol</source>
<year>1987</year>
<volume>52</volume>
<fpage>901</fpage>
<lpage>905</lpage>
<pub-id pub-id-type="pmid">2456887</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Glynias</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>On the ancient nature of introns.</article-title>
<source>Gene</source>
<year>1993</year>
<volume>135</volume>
<fpage>137</fpage>
<lpage>144</lpage>
<pub-id pub-id-type="pmid">8276250</pub-id>
<pub-id pub-id-type="doi">10.1016/0378-1119(93)90058-B</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qiu</surname>
<given-names>WG</given-names>
</name>
<name>
<surname>Schisler</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Stoltzfus</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>The evolutionary gain of spliceosomal introns: sequence and phase preferences.</article-title>
<source>Mol Biol Evol</source>
<year>2004</year>
<volume>21</volume>
<fpage>1252</fpage>
<lpage>1263</lpage>
<pub-id pub-id-type="pmid">15014153</pub-id>
<pub-id pub-id-type="doi">10.1093/molbev/msh120</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Coghlan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wolfe</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>Origins of recently gained introns in
<italic>Caenorhabditis</italic>
.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2004</year>
<volume>101</volume>
<fpage>11362</fpage>
<lpage>11367</lpage>
<pub-id pub-id-type="pmid">15243155</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0308192101</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roy</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Fedorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2003</year>
<volume>100</volume>
<fpage>7158</fpage>
<lpage>7162</lpage>
<pub-id pub-id-type="pmid">12777620</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.1232297100</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roy</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Rates of intron loss and gain: implications for early eukaryotic evolution.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>5773</fpage>
<lpage>5778</lpage>
<pub-id pub-id-type="pmid">15827119</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0500383102</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roy</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>The pattern of intron loss.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2005</year>
<volume>102</volume>
<fpage>713</fpage>
<lpage>718</lpage>
<pub-id pub-id-type="pmid">15642949</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0408274102</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fedorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Roy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fedorova</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Mystery of intron gain.</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<fpage>2236</fpage>
<lpage>2241</lpage>
<pub-id pub-id-type="pmid">12975308</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.1029803</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Babenko</surname>
<given-names>VN</given-names>
</name>
<name>
<surname>Rogozin</surname>
<given-names>IB</given-names>
</name>
<name>
<surname>Mekhedov</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Prevalence of intron gain over intron loss in the evolution of paralogous gene families.</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>3724</fpage>
<lpage>3733</lpage>
<pub-id pub-id-type="pmid">15254274</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkh686</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sverdlov</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Rogozin</surname>
<given-names>IB</given-names>
</name>
<name>
<surname>Babenko</surname>
<given-names>VN</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
</person-group>
<article-title>Conservation versus parallel gains in intron evolution.</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>1741</fpage>
<lpage>1748</lpage>
<pub-id pub-id-type="pmid">15788746</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gki316</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<collab>International Rice Genome Sequencing Project</collab>
</person-group>
<article-title>The map-based sequence of the rice genome.</article-title>
<source>Nature</source>
<year>2005</year>
<volume>436</volume>
<fpage>793</fpage>
<lpage>800</lpage>
<pub-id pub-id-type="pmid">16100779</pub-id>
<pub-id pub-id-type="doi">10.1038/nature03895</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Ouyang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Maiti</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hamilton</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Haas</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Sultana</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Institute for Genomic Research Osa1 rice genome annotation database.</article-title>
<source>Plant Physiol</source>
<year>2005</year>
<volume>138</volume>
<fpage>18</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="pmid">15888674</pub-id>
<pub-id pub-id-type="doi">10.1104/pp.104.059063</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wolfe</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Gouy</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>YW</given-names>
</name>
<name>
<surname>Sharp</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>WH</given-names>
</name>
</person-group>
<article-title>Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1989</year>
<volume>86</volume>
<fpage>6201</fpage>
<lpage>6205</lpage>
<pub-id pub-id-type="pmid">2762323</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crane</surname>
<given-names>PR</given-names>
</name>
<name>
<surname>Friis</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>KR</given-names>
</name>
</person-group>
<article-title>The origin and early diversification of angiosperms.</article-title>
<source>Nature</source>
<year>1995</year>
<volume>374</volume>
<fpage>27</fpage>
<lpage>33</lpage>
<pub-id pub-id-type="doi">10.1038/374027a0</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paterson</surname>
<given-names>AH</given-names>
</name>
<name>
<surname>Bowers</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>BA</given-names>
</name>
</person-group>
<article-title>Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2004</year>
<volume>101</volume>
<fpage>9903</fpage>
<lpage>9908</lpage>
<pub-id pub-id-type="pmid">15161969</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0307901101</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vandepoele</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Simillion</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Evidence that rice and other cereals are ancient aneuploids.</article-title>
<source>Plant Cell</source>
<year>2003</year>
<volume>15</volume>
<fpage>2192</fpage>
<lpage>2202</lpage>
<pub-id pub-id-type="pmid">12953120</pub-id>
<pub-id pub-id-type="doi">10.1105/tpc.014019</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Duplication and DNA segmental loss in the rice genome: implications for diploidization.</article-title>
<source>New Phytol</source>
<year>2005</year>
<volume>165</volume>
<fpage>937</fpage>
<lpage>946</lpage>
<pub-id pub-id-type="pmid">15720704</pub-id>
<pub-id pub-id-type="doi">10.1111/j.1469-8137.2004.01293.x</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Simillion</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Vandepoele</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Saeys</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Building genomic profiles for uncovering segmental homology in the twilight zone.</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<fpage>1095</fpage>
<lpage>1106</lpage>
<pub-id pub-id-type="pmid">15173115</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.2179004</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guyot</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Keller</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Ancestral genome duplication in rice.</article-title>
<source>Genome</source>
<year>2004</year>
<volume>47</volume>
<fpage>610</fpage>
<lpage>614</lpage>
<pub-id pub-id-type="pmid">15190378</pub-id>
<pub-id pub-id-type="doi">10.1139/g04-016</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castillo-Davis</surname>
<given-names>CI</given-names>
</name>
<name>
<surname>Bedford</surname>
<given-names>TB</given-names>
</name>
<name>
<surname>Hartl</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>Accelerated rates of intron gain/loss and protein evolution in duplicate genes in human and mouse malaria parasites.</article-title>
<source>Mol Biol Evol</source>
<year>2004</year>
<volume>21</volume>
<fpage>1422</fpage>
<lpage>1427</lpage>
<pub-id pub-id-type="pmid">15084679</pub-id>
<pub-id pub-id-type="doi">10.1093/molbev/msh143</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Logsdon</surname>
<given-names>JM</given-names>
<suffix>Jr</suffix>
</name>
</person-group>
<article-title>The recent origins of spliceosomal introns revisited.</article-title>
<source>Curr Opin Genet Dev</source>
<year>1998</year>
<volume>8</volume>
<fpage>637</fpage>
<lpage>648</lpage>
<pub-id pub-id-type="pmid">9914210</pub-id>
<pub-id pub-id-type="doi">10.1016/S0959-437X(98)80031-2</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Logsdon</surname>
<given-names>JM</given-names>
<suffix>Jr</suffix>
</name>
</person-group>
<article-title>Worm genomes hold the smoking guns of intron gain.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2004</year>
<volume>101</volume>
<fpage>11195</fpage>
<lpage>11196</lpage>
<pub-id pub-id-type="pmid">15277668</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0404148101</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Long</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Deutsch</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Association of intron phases with conservation at splice site sequences and evolution of spliceosomal introns.</article-title>
<source>Mol Biol Evol</source>
<year>1999</year>
<volume>16</volume>
<fpage>1528</fpage>
<lpage>1534</lpage>
<pub-id pub-id-type="pmid">10555284</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>ZK</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>He</surname>
<given-names>ZH</given-names>
</name>
</person-group>
<article-title>Genomic characterization of Rim2/Hipa elements reveals a CACTA-like transposon superfamily with unique features in the rice genome.</article-title>
<source>Mol Genet Genomics</source>
<year>2003</year>
<volume>270</volume>
<fpage>234</fpage>
<lpage>242</lpage>
<pub-id pub-id-type="pmid">14513364</pub-id>
<pub-id pub-id-type="doi">10.1007/s00438-003-0918-z</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bernstein</surname>
<given-names>LB</given-names>
</name>
<name>
<surname>Mount</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Weiner</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Pseudogenes for human small nuclear RNA U3 appear to arise by integration of self-primed reverse transcripts of the RNA into new chromosomal sites.</article-title>
<source>Cell</source>
<year>1983</year>
<volume>32</volume>
<fpage>461</fpage>
<lpage>472</lpage>
<pub-id pub-id-type="pmid">6186397</pub-id>
<pub-id pub-id-type="doi">10.1016/0092-8674(83)90466-X</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewin</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>How mammalian RNA returns to its genome.</article-title>
<source>Science</source>
<year>1983</year>
<volume>219</volume>
<fpage>1052</fpage>
<lpage>1054</lpage>
<pub-id pub-id-type="pmid">6186029</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cho</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ellis</surname>
<given-names>RE</given-names>
</name>
</person-group>
<article-title>A phylogeny of
<italic>Caenorhabditis</italic>
reveals frequent loss of introns during nematode evolution.</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<fpage>1207</fpage>
<lpage>1220</lpage>
<pub-id pub-id-type="pmid">15231741</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.2639304</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seraphin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Rosbash</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Exon mutations uncouple 5' splice site selection from U1 snRNA pairing.</article-title>
<source>Cell</source>
<year>1990</year>
<volume>63</volume>
<fpage>619</fpage>
<lpage>629</lpage>
<pub-id pub-id-type="pmid">2225068</pub-id>
<pub-id pub-id-type="doi">10.1016/0092-8674(90)90457-P</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Treisman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Proudfoot</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Shander</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Maniatis</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>A single-base change at a splice site in a beta 0-thalassemic gene causes abnormal RNA splicing.</article-title>
<source>Cell</source>
<year>1982</year>
<volume>29</volume>
<fpage>903</fpage>
<lpage>911</lpage>
<pub-id pub-id-type="pmid">7151176</pub-id>
<pub-id pub-id-type="doi">10.1016/0092-8674(82)90452-4</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jacobsen</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Binkowski</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Olszewski</surname>
<given-names>NE</given-names>
</name>
</person-group>
<article-title>SPINDLY, a tetratricopeptide repeat protein involved in gibberellin signal transduction in
<italic>Arabidopsis</italic>
.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1996</year>
<volume>93</volume>
<fpage>9292</fpage>
<lpage>9296</lpage>
<pub-id pub-id-type="pmid">8799194</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.93.17.9292</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="other">
<article-title>Washington University BLAST Archives</article-title>
<ext-link ext-link-type="uri" xlink:href="http://blast.wustl.edu"></ext-link>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haas</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Delcher</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Wortman</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>DAGchainer: a tool for mining segmental genome duplications and synteny.</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>3643</fpage>
<lpage>3646</lpage>
<pub-id pub-id-type="pmid">15247098</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bth397</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chenna</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sugawara</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Koike</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>Multiple sequence alignment with the Clustal series of programs.</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>3497</fpage>
<lpage>3500</lpage>
<pub-id pub-id-type="pmid">12824352</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkg500</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thompson</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Higgins</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>TJ</given-names>
</name>
</person-group>
<article-title>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</article-title>
<source>Nucleic Acids Res</source>
<year>1994</year>
<volume>22</volume>
<fpage>4673</fpage>
<lpage>4680</lpage>
<pub-id pub-id-type="pmid">7984417</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="other">
<article-title>The TIGR
<italic>Arabidopsis thaliana </italic>
Database</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.tigr.org/tdb/e2k1/ath1/"></ext-link>
</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Whitelaw</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Barbazuk</surname>
<given-names>WB</given-names>
</name>
<name>
<surname>Pertea</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>L</given-names>
</name>
<name>
<surname>van Heeringen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Karamycheva</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bennetzen</surname>
<given-names>JL</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Enrichment of gene-coding sequences in maize by genome filtration.</article-title>
<source>Science</source>
<year>2003</year>
<volume>302</volume>
<fpage>2118</fpage>
<lpage>2120</lpage>
<pub-id pub-id-type="pmid">14684821</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1090047</pub-id>
</citation>
</ref>
<ref id="B47">
<citation citation-type="other">
<article-title>The TIGR Maize Database</article-title>
<ext-link ext-link-type="uri" xlink:href="http://maize.tigr.org/"></ext-link>
</citation>
</ref>
<ref id="B48">
<citation citation-type="other">
<article-title>TIGR Assembled Sorghum Bicolor</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.tigr.org/pub/data/MAIZE/Sorghum_assembly/ASB.gz"></ext-link>
</citation>
</ref>
<ref id="B49">
<citation citation-type="other">
<article-title>The JGI
<italic>Populus trichocarpa </italic>
Genome WebSite</article-title>
<ext-link ext-link-type="uri" xlink:href="http://genome.jgi-psf.org/Poptr1/Poptr1.home.html"></ext-link>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salamov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Solovyev</surname>
<given-names>VV</given-names>
</name>
</person-group>
<article-title>
<italic>Ab initio</italic>
gene finding in
<italic>Drosophila</italic>
genomic DNA.</article-title>
<source>Genome Res</source>
<year>2000</year>
<volume>10</volume>
<fpage>516</fpage>
<lpage>522</lpage>
<pub-id pub-id-type="pmid">10779491</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.10.4.516</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hasegawa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Models of amino acid substitution and applications to mitochondrial protein evolution.</article-title>
<source>Mol Biol Evol</source>
<year>1998</year>
<volume>15</volume>
<fpage>1600</fpage>
<lpage>1611</lpage>
<pub-id pub-id-type="pmid">9866196</pub-id>
</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yang</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>PAML: a program package for phylogenetic analysis by maximum likelihood.</article-title>
<source>Comput Appl Biosci</source>
<year>1997</year>
<volume>13</volume>
<fpage>555</fpage>
<lpage>556</lpage>
<pub-id pub-id-type="pmid">9367129</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gaut</surname>
<given-names>BS</given-names>
</name>
<name>
<surname>Morton</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>McCaig</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Clegg</surname>
<given-names>MT</given-names>
</name>
</person-group>
<article-title>Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1996</year>
<volume>93</volume>
<fpage>10274</fpage>
<lpage>10279</lpage>
<pub-id pub-id-type="pmid">8816790</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.93.19.10274</pub-id>
</citation>
</ref>
</ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>Flow chart for the identification of intron gain and intron loss within segmentally duplicated rice genes. TE, transposable element.</p>
</caption>
<graphic xlink:href="gb-2006-7-5-r41-1"></graphic>
</fig>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>Example of intron loss. A multiple alignment of the two duplicated rice genes (top; LOC_Os03g18690.1, LOC_Os07g49150.1) and their putative orthologous
<italic>Arabidopsis </italic>
gene (bottom; At4g29040.1) suggests that the third intron was lost from LOC_Os07g49150.1. Yellow inserts indicate introns conserved across all three genes, and red indicates the lost intron. The phase of each intron is marked in the alignment: all conserved introns are phase 0, whereas the lost intron is phase 2. The two rice genes and their putative
<italic>Arabidopsis </italic>
ortholog encode a 26S proteasome regulatory subunit 4.</p>
</caption>
<graphic xlink:href="gb-2006-7-5-r41-2"></graphic>
</fig>
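<p>For reference, intron phase follows the standard definition. It is the number of coding nucleotides that precede the intron, modulo 3: phase 0 lies between codons, phase 1 after the first codon position, and phase 2 after the second. The sketch below is illustrative only. It assumes a hypothetical input that lists the coding length of each exon from 5' to 3', and it is not taken from the authors' pipeline.</p>
<preformat>
```python
def intron_phases(exon_lengths):
    """Phases of the introns between consecutive coding exons.

    exon_lengths lists the coding length (nt) of each exon, 5' to 3'.
    The intron after exon i has phase equal to the cumulative coding
    length up to and including exon i, modulo 3.
    """
    phases = []
    total = 0
    for length in exon_lengths[:-1]:  # no intron follows the last exon
        total += length
        phases.append(total % 3)
    return phases
```
</preformat>
<p>For example, coding exons of 120, 95, and 60 nt give two introns, of phase 0 and phase 2.</p>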
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>Distribution of the sizes of the lost and gained introns. Intron lengths were grouped into 100-bp bins. The number of lost and gained introns in each bin was counted and plotted against the frequency of all 33,011 FLS introns in the rice genome.</p>
</caption>
<graphic xlink:href="gb-2006-7-5-r41-3"></graphic>
</fig>
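<p>The length binning described for Figure 3 can be sketched as follows. The sketch assumes left-closed 100-bp bins, so that bin k covers k × 100 to k × 100 + 99 bp, and plain Python lists of intron lengths. The function names and inputs are hypothetical. Converting counts to frequencies places the lost, gained, and genome-wide FLS intron sets on a common scale for plotting.</p>
<preformat>
```python
from collections import Counter


def bin_counts(lengths, bin_size=100):
    """Count lengths per bin. Bin k holds lengths from k*bin_size up to
    (k + 1)*bin_size - 1 (left-closed bins, an assumption)."""
    return Counter(length // bin_size for length in lengths)


def bin_frequencies(lengths, bin_size=100):
    """Fraction of lengths in each bin, keyed by bin index in ascending order."""
    counts = bin_counts(lengths, bin_size)
    total = len(lengths)
    return {k: counts[k] / total for k in sorted(counts)}
```
</preformat>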
<fig position="float" id="F4">
<label>Figure 4</label>
<caption>
<p>Intron loss along the coding sequence. The position of each lost intron was inferred from the retained intron in its duplicated partner gene. Each coding sequence was divided into 10 bins, and the positions of independently lost introns were assigned to the corresponding bins. These counts were plotted against the frequency of all 33,011 FLS introns in the rice genome, binned into the same 10 bins.</p>
</caption>
<graphic xlink:href="gb-2006-7-5-r41-4"></graphic>
</fig>
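<p>One way to implement the positional binning of Figure 4 is sketched below. The sketch assumes 10 equal-width bins over the relative position of each intron within its coding sequence (the intron offset divided by the CDS length); the caption does not state this explicitly. The function names and the (offset, CDS length) input format are hypothetical. Integer arithmetic avoids floating-point rounding at bin boundaries.</p>
<preformat>
```python
def position_bin(offset, cds_length, n_bins=10):
    """Bin index (0 to n_bins - 1) for an intron located `offset` coding
    nucleotides from the start of a CDS of length `cds_length`."""
    if not cds_length > 0:
        raise ValueError("cds_length must be positive")
    if offset > cds_length or 0 > offset:
        raise ValueError("offset must lie within the CDS")
    # Equal-width bins on the relative position. An intron exactly at the
    # 3' end (offset == cds_length) is placed in the last bin.
    return min(offset * n_bins // cds_length, n_bins - 1)


def position_histogram(introns, n_bins=10):
    """Count introns per bin; `introns` yields (offset, cds_length) pairs."""
    counts = [0] * n_bins
    for offset, cds_length in introns:
        counts[position_bin(offset, cds_length, n_bins)] += 1
    return counts
```
</preformat>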
<fig position="float" id="F5">
<label>Figure 5</label>
<caption>
<p>Extraction of the exonic 4-mers at the donor and acceptor splice sites of lost and retained introns. Duplicated rice gene 1, which has a single exon, is shown together with rice gene 2 and the
<italic>Arabidopsis </italic>
ortholog, each of which has two exons separated by a single intron. All genes are drawn as colored rectangles. Dashed lines indicate similar regions. Phylogenetic analysis including the
<italic>Arabidopsis </italic>
ortholog suggests that an intron was lost in rice gene 1. The red ovals mark the 4-mers extracted for SoR analysis.</p>
</caption>
<graphic xlink:href="gb-2006-7-5-r41-5"></graphic>
</fig>
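<p>The 4-mer extraction shown in Figure 5 can be sketched under the following assumptions:</p>
<p>Coordinates are 0-based and end-exclusive, sequences are given on the coding strand, and, for the intron-less copy, the junction is the aligned position where the lost intron would sit.</p>
<p>The function names are hypothetical, and this is not the authors' code.</p>
<preformat>
```python
def flanking_exonic_kmers(seq, intron_start, intron_end, k=4):
    """Return (donor_kmer, acceptor_kmer) for an intron in genomic sequence.

    seq[intron_start:intron_end] is the intron. The donor k-mer is the last
    k exonic bases before the 5' splice site, and the acceptor k-mer is the
    first k exonic bases after the 3' splice site.
    """
    if k > intron_start or intron_end + k > len(seq):
        raise ValueError("intron too close to the sequence ends")
    return seq[intron_start - k:intron_start], seq[intron_end:intron_end + k]


def junction_kmers(cds, junction, k=4):
    """The same two k-mers for an intron-less gene copy, taken on either side
    of the aligned position `junction` in its coding sequence."""
    return flanking_exonic_kmers(cds, junction, junction, k)
```
</preformat>
<p>Treating the intron-less junction as a zero-length intron lets both gene copies share one extraction routine.</p>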
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Statistics on the genome, genes, and regions within segmentally duplicated blocks of the rice genome</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="center" colspan="5">Maximum distance between collinear gene pairs</td>
</tr>
<tr>
<td align="left">Statistics</td>
<td align="center">100 kb</td>
<td align="center">200 kb</td>
<td align="center">500 kb</td>
<td align="center">1 Mb</td>
<td align="center">5 Mb</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Region covered by duplicated blocks (Mb)</td>
<td align="center">96.04</td>
<td align="center">158.9</td>
<td align="center">193.25</td>
<td align="center">196.35</td>
<td align="center">197.96</td>
</tr>
<tr>
<td align="left">Region covered by multiple duplicated blocks (Mb)</td>
<td align="center">7.16</td>
<td align="center">30.6</td>
<td align="center">45.2</td>
<td align="center">45.31</td>
<td align="center">45.74</td>
</tr>
<tr>
<td align="left">Number of duplicated blocks</td>
<td align="center">151</td>
<td align="center">149</td>
<td align="center">101</td>
<td align="center">98</td>
<td align="center">96</td>
</tr>
<tr>
<td align="left">Genome coverage (%)</td>
<td align="center">25.9</td>
<td align="center">42.8</td>
<td align="center">52.1</td>
<td align="center">52.9</td>
<td align="center">53.4</td>
</tr>
<tr>
<td align="left">Non-TE gene coverage (%)</td>
<td align="center">30.3</td>
<td align="center">49.3</td>
<td align="center">59.1</td>
<td align="center">59.7</td>
<td align="center">60</td>
</tr>
<tr>
<td align="left">Total number of non-TE genes retained within duplicated blocks</td>
<td align="center">4,377</td>
<td align="center">5,567</td>
<td align="center">5,879</td>
<td align="center">5,894</td>
<td align="center">5,894</td>
</tr>
<tr>
<td align="left">Number of gene pairs retained within duplicated blocks</td>
<td align="center">2,277</td>
<td align="center">3,101</td>
<td align="center">3,346</td>
<td align="center">3,355</td>
<td align="center">3,355</td>
</tr>
<tr>
<td align="left">Total number of non-TE genes within duplicated blocks</td>
<td align="center">13,250</td>
<td align="center">21,570</td>
<td align="center">25,819</td>
<td align="center">26,114</td>
<td align="center">26,248</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Phase distribution of lost and conserved introns in segmentally duplicated rice genes</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="center">Phase 0</td>
<td align="center">Phase 1</td>
<td align="center">Phase 2</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Intron loss*</td>
<td align="center">15</td>
<td align="center">7</td>
<td align="center">12</td>
</tr>
<tr>
<td align="left">Conserved introns
<sup>†</sup>
</td>
<td align="center">580</td>
<td align="center">236</td>
<td align="center">225</td>
</tr>
<tr>
<td align="left">Intron loss rate
<sup>‡</sup>
</td>
<td align="center">2.5%</td>
<td align="center">2.9%</td>
<td align="center">5.1%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>*Multiple consecutively lost introns were excluded from this analysis.
<sup>†</sup>
Conserved aligned intron positions within all 235 duplicate gene pairs.
<sup>‡</sup>
Intron loss rate was calculated as (intron loss/(intron loss + conserved introns)) × 100.</p>
</table-wrap-foot>
</table-wrap>
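<p>The loss-rate formula in the footnote can be applied directly to the per-phase counts in this table. A short Python sketch follows. The rates are rounded to one decimal place, and that rounding convention is an assumption.</p>
<preformat>
```python
def intron_loss_rate(lost, conserved):
    """Intron loss rate (%) = lost / (lost + conserved) * 100."""
    return 100.0 * lost / (lost + conserved)


# (lost, conserved) intron counts per phase, taken from Table 2.
counts_by_phase = {0: (15, 580), 1: (7, 236), 2: (12, 225)}

rates = {
    phase: round(intron_loss_rate(lost, conserved), 1)
    for phase, (lost, conserved) in counts_by_phase.items()
}
print(rates)
```
</preformat>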
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>4-mer usage of the exonic sequence at the donor splice site of lost introns and their corresponding retained introns</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="center" colspan="3">Intron lost</td>
<td align="center" colspan="4">Intron retained</td>
</tr>
<tr>
<td align="left">Locus name*</td>
<td align="left">4-mer
<sup>†</sup>
</td>
<td align="center">Rank
<sup>‡</sup>
</td>
<td align="left">Locus name
<sup>§</sup>
</td>
<td align="center">Phase</td>
<td align="left">4-mer</td>
<td align="center">Rank</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">LOC_Os05g48520.1</td>
<td align="left">CAAG</td>
<td align="center">256</td>
<td align="left">LOC_Os01g48540.1</td>
<td align="center">0</td>
<td align="left">CAAG</td>
<td align="center">256</td>
</tr>
<tr>
<td align="left">LOC_Os06g44300.1</td>
<td align="left">CGAG</td>
<td align="center">245</td>
<td align="left">LOC_Os02g08230.1</td>
<td align="center">0</td>
<td align="left">CGAG</td>
<td align="center">245</td>
</tr>
<tr>
<td align="left">LOC_Os06g11920.1</td>
<td align="left">CAAG</td>
<td align="center">256</td>
<td align="left">LOC_Os02g51600.1</td>
<td align="center">0</td>
<td align="left">CAAG</td>
<td align="center">256</td>
</tr>
<tr>
<td align="left">LOC_Os06g10850.1</td>
<td align="left">GAGG</td>
<td align="center">219</td>
<td align="left">LOC_Os02g52830.1</td>
<td align="center">0</td>
<td align="left">CCAT</td>
<td align="center">211</td>
</tr>
<tr>
<td align="left">LOC_Os07g02440.1</td>
<td align="left">CGAC</td>
<td align="center">130</td>
<td align="left">LOC_Os03g55420.1</td>
<td align="center">0</td>
<td align="left">CGAG</td>
<td align="center">245</td>
</tr>
<tr>
<td align="left">LOC_Os07g12340.1</td>
<td align="left">CAGG</td>
<td align="center">234</td>
<td align="left">LOC_Os03g60080.1</td>
<td align="center">0</td>
<td align="left">CAGG</td>
<td align="center">234</td>
</tr>
<tr>
<td align="left">LOC_Os01g13130.1</td>
<td align="left">CGCC</td>
<td align="center">154</td>
<td align="left">LOC_Os05g14240.1</td>
<td align="center">0</td>
<td align="left">CATG</td>
<td align="center">244</td>
</tr>
<tr>
<td align="left">LOC_Os11g01820.1</td>
<td align="left">GCTC</td>
<td align="center">103</td>
<td align="left">LOC_Os05g39600.1</td>
<td align="center">0</td>
<td align="left">CATG</td>
<td align="center">244</td>
</tr>
<tr>
<td align="left">LOC_Os12g02840.1</td>
<td align="left">CCTC</td>
<td align="center">172</td>
<td align="left">LOC_Os05g40650.1</td>
<td align="center">0</td>
<td align="left">CCTC</td>
<td align="center">172</td>
</tr>
<tr>
<td align="left">LOC_Os02g14430.1</td>
<td align="left">CCAG</td>
<td align="center">251</td>
<td align="left">LOC_Os06g35480.1</td>
<td align="center">0</td>
<td align="left">CAAC</td>
<td align="center">193</td>
</tr>
<tr>
<td align="left">LOC_Os09g39720.1</td>
<td align="left">GGAG</td>
<td align="center">246</td>
<td align="left">LOC_Os08g44590.1</td>
<td align="center">0</td>
<td align="left">GGAG</td>
<td align="center">246</td>
</tr>
<tr>
<td align="left">LOC_Os02g54640.1</td>
<td align="left">GTTC</td>
<td align="center">28</td>
<td align="left">LOC_Os09g26160.1</td>
<td align="center">0</td>
<td align="left">TTTT</td>
<td align="center">133</td>
</tr>
<tr>
<td align="left">LOC_Os08g39370.1</td>
<td align="left">CAAC</td>
<td align="center">193</td>
<td align="left">LOC_Os09g31130.1</td>
<td align="center">0</td>
<td align="left">CAAC</td>
<td align="center">193</td>
</tr>
<tr>
<td align="left">LOC_Os08g41880.1</td>
<td align="left">CGAG</td>
<td align="center">245</td>
<td align="left">LOC_Os09g32840.1</td>
<td align="center">0</td>
<td align="left">TGAG</td>
<td align="center">253</td>
</tr>
<tr>
<td align="left">LOC_Os03g01820.1</td>
<td align="left">GAGG</td>
<td align="center">219</td>
<td align="left">LOC_Os10g39810.1</td>
<td align="center">0</td>
<td align="left">CAAG</td>
<td align="center">256</td>
</tr>
<tr>
<td align="left">LOC_Os05g38420.1</td>
<td align="left">TTCG</td>
<td align="center">225</td>
<td align="left">LOC_Os01g62490.1</td>
<td align="center">1</td>
<td align="left">TTCG</td>
<td align="center">225</td>
</tr>
<tr>
<td align="left">LOC_Os06g12960.1</td>
<td align="left">GACG</td>
<td align="center">228</td>
<td align="left">LOC_Os02g50810.1</td>
<td align="center">1</td>
<td align="left">CACG</td>
<td align="center">222</td>
</tr>
<tr>
<td align="left">LOC_Os09g26160.1</td>
<td align="left">CATC</td>
<td align="center">54</td>
<td align="left">LOC_Os02g54640.1</td>
<td align="center">1</td>
<td align="left">CACA</td>
<td align="center">171</td>
</tr>
<tr>
<td align="left">LOC_Os06g51050.1</td>
<td align="left">ACCG</td>
<td align="center">223</td>
<td align="left">LOC_Os03g04060.1</td>
<td align="center">1</td>
<td align="left">ACAG</td>
<td align="center">250</td>
</tr>
<tr>
<td align="left">LOC_Os02g46780.1</td>
<td align="left">GCCG</td>
<td align="center">227</td>
<td align="left">LOC_Os04g50770.1</td>
<td align="center">1</td>
<td align="left">GCAG</td>
<td align="center">251</td>
</tr>
<tr>
<td align="left">LOC_Os01g50760.1</td>
<td align="left">GGAG</td>
<td align="center">247</td>
<td align="left">LOC_Os05g46580.1</td>
<td align="center">1</td>
<td align="left">GGAG</td>
<td align="center">247</td>
</tr>
<tr>
<td align="left">LOC_Os11g09020.1</td>
<td align="left">GTCG</td>
<td align="center">216</td>
<td align="left">LOC_Os12g08090.1</td>
<td align="center">1</td>
<td align="left">ATCT</td>
<td align="center">194</td>
</tr>
<tr>
<td align="left">LOC_Os05g04690.1</td>
<td align="left">CGTG</td>
<td align="center">88</td>
<td align="left">LOC_Os01g18400.1</td>
<td align="center">2</td>
<td align="left">CATG</td>
<td align="center">237</td>
</tr>
<tr>
<td align="left">LOC_Os05g48700.1</td>
<td align="left">TGAG</td>
<td align="center">246</td>
<td align="left">LOC_Os01g55240.1</td>
<td align="center">2</td>
<td align="left">TCCG</td>
<td align="center">222</td>
</tr>
<tr>
<td align="left">LOC_Os05g39720.1</td>
<td align="left">GGTG</td>
<td align="center">115</td>
<td align="left">LOC_Os01g61080.1</td>
<td align="center">2</td>
<td align="left">GATG</td>
<td align="center">217</td>
</tr>
<tr>
<td align="left">LOC_Os07g49280.1</td>
<td align="left">CAAG</td>
<td align="center">254</td>
<td align="left">LOC_Os03g18140.1</td>
<td align="center">2</td>
<td align="left">CCCG</td>
<td align="center">142</td>
</tr>
<tr>
<td align="left">LOC_Os07g49150.1</td>
<td align="left">AGAG</td>
<td align="center">251</td>
<td align="left">LOC_Os03g18690.1</td>
<td align="center">2</td>
<td align="left">AGAG</td>
<td align="center">251</td>
</tr>
<tr>
<td align="left">LOC_Os07g49000.1</td>
<td align="left">GGAG</td>
<td align="center">245</td>
<td align="left">LOC_Os03g19200.1</td>
<td align="center">2</td>
<td align="left">GGAG</td>
<td align="center">245</td>
</tr>
<tr>
<td align="left">LOC_Os09g26360.1</td>
<td align="left">GAAG</td>
<td align="center">249</td>
<td align="left">LOC_Os08g34910.1</td>
<td align="center">2</td>
<td align="left">GAAG</td>
<td align="center">249</td>
</tr>
<tr>
<td align="left">LOC_Os08g41730.1</td>
<td align="left">GCGG</td>
<td align="center">208</td>
<td align="left">LOC_Os09g32800.1</td>
<td align="center">2</td>
<td align="left">GCGG</td>
<td align="center">208</td>
</tr>
<tr>
<td align="left">LOC_Os12g08090.1</td>
<td align="left">TGCG</td>
<td align="center">115</td>
<td align="left">LOC_Os11g09020.1</td>
<td align="center">2</td>
<td align="left">TGCT</td>
<td align="center">163</td>
</tr>
<tr>
<td align="left">LOC_Os01g09540.1</td>
<td align="left">TCGG</td>
<td align="center">225</td>
<td align="left">LOC_Os05g10210.1</td>
<td align="center">2</td>
<td align="left">ATGG</td>
<td align="center">238</td>
</tr>
<tr>
<td align="left">LOC_Os05g10210.1</td>
<td align="left">TCCA</td>
<td align="center">175</td>
<td align="left">LOC_Os01g09540.1</td>
<td align="center">2</td>
<td align="left">TAAG</td>
<td align="center">248</td>
</tr>
<tr>
<td align="left">LOC_Os03g21820.1</td>
<td align="left">GCCG</td>
<td align="center">195</td>
<td align="left">LOC_Os05g39990.1</td>
<td align="center">2</td>
<td align="left">GCAG</td>
<td align="center">252</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>*Locus name of the rice gene model with intron loss.
<sup></sup>
The exonic 4-mer at the donor splice site of the lost intron was inferred from the pair-wise alignment of the coding sequences as illustrated in Figure 5.
<sup></sup>
Each 4-mer is associated with an intron phase-dependent rank ranging from 1 to 256 based on the frequency of occurrence calculated from exonic 4-mers at the exon-intron boundary of all 33,011 FLS introns.
<sup>§</sup>
The corresponding rice duplicated gene with retained intron.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption>
<p>4-mer usage of exonic sequence at acceptor splice site of lost and corresponding retained introns</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="center" colspan="3">Intron lost</td>
<td align="center" colspan="4">Intron retained</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Locus name*</td>
<td align="left">4-mer
<sup></sup>
</td>
<td align="center">Rank
<sup></sup>
</td>
<td align="left">Locus name
<sup>§</sup>
</td>
<td align="center">Phase</td>
<td align="left">4-mer</td>
<td align="center">Rank</td>
</tr>
<tr>
<td colspan="7">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">LOC_Os05g48520.1</td>
<td align="left">ACCG</td>
<td align="center">53</td>
<td align="left">LOC_Os01g48540.1</td>
<td align="center">0</td>
<td align="left">ATCG</td>
<td align="center">186</td>
</tr>
<tr>
<td align="left">LOC_Os06g44300.1</td>
<td align="left">TACA</td>
<td align="center">136</td>
<td align="left">LOC_Os02g08230.1</td>
<td align="center">0</td>
<td align="left">TACA</td>
<td align="center">136</td>
</tr>
<tr>
<td align="left">LOC_Os06g11920.1</td>
<td align="left">GGCT</td>
<td align="center">183</td>
<td align="left">LOC_Os02g51600.1</td>
<td align="center">0</td>
<td align="left">GGTT</td>
<td align="center">222</td>
</tr>
<tr>
<td align="left">LOC_Os06g10850.1</td>
<td align="left">GCCA</td>
<td align="center">206</td>
<td align="left">LOC_Os02g52830.1</td>
<td align="center">0</td>
<td align="left">GTGA</td>
<td align="center">255</td>
</tr>
<tr>
<td align="left">LOC_Os07g02440.1</td>
<td align="left">GGCT</td>
<td align="center">183</td>
<td align="left">LOC_Os03g55420.1</td>
<td align="center">0</td>
<td align="left">GGAT</td>
<td align="center">201</td>
</tr>
<tr>
<td align="left">LOC_Os07g12340.1</td>
<td align="left">CTGG</td>
<td align="center">176</td>
<td align="left">LOC_Os03g60080.1</td>
<td align="center">0</td>
<td align="left">TTGG</td>
<td align="center">169</td>
</tr>
<tr>
<td align="left">LOC_Os01g13130.1</td>
<td align="left">GCCA</td>
<td align="center">206</td>
<td align="left">LOC_Os05g14240.1</td>
<td align="center">0</td>
<td align="left">GCGA</td>
<td align="center">178</td>
</tr>
<tr>
<td align="left">LOC_Os11g01820.1</td>
<td align="left">GTCG</td>
<td align="center">204</td>
<td align="left">LOC_Os05g39600.1</td>
<td align="center">0</td>
<td align="left">GGCG</td>
<td align="center">152</td>
</tr>
<tr>
<td align="left">LOC_Os12g02840.1</td>
<td align="left">GCCG</td>
<td align="center">143</td>
<td align="left">LOC_Os05g40650.1</td>
<td align="center">0</td>
<td align="left">GCTG</td>
<td align="center">251</td>
</tr>
<tr>
<td align="left">LOC_Os02g14430.1</td>
<td align="left">GGCT</td>
<td align="center">183</td>
<td align="left">LOC_Os06g35480.1</td>
<td align="center">0</td>
<td align="left">GGGT</td>
<td align="center">178</td>
</tr>
<tr>
<td align="left">LOC_Os09g39720.1</td>
<td align="left">ATAC</td>
<td align="center">194</td>
<td align="left">LOC_Os08g44590.1</td>
<td align="center">0</td>
<td align="left">ATAT</td>
<td align="center">215</td>
</tr>
<tr>
<td align="left">LOC_Os02g54640.1</td>
<td align="left">GTGT</td>
<td align="center">243</td>
<td align="left">LOC_Os09g26160.1</td>
<td align="center">0</td>
<td align="left">GCAT</td>
<td align="center">223</td>
</tr>
<tr>
<td align="left">LOC_Os08g39370.1</td>
<td align="left">GTGC</td>
<td align="center">246</td>
<td align="left">LOC_Os09g31130.1</td>
<td align="center">0</td>
<td align="left">ATCA</td>
<td align="center">230</td>
</tr>
<tr>
<td align="left">LOC_Os08g41880.1</td>
<td align="left">ATGA</td>
<td align="center">214</td>
<td align="left">LOC_Os09g32840.1</td>
<td align="center">0</td>
<td align="left">ATGA</td>
<td align="center">214</td>
</tr>
<tr>
<td align="left">LOC_Os03g01820.1</td>
<td align="left">GCGG</td>
<td align="center">173</td>
<td align="left">LOC_Os10g39810.1</td>
<td align="center">0</td>
<td align="left">ATGG</td>
<td align="center">232</td>
</tr>
<tr>
<td align="left">LOC_Os05g38420.1</td>
<td align="left">GCGA</td>
<td align="center">205</td>
<td align="left">LOC_Os01g62490.1</td>
<td align="center">1</td>
<td align="left">GCGA</td>
<td align="center">205</td>
</tr>
<tr>
<td align="left">LOC_Os06g12960.1</td>
<td align="left">AGGT</td>
<td align="center">156</td>
<td align="left">LOC_Os02g50810.1</td>
<td align="center">1</td>
<td align="left">AGGT</td>
<td align="center">156</td>
</tr>
<tr>
<td align="left">LOC_Os09g26160.1</td>
<td align="left">GGCA</td>
<td align="center">229</td>
<td align="left">LOC_Os02g54640.1</td>
<td align="center">1</td>
<td align="left">AGGA</td>
<td align="center">226</td>
</tr>
<tr>
<td align="left">LOC_Os06g51050.1</td>
<td align="left">GCGG</td>
<td align="center">156</td>
<td align="left">LOC_Os03g04060.1</td>
<td align="center">1</td>
<td align="left">GTGG</td>
<td align="center">255</td>
</tr>
<tr>
<td align="left">LOC_Os02g46780.1</td>
<td align="left">GATT</td>
<td align="center">244</td>
<td align="left">LOC_Os04g50770.1</td>
<td align="center">1</td>
<td align="left">GTTT</td>
<td align="center">251</td>
</tr>
<tr>
<td align="left">LOC_Os01g50760.1</td>
<td align="left">GAAA</td>
<td align="center">246</td>
<td align="left">LOC_Os05g46580.1</td>
<td align="center">1</td>
<td align="left">GGAA</td>
<td align="center">247</td>
</tr>
<tr>
<td align="left">LOC_Os11g09020.1</td>
<td align="left">CCAA</td>
<td align="center">156</td>
<td align="left">LOC_Os12g08090.1</td>
<td align="center">1</td>
<td align="left">CCAA</td>
<td align="center">156</td>
</tr>
<tr>
<td align="left">LOC_Os05g04690.1</td>
<td align="left">GAAC</td>
<td align="center">235</td>
<td align="left">LOC_Os01g18400.1</td>
<td align="center">2</td>
<td align="left">GAAC</td>
<td align="center">235</td>
</tr>
<tr>
<td align="left">LOC_Os05g48700.1</td>
<td align="left">GGCG</td>
<td align="center">189</td>
<td align="left">LOC_Os01g55240.1</td>
<td align="center">2</td>
<td align="left">GGCC</td>
<td align="center">158</td>
</tr>
<tr>
<td align="left">LOC_Os05g39720.1</td>
<td align="left">GAGG</td>
<td align="center">218</td>
<td align="left">LOC_Os01g61080.1</td>
<td align="center">2</td>
<td align="left">GAGG</td>
<td align="center">218</td>
</tr>
<tr>
<td align="left">LOC_Os07g49280.1</td>
<td align="left">CTTC</td>
<td align="center">163</td>
<td align="left">LOC_Os03g18140.1</td>
<td align="center">2</td>
<td align="left">GTTC</td>
<td align="center">251</td>
</tr>
<tr>
<td align="left">LOC_Os07g49150.1</td>
<td align="left">GTAC</td>
<td align="center">255</td>
<td align="left">LOC_Os03g18690.1</td>
<td align="center">2</td>
<td align="left">GTAT</td>
<td align="center">256</td>
</tr>
<tr>
<td align="left">LOC_Os07g49000.1</td>
<td align="left">GTAC</td>
<td align="center">255</td>
<td align="left">LOC_Os03g19200.1</td>
<td align="center">2</td>
<td align="left">GTAC</td>
<td align="center">255</td>
</tr>
<tr>
<td align="left">LOC_Os09g26360.1</td>
<td align="left">GTAC</td>
<td align="center">255</td>
<td align="left">LOC_Os08g34910.1</td>
<td align="center">2</td>
<td align="left">GTAC</td>
<td align="center">255</td>
</tr>
<tr>
<td align="left">LOC_Os08g41730.1</td>
<td align="left">CACG</td>
<td align="center">97</td>
<td align="left">LOC_Os09g32800.1</td>
<td align="center">2</td>
<td align="left">GACG</td>
<td align="center">158</td>
</tr>
<tr>
<td align="left">LOC_Os12g08090.1</td>
<td align="left">CGCC</td>
<td align="center">18</td>
<td align="left">LOC_Os11g09020.1</td>
<td align="center">2</td>
<td align="left">GGCG</td>
<td align="center">189</td>
</tr>
<tr>
<td align="left">LOC_Os01g09540.1</td>
<td align="left">GTAC</td>
<td align="center">255</td>
<td align="left">LOC_Os05g10210.1</td>
<td align="center">2</td>
<td align="left">AACT</td>
<td align="center">134</td>
</tr>
<tr>
<td align="left">LOC_Os05g10210.1</td>
<td align="left">GCCT</td>
<td align="center">182</td>
<td align="left">LOC_Os01g09540.1</td>
<td align="center">2</td>
<td align="left">GTCG</td>
<td align="center">194</td>
</tr>
<tr>
<td align="left">LOC_Os03g21820.1</td>
<td align="left">CGTG</td>
<td align="center">153</td>
<td align="left">LOC_Os05g39990.1</td>
<td align="center">2</td>
<td align="left">GGTG</td>
<td align="center">233</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>*Locus name of the rice gene model with intron loss.
<sup></sup>
The exonic 4-mer at the acceptor splice site of the lost intron was inferred from the pair-wise alignment of the coding sequences as illustrated in Figure 5.
<sup></sup>
Each 4-mer is associated with an intron phase-dependent rank ranging from 1 to 256 based on the frequency of occurrence calculated from exonic 4-mers at the exon-intron boundary of all 33,011 FLS introns.
<sup>§</sup>
The corresponding rice duplicated gene with retained intron.</p>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption>
<p>Sum of the ranks of the exonic 4-mers at the donor and acceptor splice sites of lost introns and simulated introns</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="center" colspan="2">Sum of the ranks</td>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td align="center">Donor site</td>
<td align="center">Acceptor site</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Lost introns*</td>
<td align="center">6,737</td>
<td align="center">6,410</td>
</tr>
<tr>
<td align="left">Simulation average
<sup></sup>
(std)</td>
<td align="center">7,647 (253)</td>
<td align="center">6,679 (337)</td>
</tr>
<tr>
<td align="left">
<italic>P</italic>
value of lost introns
<sup></sup>
</td>
<td align="center">0.0007</td>
<td align="center">>0.05</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>*Sum of the ranks of the exonic 4-mers at the donor and acceptor splice sites of the 34 lost introns.
<sup></sup>
A total of 10,000 iterations were performed. In each iteration, 34 ranks were randomly generated according to the frequencies obtained from all the exonic 4-mers at the exon-intron boundaries of the 33,011 FLS introns. Standard deviations are listed in parentheses.
<sup></sup>
The
<italic>P</italic>
values for the sums of the ranks at the donor and acceptor splice sites.</p>
</table-wrap-foot>
</table-wrap>
</sec>
</back>
</pmc>
</record>
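The footnotes to Tables 3 and 4 describe the rank only briefly: each exonic 4-mer receives an intron phase-dependent rank from 1 to 256 according to how often it occurs at the exon-intron boundaries of the 33,011 FLS introns. The Python sketch below shows one way such a ranking could be computed, under stated assumptions. It assumes that rank 256 goes to the most frequent 4-mer within a phase (the high simulated averages in Table 5 are consistent with this reading), that the input is a list of (phase, 4-mer) pairs, and that ties are broken alphabetically; the source does not say how ties were handled, and this sketch gives every 4-mer a distinct rank. `phase_ranks` and `ALL_4MERS` are hypothetical names, not part of the original analysis.

```python
from collections import Counter
from itertools import product

# All 256 possible DNA 4-mers.
ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]


def phase_ranks(observations):
    """Rank the 256 4-mers by frequency, separately for each intron phase.

    observations: iterable of (phase, four_mer) pairs, one per exon-intron boundary.
    Returns {phase: {four_mer: rank}} with rank 256 for the most frequent 4-mer
    and rank 1 for the least frequent (an assumed convention).
    """
    counts = {}
    for phase, kmer in observations:
        counts.setdefault(phase, Counter())[kmer.upper()] += 1

    ranks = {}
    for phase, counter in counts.items():
        # Ascending by count, so the most frequent 4-mer ends up last (rank 256).
        # Ties are broken alphabetically: an assumption, not the authors' stated rule.
        ordered = sorted(ALL_4MERS, key=lambda k: (counter[k], k))
        ranks[phase] = {kmer: i + 1 for i, kmer in enumerate(ordered)}
    return ranks


# Toy input (not the FLS data): three phase-0 boundaries and one phase-1 boundary.
toy = [(0, "CAAG"), (0, "CAAG"), (0, "CGAG"), (1, "GGAG")]
ranks = phase_ranks(toy)
print(ranks[0]["CAAG"], ranks[0]["CGAG"], ranks[1]["GGAG"])  # 256 255 256
```

Ranking per phase rather than pooling all introns mirrors the footnotes' statement that the rank is phase-dependent, which is why the same 4-mer (for example GGAG) carries different ranks in the phase 0, 1 and 2 rows of Table 3.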

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000559  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000559  | SxmlIndent | more
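The footnote to Table 5 outlines a Monte Carlo test: in each of 10,000 iterations, 34 ranks are drawn according to the 4-mer frequencies and summed, and the observed sum for the lost introns is compared with the simulated distribution. A minimal Python sketch follows. It assumes the P value is the one-sided share of simulated sums at or below the observed sum (the source does not state how it was computed), and it draws all ranks from a single frequency table, whereas a phase-matched variant would draw each intron's rank from the table for its own phase. `simulate_rank_sums`, `empirical_p_lower` and the toy weights are hypothetical and illustrative only.

```python
import random
import statistics


def simulate_rank_sums(ranks, weights, n_introns=34, iterations=10_000, seed=1):
    """Return one summed rank per iteration.

    Each iteration draws n_introns ranks with replacement, with probability
    proportional to `weights` (e.g. the boundary frequency of the 4-mer
    holding each rank), and sums them.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [sum(rng.choices(ranks, weights=weights, k=n_introns))
            for _ in range(iterations)]


def empirical_p_lower(observed, sums):
    """One-sided empirical P value: share of simulated sums <= observed (assumed test)."""
    return sum(1 for s in sums if s <= observed) / len(sums)


# Toy weights (not the FLS frequencies): rank r is drawn with weight r,
# so high ranks (frequent 4-mers) dominate the draws.
toy_ranks = list(range(1, 257))
toy_sums = simulate_rank_sums(toy_ranks, weights=toy_ranks)
print(round(statistics.mean(toy_sums)), round(statistics.stdev(toy_sums)))
print(empirical_p_lower(5000, toy_sums))
```

With 10,000 iterations the smallest nonzero empirical P value is 0.0001, so this style of test can resolve values of the order reported in Table 5.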


This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021